List of unique values in multiple columns

JamesCrandall · ‎02-08-2021

I need create a list of unique values from multiple date columns ("startSurvey" and "EndSurvey")...(I guess a list of lists is fine as long as I can iterate over it and grab the 2 unique values). In the end, I am trying to build sql for a separate SearchCursor.

The desired result of the example needs to be:

[[StartSurvey: 6/5/2020 12:21:19 AM, EndSurvey: 6/5/2020 12:35:32 AM, [StartSurvey: 6/11/2020 12:42:05 AM, EndSurvey: 6/11/2020 5:13:18 AM]]

It doesn't necessarily have to be list of lists like this, but in some way that I can easily iterate over it to grab the unique StartDate/EndDate valules.

Anonymous User · ‎02-08-2021

This should give you a unique list of tuples.

date_info_list = list(set([(row[0], row[1]) for row in arcpy.da.SearchCursor(fc, ["StartSuvey", "EndSurvey"])]))

View solution in original post

Anonymous User · ‎02-08-2021

Hi @JamesCrandall ,

It is indeed possible to structure your data as a list of lists. Personally though, my preference would be to use list of dictionaries instead. The main difference being that elsewhere in your script when we go to access the data we've stored in the variable, we can refer to the start date and end date using a name that describe the information.

If we go with a list of lists, we need to keep track of which order we originally put these values into the sub-list. On small scale applications it's not a big deal keeping track of what order you put the data in there, but as scripts grow and change hands over the years - little things like this make scripts easier to understand and maintain.

I've outlined the code for both approaches below -

import arcpy
fc = "C:/path/to/data.gdb/FeatureClass"
	
# Using a list of lists approach 
with arcpy.da.SearchCursor(fc, ["StartSuvey", "EndSurvey"]) as scur:
    # Use a list comprehension to pull the data out of the cusror
    date_info_list = [[row[0], row[1]] for row in scur]

for date_list in date_info_list:
    print("Start: {}, End: {}".format(date_list[0], date_list[1]))
	

# Using a list of dictionaries approach 
with arcpy.da.SearchCursor(fc, ["StartSuvey", "EndSurvey"]) as scur:
    # Use a list comprehension to pull the data out of the cusror 
    date_info_list = [{"start": row[0], "end": row[1]} for row in scur]

for date_dict in date_info_list:
    print("Start: {}, End: {}".date_dict["start"], date_dict["end"])

Hope that helps!

JamesCrandall · ‎02-08-2021

I'm not getting a unique list of StartSurvey and EndSurvey values. Can you point out where this is supposed to happen in your example.

Thanks!

Anonymous User · ‎02-08-2021

Ah my apologies, I missed the part where we're looking to de-duplicate the data. There's a number of different ways to skin that cat, but here's one that'll capture one entry per unique GUID for you by using the GUID as the key to a dictionary.

import arcpy
fc = "C:/path/to/data.gdb/FeatureClass"

unique_entries = dict()
with arcpy.da.SearchCursor(fc, ["GUID", "StartSuvey", "EndSurvey"]) as scur:
    unique_entries[row[0]] = {"start": row[1], "end": row[2]}

with arcpy.da.SearchCursor(fc, ["GUID"]) as scur:	
	unique_GUIDs = list(set([row[0] for row in scur]))

for GUID in unique_GUIDs:
    print("Start: {}, End: {}".format(unique_entries[GUID]["start"], unique_entries[GUID]["end"]))

Anonymous User · ‎02-08-2021

Hi @JamesCrandall,

Speaking from the experience of having to do pretty much the same thing before, repeatedly, all I can say is that the a List of Dictionaries as suggested above by @Anonymous User is the way you want to go.
It will make incorporating your code much easier.

The only thing I would add would be to use a unique ID to help pull out the exact record that you want:

# Using a list of dictionaries approach 
with arcpy.da.SearchCursor(fc, ["JoinID", "StartSuvey", "EndSurvey"]) as scur:
    # Use a list comprehension to pull the data out of the cusror 
    date_info_list = [{"ID": row[0], "start": row[1], "end": row[2]}} for row in scur]

(Use which ever field would be most appropriate instead of Join ID, just used that as it was the only unique ID I could see from your table)

Best of luck!

JamesCrandall · ‎02-08-2021

Maybe I'm missing the obvious, but when I run and print the suggested solution I don't see unique dates. That is, I get this:

[
  {
    "start": "2020-05-27 23:44:17",
    "guid": "{A6618411-6CF7-4D7C-9D11-C6F73AC6BC1B}",
    "end": "2020-05-28 05:22:50"
  },
  {
    "start": "2020-05-27 23:44:17",
    "guid": "{A6618411-6CF7-4D7C-9D11-C6F73AC6BC1B}",
    "end": "2020-05-28 05:22:50"
  },
  {
    "start": "2020-05-27 23:44:17",
    "guid": "{A6618411-6CF7-4D7C-9D11-C6F73AC6BC1B}",
    "end": "2020-05-28 05:22:50"
  },
  {
    "start": "2020-06-04 00:19:40",
    "guid": "{8C6FCDEC-4BF1-4A71-8479-2925952A0993}",
    "end": "2020-06-04 04:23:58"
  },
  {
    "start": "2020-06-04 00:19:40",
    "guid": "{8C6FCDEC-4BF1-4A71-8479-2925952A0993}",
    "end": "2020-06-04 04:23:58"
  },
  {
    "start": "2020-06-04 00:19:40",
    "guid": "{8C6FCDEC-4BF1-4A71-8479-2925952A0993}",
    "end": "2020-06-04 04:23:58"
  }
 ]

But I need this:

[
  {
    "start": "2020-05-27 23:44:17",
    "guid": "{A6618411-6CF7-4D7C-9D11-C6F73AC6BC1B}",
    "end": "2020-05-28 05:22:50"
  },  
  {
    "start": "2020-06-04 00:19:40",
    "guid": "{8C6FCDEC-4BF1-4A71-8479-2925952A0993}",
    "end": "2020-06-04 04:23:58"
  }
 ]

Anonymous User · ‎02-08-2021

ahh, ok @JamesCrandall my bad I was leading you down the wrong path.

if you just want a list of unique dates, then use @Anonymous User s original list of dictionaries method and don't worry about the unique ID.

Then convert the list to a set, which will show only unique entries in a list.

# Using a list of dictionaries approach 
with arcpy.da.SearchCursor(fc, ["StartSuvey", "EndSurvey"]) as scur:
    # Use a list comprehension to pull the data out of the cusror 
    date_info_list = [{"start": row[0], "end": row[1]} for row in scur]
date_info_set = set(date_info_list)

JamesCrandall · ‎02-08-2021

Throws an error:

date_info_set = set(date_info_list)
TypeError: unhashable type: 'dict'

Anonymous User · ‎02-08-2021

dicts aren't hashable, I am not sure that set() would work.

DanPatterson · ‎02-08-2021

By example... a small table with 10K records pulling out the unique combinations from 3 fields

import arcpy
from arcpy.da import TableToNumPyArray, NumPyArrayToTable
f = r"C:\arcpro_npg\npg\Project_npg\npgeom.gdb\sample_10k"
arr = TableToNumPyArray(f, "*")

flds = arr.dtype.names  # ---- to get all the fields
sub = arr[['County', 'Town_class', 'Facility']]  # --- subsample from 3 fields
sub.shape
(9996,)

uni, idx, cnts = np.unique(sub, True, return_counts=True)
uni.shape
(24,)

uni
array([('A', 'A_', 'Hall'), ('A', 'A_', 'Hosp'), ('A', 'B_', 'Hall'),
       ('A', 'B_', 'Hosp'), ('A', 'C_', 'Hall'), ('A', 'C_', 'Hosp'),
       ('B', 'A_', 'Hall'), ('B', 'A_', 'Hosp'), ('B', 'B_', 'Hall'),
       ('B', 'B_', 'Hosp'), ('B', 'C_', 'Hall'), ('B', 'C_', 'Hosp'),
       ('C', 'A_', 'Hall'), ('C', 'A_', 'Hosp'), ('C', 'B_', 'Hall'),
       ('C', 'B_', 'Hosp'), ('C', 'C_', 'Hall'), ('C', 'C_', 'Hosp'),
       ('D', 'A_', 'Hall'), ('D', 'A_', 'Hosp'), ('D', 'B_', 'Hall'),
       ('D', 'B_', 'Hosp'), ('D', 'C_', 'Hall'), ('D', 'C_', 'Hosp')],
      dtype={'names':['County','Town_class','Facility'], 'formats':['<U4','<U12','<U16'], 'offsets':[8,24,72], 'itemsize':2184})

# ----
# NumPyArrayToTable  =====>  and back to Arc-world

... sort of retired...