List of unique values in multiple columns

1598
13
Jump to solution
02-08-2021 05:07 PM
JamesCrandall
MVP Frequent Contributor

I need create a list of unique values from multiple date columns ("startSurvey" and "EndSurvey")...(I guess a list of lists is fine as long as I can iterate over it and grab the 2 unique values). In the end, I am trying to build sql for a separate SearchCursor.

2021-02-08_19-55-55.jpg

The desired result of the example needs to be:

[[StartSurvey: 6/5/2020 12:21:19 AM, EndSurvey: 6/5/2020 12:35:32 AM, [StartSurvey: 6/11/2020 12:42:05 AM, EndSurvey: 6/11/2020 5:13:18 AM]]

 

It doesn't necessarily have to be list of lists like this, but in some way that I can easily iterate over it to grab the unique StartDate/EndDate valules.

 

 

0 Kudos
1 Solution

Accepted Solutions
JeffK
by MVP Regular Contributor
MVP Regular Contributor

This should give you a unique list of tuples.

date_info_list = list(set([(row[0], row[1]) for row in arcpy.da.SearchCursor(fc, ["StartSuvey", "EndSurvey"])]))

 

View solution in original post

13 Replies
JamesBrander
Esri Contributor

Hi @JamesCrandall ,

It is indeed possible to structure your data as a list of lists. Personally though, my preference would be to use list of dictionaries instead. The main difference being  that elsewhere in your script when we go to access the data we've stored in the variable, we can refer to the start date and end date using a name that describe the information.

If we go with a list of lists, we need to keep track of which order we originally put these values into the sub-list. On small scale applications it's not a big deal keeping track of what order you put the data in there, but as scripts grow and change hands over the years - little things like this make scripts easier to understand and maintain.

I've outlined the code for both approaches below - 

 

 

import arcpy
fc = "C:/path/to/data.gdb/FeatureClass"
	
# Using a list of lists approach 
with arcpy.da.SearchCursor(fc, ["StartSuvey", "EndSurvey"]) as scur:
    # Use a list comprehension to pull the data out of the cusror
    date_info_list = [[row[0], row[1]] for row in scur]

for date_list in date_info_list:
    print("Start: {}, End: {}".format(date_list[0], date_list[1]))
	

# Using a list of dictionaries approach 
with arcpy.da.SearchCursor(fc, ["StartSuvey", "EndSurvey"]) as scur:
    # Use a list comprehension to pull the data out of the cusror 
    date_info_list = [{"start": row[0], "end": row[1]} for row in scur]

for date_dict in date_info_list:
    print("Start: {}, End: {}".date_dict["start"], date_dict["end"])

 

Hope that helps!

 

JamesCrandall
MVP Frequent Contributor

I'm not getting a unique list of StartSurvey and EndSurvey values.  Can you point out where this is supposed to happen in your example.

 

Thanks!

0 Kudos
JamesBrander
Esri Contributor

Ah my apologies, I missed the part where we're looking to de-duplicate the data. There's a number of different ways to skin that cat, but here's one that'll capture one entry per unique GUID for you by using the GUID as the key to a dictionary. 

 

 

 

import arcpy
fc = "C:/path/to/data.gdb/FeatureClass"

unique_entries = dict()
with arcpy.da.SearchCursor(fc, ["GUID", "StartSuvey", "EndSurvey"]) as scur:
    unique_entries[row[0]] = {"start": row[1], "end": row[2]}

with arcpy.da.SearchCursor(fc, ["GUID"]) as scur:	
	unique_GUIDs = list(set([row[0] for row in scur]))

for GUID in unique_GUIDs:
    print("Start: {}, End: {}".format(unique_entries[GUID]["start"], unique_entries[GUID]["end"])) 	
	

 

 

 

 

 

MichaelBoyce
Esri Contributor

Hi @JamesCrandall,

Speaking from the experience of having to do pretty much the same thing before, repeatedly, all I can say is that the a List of Dictionaries as suggested above by @JamesBrander is the way you want to go.
It will make incorporating your code much easier.

The only thing I would add would be to use a unique ID to help pull out the exact record that you want:

# Using a list of dictionaries approach 
with arcpy.da.SearchCursor(fc, ["JoinID", "StartSuvey", "EndSurvey"]) as scur:
    # Use a list comprehension to pull the data out of the cusror 
    date_info_list = [{"ID": row[0], "start": row[1], "end": row[2]}} for row in scur]

(Use which ever field would be most appropriate instead of Join ID, just used that as it was the only unique ID I could see from your table)

Best of luck!

JamesCrandall
MVP Frequent Contributor

Maybe I'm missing the obvious, but when I run and print the suggested solution I don't see unique dates.  That is, I get this:

[
  {
    "start": "2020-05-27 23:44:17",
    "guid": "{A6618411-6CF7-4D7C-9D11-C6F73AC6BC1B}",
    "end": "2020-05-28 05:22:50"
  },
  {
    "start": "2020-05-27 23:44:17",
    "guid": "{A6618411-6CF7-4D7C-9D11-C6F73AC6BC1B}",
    "end": "2020-05-28 05:22:50"
  },
  {
    "start": "2020-05-27 23:44:17",
    "guid": "{A6618411-6CF7-4D7C-9D11-C6F73AC6BC1B}",
    "end": "2020-05-28 05:22:50"
  },
  {
    "start": "2020-06-04 00:19:40",
    "guid": "{8C6FCDEC-4BF1-4A71-8479-2925952A0993}",
    "end": "2020-06-04 04:23:58"
  },
  {
    "start": "2020-06-04 00:19:40",
    "guid": "{8C6FCDEC-4BF1-4A71-8479-2925952A0993}",
    "end": "2020-06-04 04:23:58"
  },
  {
    "start": "2020-06-04 00:19:40",
    "guid": "{8C6FCDEC-4BF1-4A71-8479-2925952A0993}",
    "end": "2020-06-04 04:23:58"
  }
 ]

 

But I need this:

[
  {
    "start": "2020-05-27 23:44:17",
    "guid": "{A6618411-6CF7-4D7C-9D11-C6F73AC6BC1B}",
    "end": "2020-05-28 05:22:50"
  },  
  {
    "start": "2020-06-04 00:19:40",
    "guid": "{8C6FCDEC-4BF1-4A71-8479-2925952A0993}",
    "end": "2020-06-04 04:23:58"
  }
 ]

 

 

0 Kudos
MichaelBoyce
Esri Contributor

ahh, ok  @JamesCrandall my bad I was leading you down the wrong path.


if you just want a list of unique dates, then use @JamesBrander s original list of dictionaries method and don't worry about the unique ID.

Then convert the list to a set, which will show only unique entries in a list.

 

 

# Using a list of dictionaries approach 
with arcpy.da.SearchCursor(fc, ["StartSuvey", "EndSurvey"]) as scur:
    # Use a list comprehension to pull the data out of the cusror 
    date_info_list = [{"start": row[0], "end": row[1]} for row in scur]
date_info_set = set(date_info_list)

 

 

JamesCrandall
MVP Frequent Contributor

Throws an error:

date_info_set = set(date_info_list)
TypeError: unhashable type: 'dict'

0 Kudos
JeffK
by MVP Regular Contributor
MVP Regular Contributor

dicts aren't hashable, I am not sure that set() would work.

0 Kudos
DanPatterson
MVP Esteemed Contributor

By example... a small table with 10K records pulling out the unique combinations from 3 fields

import arcpy
from arcpy.da import TableToNumPyArray, NumPyArrayToTable
f = r"C:\arcpro_npg\npg\Project_npg\npgeom.gdb\sample_10k"
arr = TableToNumPyArray(f, "*")

flds = arr.dtype.names  # ---- to get all the fields
sub = arr[['County', 'Town_class', 'Facility']]  # --- subsample from 3 fields
sub.shape
(9996,)

uni, idx, cnts = np.unique(sub, True, return_counts=True)
uni.shape
(24,)

uni
array([('A', 'A_', 'Hall'), ('A', 'A_', 'Hosp'), ('A', 'B_', 'Hall'),
       ('A', 'B_', 'Hosp'), ('A', 'C_', 'Hall'), ('A', 'C_', 'Hosp'),
       ('B', 'A_', 'Hall'), ('B', 'A_', 'Hosp'), ('B', 'B_', 'Hall'),
       ('B', 'B_', 'Hosp'), ('B', 'C_', 'Hall'), ('B', 'C_', 'Hosp'),
       ('C', 'A_', 'Hall'), ('C', 'A_', 'Hosp'), ('C', 'B_', 'Hall'),
       ('C', 'B_', 'Hosp'), ('C', 'C_', 'Hall'), ('C', 'C_', 'Hosp'),
       ('D', 'A_', 'Hall'), ('D', 'A_', 'Hosp'), ('D', 'B_', 'Hall'),
       ('D', 'B_', 'Hosp'), ('D', 'C_', 'Hall'), ('D', 'C_', 'Hosp')],
      dtype={'names':['County','Town_class','Facility'], 'formats':['<U4','<U12','<U16'], 'offsets':[8,24,72], 'itemsize':2184})

# ----
# NumPyArrayToTable  =====>  and back to Arc-world

... sort of retired...