Validate if all items in list1 are in a list of lists as a sublist?

PeterWilson · ‎11-23-2016

I'm creating a list of feature classes, each with a list of stats, to identify duplicate feature classes so that I can drop the duplicates and copy only one over to a new directory. The items I'm using to identify uniqueness are: file name; number of records in the feature class; area (sqm) of the minimum bounding geometry. The file path with file name is also added, but a sublist without it is used for testing whether the values already exist within the final list of unique feature classes.

For each feature class:

Append a list made up of file path, file name, number of records, and area of minimum bounding geometry to an existing list.

Before adding the next feature class, test whether the sublist [file name, number of records, area of minimum bounding geometry] already exists within the list of lists as a sublist (i.e. minus the file path), to prevent duplicate feature classes from being added to the list of lists of feature classes. I've tried using the Python built-in functions any and all with no luck.

cont_list = []
for fc in focus:
    stats_list = 1
    stats_list = 2
    cont_list.append(stats_list)
    if stats_list[1:] in cont_list:
        print("exists")
    else:
        cont_list.append(stats_list)

Note: all values in stats_list[1:] must match a sublist in the list of lists to count as a duplicate. Any advice on how to achieve this will be appreciated.

DarrenWiens2 · ‎11-23-2016

If you convert your list elements to a tuple and add to a set, you can compare:

>>> fc_list = [[r'c:\folder1\fcs1', 'fcs1', 100, 200],[r'c:\folder2\fcs1', 'fcs1', 100, 200],[r'c:\folder3\fcs2', 'fcs2', 100, 200],[r'c:\folder4\fcs1', 'fcs1', 100, 200],[r'c:\folder5\fcs1', 'fcs1', 100, 999]]
... master_set = set([])
... for fc in fc_list:
...     my_tuple = tuple(fc[1:])
...     if my_tuple not in master_set:
...         print("{} is not a duplicate.".format(my_tuple))
...     else:
...         print("{} is a duplicate.".format(my_tuple))
...     master_set.add(my_tuple)
...     
('fcs1', 100, 200) is not a duplicate.
('fcs1', 100, 200) is a duplicate.
('fcs2', 100, 200) is not a duplicate.
('fcs1', 100, 200) is a duplicate.
('fcs1', 100, 999) is not a duplicate.

Edit: you can also use a plain old list to hold the tuples, although it will still store the duplicates, because append adds every tuple regardless of whether it is already there.

>>> fc_list = [[r'c:\folder1\fcs1', 'fcs1', 100, 200],[r'c:\folder2\fcs1', 'fcs1', 100, 200],[r'c:\folder3\fcs2', 'fcs2', 100, 200],[r'c:\folder4\fcs1', 'fcs1', 100, 200],[r'c:\folder5\fcs1', 'fcs1', 100, 999]]
... master_set = []
... for fc in fc_list:
...     my_tuple = tuple(fc[1:])
...     if my_tuple not in master_set:
...         print("{} is not a duplicate.".format(my_tuple))
...     else:
...         print("{} is a duplicate.".format(my_tuple))
...     master_set.append(my_tuple)
...     
('fcs1', 100, 200) is not a duplicate.
('fcs1', 100, 200) is a duplicate.
('fcs2', 100, 200) is not a duplicate.
('fcs1', 100, 200) is a duplicate.
('fcs1', 100, 999) is not a duplicate.
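
For what it's worth, the check in the question (stats_list[1:] in cont_list) can never match, because it compares a 3-item slice against the full 4-item records stored in cont_list. Below is a minimal sketch, not the poster's final code, of how the set idea above could be used to keep the first full record (path included) for each unique combination. The function names unique_feature_classes and is_duplicate are made up for illustration, and each record is assumed to be [path, name, count, area].

```python
def unique_feature_classes(records):
    """Return the first record seen for each (name, count, area) key.

    Assumes each record looks like [path, name, count, area].
    """
    seen = set()
    unique = []
    for record in records:
        # Drop the path; a tuple is hashable, so it can live in a set.
        key = tuple(record[1:])
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


def is_duplicate(stats, kept):
    """List-only alternative using any(): slice BOTH sides before comparing."""
    return any(stats[1:] == existing[1:] for existing in kept)


fc_list = [
    [r'c:\folder1\fcs1', 'fcs1', 100, 200],
    [r'c:\folder2\fcs1', 'fcs1', 100, 200],
    [r'c:\folder3\fcs2', 'fcs2', 100, 200],
    [r'c:\folder4\fcs1', 'fcs1', 100, 200],
    [r'c:\folder5\fcs1', 'fcs1', 100, 999],
]

unique = unique_feature_classes(fc_list)
for record in unique:
    print(record[0])
```

With the sample data this keeps the records from folder1, folder3 and folder5. If the area is stored as a float, exact equality may be too strict and rounding the value before building the key might be needed; that depends on the data.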

DanPatterson_Retired · ‎11-23-2016

I don't understand your 3rd and 4th lines... what is it with the floating 1 and 2 on those? You should use syntax formatting; descriptions are available in several places.

PeterWilson · ‎11-23-2016

Hi Dan,

Apologies that my code was not syntax highlighted. I was typing it last night on my iPad, and the GeoNet website doesn't work well with iOS and Safari. I will correct it once I'm in the office.

PeterWilson · ‎11-23-2016

Thanks Darren,

This will work perfectly. I'll post my code once I've completed it, so everyone else can see what I was trying to achieve.

DanPatterson_Retired · ‎11-24-2016

Forgetting your array stuff

verbose demo

import numpy as np

a = [[r'c:\folder1\fcs1', 'fcs1', 100, 200],
     [r'c:\folder2\fcs1', 'fcs1', 100, 200],
     [r'c:\folder3\fcs2', 'fcs2', 100, 200],
     [r'c:\folder4\fcs1', 'fcs1', 100, 200],
     [r'c:\folder5\fcs1', 'fcs1', 100, 999]]
a = [tuple(i) for i in a]
dt = [("A", "U50"), ("B", "U10"), ("C", '<i4'), ("D", '<i4')]
a = np.array(a, dtype=dt)
uni, idx = np.unique(a[['B', 'C', 'D']], return_index=True)
idx.sort()
a_rows = a[idx]
frmt = """
input array
{!r:}
Unique from cols B,C,D
  {}
indices {}
Unique rows
{}
"""
print(frmt.format(a, uni, idx, a_rows))

yielding

input array
array([('c:\\folder1\\fcs1', 'fcs1', 100, 200),
       ('c:\\folder2\\fcs1', 'fcs1', 100, 200),
       ('c:\\folder3\\fcs2', 'fcs2', 100, 200),
       ('c:\\folder4\\fcs1', 'fcs1', 100, 200),
       ('c:\\folder5\\fcs1', 'fcs1', 100, 999)], 
      dtype=[('A', '<U50'), ('B', '<U10'), ('C', '<i4'), ('D', '<i4')])
Unique from cols B,C,D
  [('fcs1', 100, 200) ('fcs1', 100, 999) ('fcs2', 100, 200)]
indices [0 2 4]
Unique rows
[('c:\\folder1\\fcs1', 'fcs1', 100, 200) ('c:\\folder3\\fcs2', 'fcs2', 100, 200)
 ('c:\\folder5\\fcs1', 'fcs1', 100, 999)]

Basically...

convert the list of lists to a list of tuples, as for the set approach
convert to an array, with an appropriate dtype to suit
find the unique combinations in the key columns (like a set, but all at once)
sort the indices and slice the array using them
add print fluff, or use NumPyArrayToTable to get back to Arc*

Comments...

The lines that build the list and convert it to an array (everything up to and including the np.array call) can be skipped if you use TableToNumPyArray (or FeatureClassToNumPyArray).
The 'key' to getting the unique records is to specify the fields that you want to 'slice' the data on unique combinations.
If you only need the unique entries, you can get away with the tuple conversion through the np.unique call.
If you want to slice the whole input, then add the idx.sort() and a_rows = a[idx] lines.
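
If NumPy isn't at hand, the same "unique on key columns, then slice by first index" idea from the steps above can be roughed out in plain Python. This is only a sketch under assumptions: the helper name first_indices is hypothetical, and the key columns are given as positions (1, 2, 3) rather than the field names B, C, D used in the array version.

```python
def first_indices(rows, key_columns):
    """Sorted indices of the first row for each unique key-column combination."""
    first = {}
    for i, row in enumerate(rows):
        key = tuple(row[c] for c in key_columns)
        # setdefault only stores the index the first time a key is seen.
        first.setdefault(key, i)
    return sorted(first.values())


rows = [
    (r'c:\folder1\fcs1', 'fcs1', 100, 200),
    (r'c:\folder2\fcs1', 'fcs1', 100, 200),
    (r'c:\folder3\fcs2', 'fcs2', 100, 200),
    (r'c:\folder4\fcs1', 'fcs1', 100, 200),
    (r'c:\folder5\fcs1', 'fcs1', 100, 999),
]

idx = first_indices(rows, key_columns=(1, 2, 3))
unique_rows = [rows[i] for i in idx]  # slice the whole input by those indices
print(idx)  # [0, 2, 4]
```

The indices agree with the ones in the output shown earlier, since both approaches keep the first occurrence of each combination.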