Validate whether all items in list1 are in a list of lists as a sublist?

11-23-2016 03:12 PM
PeterWilson
Regular Contributor

I'm creating a list of feature classes, with a list of stats for each feature class, to identify duplicate feature classes so that I can drop the rest and copy only one over to a new directory. The items I'm using to identify uniqueness are file name, number of records in the feature class, and area (sqm) of the minimum bounding geometry. The file path with file name is also stored, but a sublist without it is used for testing whether the values already exist within the final list of unique feature classes.

for each feature class:

append a list into an existing list made up of file path, file name, number records, area of minimum bounding geometry.

before adding the next feature class, test whether the sublist [file name, number of records, area of minimum bounding geometry] already exists within the list of lists as a sublist (i.e. minus the file path), to prevent duplicate feature classes from being added to the list of lists of feature classes. I've tried using Python's built-in functions any and all with no luck.

cont_list = []
for fc in focus:
  stats_list = 1
  stats_list = 2
  cont_list.append(stats_list)
  if stats_list[1:] in cont_list:
    print("exists")
  else:
    cont_list.append(stats_list)

Note: all values in stats_list[1:] must match a sublist in the list of lists to count as a duplicate. Any advice on how to achieve this will be appreciated.
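
One likely reason the membership test above never matches: cont_list holds the full four-item lists (path included), while stats_list[1:] has only three items, so "in" never finds an equal element. A minimal sketch of comparing on everything but the path with any() follows. The sample rows and the helper name already_seen are made up for illustration and are not from the original post.

```python
def already_seen(stats_list, cont_list):
    """Return True if a stored entry matches stats_list on everything but the path."""
    return any(stats_list[1:] == existing[1:] for existing in cont_list)


# Made-up sample rows: [file path, file name, record count, bounding area]
focus = [
    [r'c:\folder1\fcs1', 'fcs1', 100, 200],
    [r'c:\folder2\fcs1', 'fcs1', 100, 200],
    [r'c:\folder3\fcs2', 'fcs2', 100, 200],
]

cont_list = []
for stats_list in focus:
    if already_seen(stats_list, cont_list):
        print("exists")
    else:
        # Only append when no stored entry shares name, count and area.
        cont_list.append(stats_list)
```

The comparison is made before appending, so a feature class is never compared against itself. Each check scans the whole list, which is fine for small inputs.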

Accepted Solutions
DarrenWiens2
MVP Honored Contributor

If you convert your list elements to a tuple and add to a set, you can compare:

>>> fc_list = [[r'c:\folder1\fcs1', 'fcs1', 100, 200],[r'c:\folder2\fcs1', 'fcs1', 100, 200],[r'c:\folder3\fcs2', 'fcs2', 100, 200],[r'c:\folder4\fcs1', 'fcs1', 100, 200],[r'c:\folder5\fcs1', 'fcs1', 100, 999]]
... master_set = set()
... for fc in fc_list:
...     my_tuple = tuple(fc[1:])  # drop the path; keep name, record count, area
...     if my_tuple not in master_set:
...         print("{} is not a duplicate.".format(my_tuple))
...     else:
...         print("{} is a duplicate.".format(my_tuple))
...     master_set.add(my_tuple)
...
('fcs1', 100, 200) is not a duplicate.
('fcs1', 100, 200) is a duplicate.
('fcs2', 100, 200) is not a duplicate.
('fcs1', 100, 200) is a duplicate.
('fcs1', 100, 999) is not a duplicate.

edit: you can also use a plain old list to hold the tuples, although it will still store the duplicates.

>>> fc_list = [[r'c:\folder1\fcs1', 'fcs1', 100, 200],[r'c:\folder2\fcs1', 'fcs1', 100, 200],[r'c:\folder3\fcs2', 'fcs2', 100, 200],[r'c:\folder4\fcs1', 'fcs1', 100, 200],[r'c:\folder5\fcs1', 'fcs1', 100, 999]]
... master_list = []
... for fc in fc_list:
...     my_tuple = tuple(fc[1:])  # drop the path; keep name, record count, area
...     if my_tuple not in master_list:
...         print("{} is not a duplicate.".format(my_tuple))
...     else:
...         print("{} is a duplicate.".format(my_tuple))
...     master_list.append(my_tuple)  # appended every time, so duplicates are stored
...
('fcs1', 100, 200) is not a duplicate.
('fcs1', 100, 200) is a duplicate.
('fcs2', 100, 200) is not a duplicate.
('fcs1', 100, 200) is a duplicate.
('fcs1', 100, 999) is not a duplicate.
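
The loops above only report duplicates. As a rough sketch of how the same tuple-in-a-set idea could be used to keep just the first feature class from each duplicate group (the stated goal of the question), something like the following might work. The function name unique_feature_classes is hypothetical, and the sample rows simply reuse the ones above.

```python
def unique_feature_classes(fc_list):
    """Keep the first entry for each (name, record count, area) combination."""
    seen = set()
    unique = []
    for fc in fc_list:
        key = tuple(fc[1:])  # ignore the path at index 0
        if key not in seen:
            seen.add(key)
            unique.append(fc)  # the full entry, path included
    return unique


fc_list = [[r'c:\folder1\fcs1', 'fcs1', 100, 200],
           [r'c:\folder2\fcs1', 'fcs1', 100, 200],
           [r'c:\folder3\fcs2', 'fcs2', 100, 200],
           [r'c:\folder4\fcs1', 'fcs1', 100, 200],
           [r'c:\folder5\fcs1', 'fcs1', 100, 999]]

# Print the path of each feature class that would be kept.
for fc in unique_feature_classes(fc_list):
    print(fc[0])
```

With the sample rows, the entries from folder1, folder3 and folder5 are kept. Input order is preserved, and each lookup against the set avoids rescanning the entries already stored.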

5 Replies
DanPatterson_Retired
MVP Esteemed Contributor

I don't understand your 3rd and 4th lines... what is it with the floating 1 and 2 on those? You should use syntax formatting; descriptions are available in several places.

PeterWilson
Regular Contributor

Hi Dan,

Apologies that my code was not syntax highlighted. I was typing it last night on my iPad, and the GeoNet website doesn't work well with iOS and Safari. I'll correct it once I'm in the office.

PeterWilson
Regular Contributor

Thanks Darren,

This will work perfectly. I'll post my code once I've completed it so that everyone else can see what I was trying to achieve.

DanPatterson_Retired
MVP Esteemed Contributor

Forgetting your array stuff  

verbose demo (assumes numpy has already been imported as np)

a = [[r'c:\folder1\fcs1', 'fcs1', 100, 200],
     [r'c:\folder2\fcs1', 'fcs1', 100, 200],
     [r'c:\folder3\fcs2', 'fcs2', 100, 200],
     [r'c:\folder4\fcs1', 'fcs1', 100, 200],
     [r'c:\folder5\fcs1', 'fcs1', 100, 999]]
a = [tuple(i) for i in a]
dt = [("A", "U50"), ("B", "U10"), ("C", '<i4'), ("D", '<i4')]
a = np.array(a, dtype=dt)
uni, idx = np.unique(a[['B', 'C', 'D']], return_index=True)
idx.sort()
a_rows = a[idx]
frmt = """
input array
{!r:}
Unique from cols B,C,D
  {}
indices {}
Unique rows
{}
"""
print(frmt.format(a, uni, idx, a_rows))

yielding

input array
array([('c:\\folder1\\fcs1', 'fcs1', 100, 200),
       ('c:\\folder2\\fcs1', 'fcs1', 100, 200),
       ('c:\\folder3\\fcs2', 'fcs2', 100, 200),
       ('c:\\folder4\\fcs1', 'fcs1', 100, 200),
       ('c:\\folder5\\fcs1', 'fcs1', 100, 999)], 
      dtype=[('A', '<U50'), ('B', '<U10'), ('C', '<i4'), ('D', '<i4')])
Unique from cols B,C,D
  [('fcs1', 100, 200) ('fcs1', 100, 999) ('fcs2', 100, 200)]
indices [0 2 4]
Unique rows
[('c:\\folder1\\fcs1', 'fcs1', 100, 200) ('c:\\folder3\\fcs2', 'fcs2', 100, 200)
 ('c:\\folder5\\fcs1', 'fcs1', 100, 999)]

Basically...

  • convert the list of lists to a list of tuples, as for the set approach
  • convert to an array, with an appropriate dtype to suit
  • find the unique combinations in the key columns (like building the set all at once)
  • sort the indices and slice the array using them
  • add print fluff or NumPyArrayToTable to get back to Arc*

Comments...

  • Lines 1 to 8 can be skipped if you use TableToNumPyArray (or FeatureClassToNumPyArray).
  • The 'key' to getting the unique records is to specify the fields whose unique combinations you want to 'slice' the data on.
  • If you only need the unique entries, you can get away with lines 6 to 9.
  • If you want to slice the whole input, then add lines 10 and 11.