I'm creating a list of feature classes, each with a list of stats, to identify duplicate feature classes so that I can drop the rest and copy only one over to a new directory. The items I'm using to identify uniqueness are the file name, the number of records in the feature class, and the area (sqm) of the minimum bounding geometry. The file path with file name is also stored, but a sublist without it is used for testing whether the values already exist within the final list of unique feature classes.
For each feature class:
append a list made up of file path, file name, number of records and area of minimum bounding geometry to an existing list.
Before adding the next feature class, test whether the sublist [file name, number of records, area of minimum bounding geometry] (i.e. minus the file path) already exists as a sublist within the list of lists, to prevent duplicate feature classes from being added. I've tried using the Python built-in functions any and all with no luck.
cont_list = []
for fc in focus:
    stats_list =
    stats_list =
    cont_list.append(stats_list)
    if stats_list[1:] in cont_list:
        print("exists")
    else:
        cont_list.append(stats_list)
Note: all values in stats_list[1:] must match a sublist in the list of lists to count as a duplicate. Any advice on how to achieve this will be appreciated.
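For what it's worth, the membership test in the snippet above can never succeed: stats_list[1:] has three items while every entry in cont_list has four, and the append before the test would add every feature class anyway. Below is a minimal sketch of the any approach, assuming each record has the form [path, name, count, area]. The sample records and the helper name is_duplicate are made up for illustration and are not part of the original script.

```python
def is_duplicate(stats_list, cont_list):
    """Return True if the non-path part of stats_list matches a stored record."""
    key = stats_list[1:]
    # Compare like with like: strip the path from each stored record too.
    return any(existing[1:] == key for existing in cont_list)


# Hypothetical sample records: [path, name, record count, bounding area]
records = [
    [r"c:\folder1\fcs1", "fcs1", 100, 200],
    [r"c:\folder2\fcs1", "fcs1", 100, 200],
    [r"c:\folder3\fcs2", "fcs2", 100, 200],
]

cont_list = []
for stats_list in records:
    if is_duplicate(stats_list, cont_list):
        print("exists: {}".format(stats_list[0]))
    else:
        # Only append after the test, so the record is not compared with itself.
        cont_list.append(stats_list)
```

This scans the whole list for every new record, which is fine for a few hundred feature classes but gets slow for very large lists.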
If you convert your list elements to a tuple and add to a set, you can compare:
>>> fc_list = [[r'c:\folder1\fcs1', 'fcs1', 100, 200],
...            [r'c:\folder2\fcs1', 'fcs1', 100, 200],
...            [r'c:\folder3\fcs2', 'fcs2', 100, 200],
...            [r'c:\folder4\fcs1', 'fcs1', 100, 200],
...            [r'c:\folder5\fcs1', 'fcs1', 100, 999]]
... master_set = set()
... for fc in fc_list:
...     my_tuple = tuple(fc[1:])  # drop the path; tuples are hashable, lists are not
...     if my_tuple not in master_set:
...         print("{} is not a duplicate.".format(my_tuple))
...     else:
...         print("{} is a duplicate.".format(my_tuple))
...     master_set.add(my_tuple)  # adding an existing tuple to a set is a no-op
...
('fcs1', 100, 200) is not a duplicate.
('fcs1', 100, 200) is a duplicate.
('fcs2', 100, 200) is not a duplicate.
('fcs1', 100, 200) is a duplicate.
('fcs1', 100, 999) is not a duplicate.
edit: you can also use a plain old list to hold the tuples, although it will still store the duplicates.
>>> fc_list = [[r'c:\folder1\fcs1', 'fcs1', 100, 200],
...            [r'c:\folder2\fcs1', 'fcs1', 100, 200],
...            [r'c:\folder3\fcs2', 'fcs2', 100, 200],
...            [r'c:\folder4\fcs1', 'fcs1', 100, 200],
...            [r'c:\folder5\fcs1', 'fcs1', 100, 999]]
... master_list = []
... for fc in fc_list:
...     my_tuple = tuple(fc[1:])
...     if my_tuple not in master_list:
...         print("{} is not a duplicate.".format(my_tuple))
...     else:
...         print("{} is a duplicate.".format(my_tuple))
...     master_list.append(my_tuple)  # appended every time, so duplicates are stored
...
('fcs1', 100, 200) is not a duplicate.
('fcs1', 100, 200) is a duplicate.
('fcs2', 100, 200) is not a duplicate.
('fcs1', 100, 200) is a duplicate.
('fcs1', 100, 999) is not a duplicate.
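Since the goal in the question is to copy only one feature class per group, it may help to keep the full record (including the path) the first time each key is seen. This is only a sketch built on the same sample data. The names seen, unique_fcs and unique_paths are illustrative, and the actual copy step is left out.

```python
fc_list = [
    [r"c:\folder1\fcs1", "fcs1", 100, 200],
    [r"c:\folder2\fcs1", "fcs1", 100, 200],
    [r"c:\folder3\fcs2", "fcs2", 100, 200],
    [r"c:\folder4\fcs1", "fcs1", 100, 200],
    [r"c:\folder5\fcs1", "fcs1", 100, 999],
]

seen = set()      # keys (name, count, area) already encountered
unique_fcs = []   # full records, first occurrence of each key only

for fc in fc_list:
    key = tuple(fc[1:])
    if key in seen:
        continue  # duplicate: skip it
    seen.add(key)
    unique_fcs.append(fc)

# Paths of the feature classes that would be copied to the new directory.
unique_paths = [fc[0] for fc in unique_fcs]
print(unique_paths)
```

If the area is stored as a float, exact equality may be too strict; rounding it before building the key is one option.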
I don't understand your 3rd and 4th lines... what is it with the floating 1 and 2 on those? You should use syntax formatting; descriptions of how to do it are available in several places.
Hi Dan,
Apologies that my code was not syntax highlighted. I was typing it last night on my iPad, and the GeoNet website doesn't work well with iOS and Safari. I will correct it once I'm in the office.
Thanks Darren,
This will work perfectly. I'll post my code once I've completed it, so everyone else can see what I was trying to achieve.
Forgetting your array stuff
verbose demo
import numpy as np

a = [[r'c:\folder1\fcs1', 'fcs1', 100, 200],
     [r'c:\folder2\fcs1', 'fcs1', 100, 200],
     [r'c:\folder3\fcs2', 'fcs2', 100, 200],
     [r'c:\folder4\fcs1', 'fcs1', 100, 200],
     [r'c:\folder5\fcs1', 'fcs1', 100, 999]]
a = [tuple(i) for i in a]  # structured arrays are built from tuples, not lists
dt = [("A", "U50"), ("B", "U10"), ("C", '<i4'), ("D", '<i4')]
a = np.array(a, dtype=dt)
# unique combinations of B, C, D plus the index of each first occurrence
uni, idx = np.unique(a[['B', 'C', 'D']], return_index=True)
idx.sort()  # put the indices back into input order
a_rows = a[idx]
frmt = """
input array
{!r:}
Unique from cols B,C,D
{}
indices {}
Unique rows
{}
"""
print(frmt.format(a, uni, idx, a_rows))
yielding
input array
array([('c:\\folder1\\fcs1', 'fcs1', 100, 200),
('c:\\folder2\\fcs1', 'fcs1', 100, 200),
('c:\\folder3\\fcs2', 'fcs2', 100, 200),
('c:\\folder4\\fcs1', 'fcs1', 100, 200),
('c:\\folder5\\fcs1', 'fcs1', 100, 999)],
dtype=[('A', '<U50'), ('B', '<U10'), ('C', '<i4'), ('D', '<i4')])
Unique from cols B,C,D
[('fcs1', 100, 200) ('fcs1', 100, 999) ('fcs2', 100, 200)]
indices [0 2 4]
Unique rows
[('c:\\folder1\\fcs1', 'fcs1', 100, 200) ('c:\\folder3\\fcs2', 'fcs2', 100, 200)
('c:\\folder5\\fcs1', 'fcs1', 100, 999)]
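For readers without NumPy, roughly the same first-occurrence selection can be sketched with a dictionary that groups paths by (name, count, area). This is an illustrative alternative, not the method used above; the names groups, keep and dropped are made up. It has the side benefit of also listing which paths were treated as duplicates.

```python
from collections import OrderedDict

rows = [
    (r"c:\folder1\fcs1", "fcs1", 100, 200),
    (r"c:\folder2\fcs1", "fcs1", 100, 200),
    (r"c:\folder3\fcs2", "fcs2", 100, 200),
    (r"c:\folder4\fcs1", "fcs1", 100, 200),
    (r"c:\folder5\fcs1", "fcs1", 100, 999),
]

# Map each (name, count, area) key to every path that shares it, in input order.
groups = OrderedDict()
for path, name, count, area in rows:
    groups.setdefault((name, count, area), []).append(path)

keep = [paths[0] for paths in groups.values()]                    # first of each group
dropped = [p for paths in groups.values() for p in paths[1:]]     # the rest

print(keep)
print(dropped)
```

The entries in keep correspond to rows 0, 2 and 4 of the input, matching the indices reported by np.unique above.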
Basically...
Comments...