I have a feature class of polylines that are duplicated, one on top of the other, and I would like to delete all except one. It doesn't matter which one, as long as one is left. I would like to do this in ArcMap.
I have the following script that identifies the duplicates, but I am not sure how to delete the duplicates in the set.
import arcpy

arcpy.env.overwriteOutput = True
mxd = arcpy.mapping.MapDocument("CURRENT")
df = arcpy.mapping.ListDataFrames(mxd, "Layers")[0]
lyr = arcpy.mapping.ListLayers(mxd, "Parcel_Lines1")[0]
dsc = arcpy.Describe(lyr)
sr = dsc.spatialReference
oid_field_name = dsc.oidFieldName
# get a cursor on the input features
rows1 = arcpy.SearchCursor(lyr)
# exclude features already compared once
exclude = []
# iterate through the first cursor
for row1 in rows1:
    oid1 = row1.getValue(oid_field_name)
    shp1 = row1.shape
    # get a second cursor on the same input features
    rows2 = arcpy.SearchCursor(lyr)
    # add the feature to be compared to the exclude list
    exclude.append(oid1)
    # create a set to hold duplicate features
    group = set()
    # iterate through the second cursor
    for row2 in rows2:
        oid2 = row2.getValue(oid_field_name)
        shp2 = row2.shape
        # ignore features already compared
        if oid2 in exclude:
            continue
        # test equality
        if shp1.equals(shp2):
            # add both feature ids to the set of identical features
            group.add(oid1)
            group.add(oid2)
            # add the feature just compared to the exclude list
            exclude.append(oid2)
    if group:  # if the group is not empty
        print group
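One way to finish the script along these lines: collect the duplicate OIDs (everything except the first feature of each identical group), then remove them with an update cursor. This is a minimal sketch, not the only approach; the `find_duplicate_oids` helper and the demo data are mine, and the commented arcpy calls assume the `lyr` and `oid_field_name` variables from the script above:

```python
def find_duplicate_oids(features, equals):
    """Given (oid, shape) pairs, return the OIDs of duplicate
    features, keeping the first feature of each identical group."""
    to_delete = set()
    kept = []  # (oid, shape) pairs that will survive
    for oid, shp in features:
        for kept_oid, kept_shp in kept:
            if equals(shp, kept_shp):
                to_delete.add(oid)  # matches a survivor: mark for deletion
                break
        else:
            kept.append((oid, shp))  # first of its group: keep it
    return to_delete

# Stand-in demo using coordinate tuples instead of arcpy geometries:
feats = [(1, (0, 0, 5, 5)), (2, (0, 0, 5, 5)), (3, (1, 1, 2, 2))]
dupes = find_duplicate_oids(feats, lambda a, b: a == b)
# dupes is {2}: feature 1 survives as the original, feature 3 is unique

# With arcpy you would build the pairs from a search cursor
# (equals=lambda a, b: a.equals(b)) and then delete:
# rows = arcpy.UpdateCursor(lyr)
# for row in rows:
#     if row.getValue(oid_field_name) in dupes:
#         rows.deleteRow(row)
# del rows
```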
If you have the Advanced license, it might be worth looking into the Delete Identical tool: Delete Identical—Help | ArcGIS Desktop
I don't, which is why I was looking into doing it with Python.
Take a look at the Geometry class for ArcPy. It makes it possible to do some creative processing on geometry.
For example, if none of the geometries touched each other except for the matching ones, you could write a script that deletes all but a single geometry from each group of coincident features.
Have a look:
A couple of comments. First and foremost, you are using the older/legacy cursors, I suggest you move to the newer/data-access cursors. Not only are the Data Access cursors more Pythonic, they perform much better.
In terms of workflow, if you already have a list of OIDs that represent duplicates, which you appear to, then just set up an update cursor after you are done with your search cursors and delete any record whose OID is in your list.
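That workflow might look something like the sketch below with data-access cursors. It is an assumption-laden outline, not a drop-in script: the `wkt_duplicates` helper is mine, the layer name is the `Parcel_Lines1` from the question, and using `SHAPE@WKT` as the equality key only treats geometries as identical when their vertices match in the same order (unlike `Geometry.equals`, which is order-insensitive):

```python
def wkt_duplicates(rows):
    """Given (oid, wkt) pairs, return the OIDs of every duplicate,
    keeping the first occurrence of each distinct geometry."""
    first_seen = {}  # wkt -> oid of the surviving feature
    dupes = set()
    for oid, wkt in rows:
        if wkt in first_seen:
            dupes.add(oid)  # repeat geometry: mark for deletion
        else:
            first_seen[wkt] = oid
    return dupes

# With arcpy (assumed layer name), one pass to find and one to delete:
# import arcpy
# lyr = "Parcel_Lines1"
# with arcpy.da.SearchCursor(lyr, ["OID@", "SHAPE@WKT"]) as cursor:
#     dupes = wkt_duplicates(cursor)
# with arcpy.da.UpdateCursor(lyr, ["OID@"]) as cursor:
#     for (oid,) in cursor:
#         if oid in dupes:
#             cursor.deleteRow()
```

Because the helper only needs one dictionary lookup per feature, this scales linearly with the number of features, versus the pairwise comparison in the original script.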