How to delete duplicate rows where multiple values are duplicated?

GeoffreyWest · ‎08-12-2015

I have a feature class where there are multiple values duplicated in a row, i.e. SRNumber and RESOLUTION_CODE. I would like to delete all rows where my SRNumber and RESOLUTION_CODE are found in multiple rows. I have got a good start with this script, everything works aside from actually deleting the rows. How can I fix this so that my duplicate rows are deleted?

So if SRNumber = 1-23456 and Resolution_Code = 'A' and appears twice, I would like to delete one.

I have taken this example for GIS Stackexchange, but the del rows does not work correctly.

import arcpy
... 
... in_table = r"ServiceRequest.DBO.SO_Dead_Animal"
... fields = ["NumberCYLA","RESOLUTION_CODE"] #replace with own
... curStrings = arcpy.da.SearchCursor(in_table,fields,"NumberCYLA is not null")
... 
... stringsKeysList = []
... numbersValuesList = []
... tableDict = {}
... 
... for row in curStrings: 
...     print row[0],row[1]
...     stringsKeysList.append(row[0])
... del curStrings
... uniqueStringsList = list(set(stringsKeysList)) #only unique values from Strings field
... 
... for uniqueString in uniqueStringsList:
...     emptyTempValues = []
...     curValues = arcpy.da.SearchCursor(in_table,fields,"""NumberCYLA = '{0}'""".format(str(uniqueString)))
...     for row in curValues:
...         print row[1]
...         emptyTempValues.append(row[1])
...         uniqueEmptyTempValues = list(set(emptyTempValues))
...     tableDict[uniqueString] = uniqueEmptyTempValues
... 
... print tableDict
... 
... rowsToDelete = [k for k, v in tableDict.iteritems() if v == [-1]] #find which rows to delete
... print "list: ", rowsToDelete
... stringRows = ","
... updRows = stringRows.join(rowsToDelete)
... updRows = updRows.replace(",","','")
... print updRows #update the string to be used with proper syntax in where clause
... 
... where_clause = """NumberCYLA in ('{0}')""".format(updRows)
... upd_cur = arcpy.da.UpdateCursor(in_table,fields,where_clause)
... for row in upd_cur:
...     print row[0],row[1]
...     del rowsToDelete

DarrenWiens2 · ‎08-12-2015

I think you want to call deleteRow() rather than del rowsToDelete, which just deletes the Python variable called "rowsToDelete".

... upd_cur = arcpy.da.UpdateCursor(in_table,fields,where_clause)  
... for row in upd_cur:  
...    upd_cur.deleteRow() # Beware: deletes the current row

JustinHollenbach1 · ‎08-13-2015

You can also just run the Delete Identical tool under Data Management, General. A whole lot easier.

DarrenWiens2 · ‎08-13-2015

Just FYI, Delete Identical requires advanced licensing, while this script requires none.

AdamMarinelli · ‎08-13-2015

Just to clarify the problem you are trying to solve...if this is your attribute table:

OBJECTID*	NumberCYLA	SRNumber	Resolution_Code
1	11	1-23456	A
2	22	1-9987	B
3	33	1-23456	A
4	44	1-5555	C

...you would want the 3rd row deleted because it is a duplicate of row 1 (SRNumber and Resolution_Code match), correct? The code sample you grabbed is not particularly readable (to me) so I just wanted to make sure.

Also, is the field NumberCYLA a unique identifier for your feature class?

GeoffreyWest · ‎08-13-2015

@Adam Marinelli, that is correct, it should be mentioned that NUMBERCYLA and SRNumber are identical. But your example is correct, if I have one Service Request number repeated at different objectIDs, I would like to remove one.

AdamMarinelli · ‎08-14-2015

If you have the proper licensing then definitely use Delete Identical as Justin Hollenbach suggested....there is a tool for everything! Otherwise, here is a much simpler python script which should do what you need. I haven't tested it with a very large feature class so performance could be an issue depending on the # of records in your input datasets.

JoshuaBixby · ‎08-14-2015

The performance of the script can be improved by cutting out the search cursor loop. All of the functional code works within a single update cursor:

import arcpy
fc = #full path to feature class
fields = ['OBJECTID', 'SRNumber', 'Resolution_Code']

table_rows = []
with arcpy.da.UpdateCursor(fc, fields) as cursor:
    for row in cursor:    
        if row[1:] in table_rows:
            print "Deleting record: OBJECTID = {}".format(row[0])
            cursor.deleteRow()
        else:
            table_rows.append(row[1:])
del table_rows