How to delete duplicate rows where multiple values are duplicated?

4610
7
08-12-2015 05:06 PM
GeoffreyWest
Occasional Contributor III

I have a feature class where there are multiple values duplicated in a row, i.e. SRNumber and RESOLUTION_CODE. I would like to delete all rows where my SRNumber and RESOLUTION_CODE are found in multiple rows. I have got a good start with this script, everything works aside from actually deleting the rows. How can I fix this so that my duplicate rows are deleted?

So if SRNumber = 1-23456 and Resolution_Code = 'A' and appears twice, I would like to delete one.

I have taken this example for GIS Stackexchange, but the del rows does not work correctly.

import arcpy
... 
... in_table = r"ServiceRequest.DBO.SO_Dead_Animal"
... fields = ["NumberCYLA","RESOLUTION_CODE"] #replace with own
... curStrings = arcpy.da.SearchCursor(in_table,fields,"NumberCYLA is not null")
... 
... stringsKeysList = []
... numbersValuesList = []
... tableDict = {}
... 
... for row in curStrings: 
...     print row[0],row[1]
...     stringsKeysList.append(row[0])
... del curStrings
... uniqueStringsList = list(set(stringsKeysList)) #only unique values from Strings field
... 
... for uniqueString in uniqueStringsList:
...     emptyTempValues = []
...     curValues = arcpy.da.SearchCursor(in_table,fields,"""NumberCYLA = '{0}'""".format(str(uniqueString)))
...     for row in curValues:
...         print row[1]
...         emptyTempValues.append(row[1])
...         uniqueEmptyTempValues = list(set(emptyTempValues))
...     tableDict[uniqueString] = uniqueEmptyTempValues
... 
... print tableDict
... 
... rowsToDelete = [k for k, v in tableDict.iteritems() if v == [-1]] #find which rows to delete
... print "list: ", rowsToDelete
... stringRows = ","
... updRows = stringRows.join(rowsToDelete)
... updRows = updRows.replace(",","','")
... print updRows #update the string to be used with proper syntax in where clause
... 
... where_clause = """NumberCYLA in ('{0}')""".format(updRows)
... upd_cur = arcpy.da.UpdateCursor(in_table,fields,where_clause)
... for row in upd_cur:
...     print row[0],row[1]
...     del rowsToDelete
0 Kudos
7 Replies
DarrenWiens2
MVP Honored Contributor

I think you want to call deleteRow() rather than del rowsToDelete, which just deletes the Python variable called "rowsToDelete".

... upd_cur = arcpy.da.UpdateCursor(in_table,fields,where_clause)  
... for row in upd_cur:  
...    upd_cur.deleteRow() # Beware: deletes the current row
JustinHollenbach1
New Contributor II

You can also just run the Delete Identical tool under Data Management, General. A whole lot easier.

DarrenWiens2
MVP Honored Contributor

Just FYI, Delete Identical requires advanced licensing, while this script requires none.

AdamMarinelli
Occasional Contributor

Just to clarify the problem you are trying to solve...if this is your attribute table:

OBJECTID*NumberCYLASRNumberResolution_Code
1111-23456A
2221-9987B
3331-23456A
4441-5555C

...you would want the 3rd row deleted because it is a duplicate of row 1 (SRNumber and Resolution_Code match), correct?  The code sample you grabbed is not particularly readable (to me) so I just wanted to make sure.

Also, is the field NumberCYLA a unique identifier for your feature class?

0 Kudos
GeoffreyWest
Occasional Contributor III

@Adam Marinelli, that is correct, it should be mentioned that NUMBERCYLA and SRNumber are identical.  But your example is correct, if I have one Service Request number repeated at different objectIDs, I would like to remove one.

0 Kudos
AdamMarinelli
Occasional Contributor

If you have the proper licensing then definitely use Delete Identical as Justin Hollenbach​ suggested....there is a tool for everything!  Otherwise, here is a much simpler python script which should do what you need.  I haven't tested it with a very large feature class so performance could be an issue depending on the # of records in your input datasets.

JoshuaBixby
MVP Esteemed Contributor

The performance of the script can be improved by cutting out the search cursor loop.  All of the functional code works within a single update cursor:

import arcpy
fc = #full path to feature class
fields = ['OBJECTID', 'SRNumber', 'Resolution_Code']

table_rows = []
with arcpy.da.UpdateCursor(fc, fields) as cursor:
    for row in cursor:    
        if row[1:] in table_rows:
            print "Deleting record: OBJECTID = {}".format(row[0])
            cursor.deleteRow()
        else:
            table_rows.append(row[1:])
del table_rows