delete duplicates records based on field

5964
15
Jump to solution
10-13-2016 08:28 AM
2Quiker
Occasional Contributor II

I need to be able to delete duplicate records but keep one only if it has a certain attribute in a certain field in a table. so i need to keep one of each "Acct" row but only if it has "01" in the Group _code. I need to be able to do this in arcpy python because i get this table weekly so i would like to schedule this task. Attached is a pic of the table i am working with.

I think i can iterate  the "group_code" field and append the value and "Acct" to a dictionary.

I have the following but i am not sure how to select and remove duplicates. I would appreciate some help with some code.

import arcpy  
from arcpy import env  
env.overwriteOutput = 1  
env.workspace = r"C:\GIS\LandValue.gdb"  
  
fc = "LandValue3"  
  
dict = {}  
  
with arcpy.da.SearchCursor(fc, ["Acct", "group_code"]) as cursor:  
    for row in cursor:  
        dict[row[0]] = row[1]  
  
del cursor
0 Kudos
1 Solution

Accepted Solutions
JoshuaBixby
MVP Esteemed Contributor

I haven't tested the following, but I think it will work.  I think using a Python set is the best approach since the lookups are quicker than lists, although you could use a dictionary as well.  I added the extra "01" check because it seemed like you only want to look at records with that group code.  Also, I assumed the group code are strings.

import arcpy  
from arcpy import env  
env.overwriteOutput = 1  
env.workspace = r"C:\GIS\LandValue.gdb"  
  
fc = "LandValue3"  
  
accts = set()
  
with arcpy.da.UpdateCursor(fc, ["Acct", "group_code"]) as cursor:  
    for acct, group_code in cursor:  
        if group_code == "01":
            if acct in accts:
                cursor.deleteRow()
            else:
                accts.add(acct)
  
del cursor

View solution in original post

15 Replies
JoshuaBixby
MVP Esteemed Contributor

I haven't tested the following, but I think it will work.  I think using a Python set is the best approach since the lookups are quicker than lists, although you could use a dictionary as well.  I added the extra "01" check because it seemed like you only want to look at records with that group code.  Also, I assumed the group code are strings.

import arcpy  
from arcpy import env  
env.overwriteOutput = 1  
env.workspace = r"C:\GIS\LandValue.gdb"  
  
fc = "LandValue3"  
  
accts = set()
  
with arcpy.da.UpdateCursor(fc, ["Acct", "group_code"]) as cursor:  
    for acct, group_code in cursor:  
        if group_code == "01":
            if acct in accts:
                cursor.deleteRow()
            else:
                accts.add(acct)
  
del cursor
NeilAyres
MVP Alum

Nice approach Joshua. Cool use of a set.

0 Kudos
2Quiker
Occasional Contributor II

Darn i just realized that that i need it to keep just one of the Acct with 01 in the group_code but the one with the highest value in the "value" field.

As you can see on the attached table there might be duplicate values.

0 Kudos
JoshuaBixby
MVP Esteemed Contributor

Although what you want can be done purely in Python, sometimes mixing Python and SQL makes sense.  I have updated the code to add the third column and sort the cursor before processing it.  Sorting value in descending order should ensure the first record found, which will be the one that is kept, will have the highest value.

import arcpy  
from arcpy import env  
env.overwriteOutput = 1  
env.workspace = r"C:\GIS\LandValue.gdb"  
  
fc = "LandValue3"  
  
accts = set()
postfix = "ORDER BY group_code, acct, value desc"
  
with arcpy.da.UpdateCursor(fc, ["Acct", "group_code", "value"],
                           sql_clause=(None, postfix)) as cursor:  
    for acct, group_code, value in cursor:  
        if group_code == "01":
            if acct in accts:
                cursor.deleteRow()
            else:
                accts.add(acct)
  
del cursor
2Quiker
Occasional Contributor II

Joshua thanks for the replay and updated code, but when i run the code it runs fun i don't get any error's but nothing happens to the table, no records are delete.

0 Kudos
JoshuaBixby
MVP Esteemed Contributor

I will have to do some tinkering later today and see what is going on.

I am not sure what is going on.  I just recreated the table from your screenshot and ran my code snippet against it, and everything worked as expected.  The table I recreated was pruned from 29 records to 13 records.

What version of ArcGIS are you running?  Are you running the script in the interactive Python window or from an IDE or standalone interpreter?  What is the backend storage format, file geodatabase? 

0 Kudos
2Quiker
Occasional Contributor II

I am on ArcGIS 10.3.1, i am runign the script on python 2.7.8 IDLE, the table is in a file geodatabase.

I have attached a portion of table i am working with. Thanks.

0 Kudos
JoshuaBixby
MVP Esteemed Contributor

I am running 10.5 PreRelease.  When I run the code in 2.7.10 IDLE using the attached/uploaded data, it works, removes about 50 or so records (honestly, I wasn't sure it was going to work with the data in DBF files but it did).  The only thing I can think about is a bug in 10.3.1 that has been fixed since.

0 Kudos
2Quiker
Occasional Contributor II

What bug are you referring to, is there a patch for this bug? any other way around this?

0 Kudos