I need to be able to delete duplicate records but keep one only if it has a certain attribute in a certain field in a table. so i need to keep one of each "Acct" row but only if it has "01" in the Group _code. I need to be able to do this in arcpy python because i get this table weekly so i would like to schedule this task. Attached is a pic of the table i am working with.
I think i can iterate the "group_code" field and append the value and "Acct" to a dictionary.
I have the following but i am not sure how to select and remove duplicates. I would appreciate some help with some code.
import arcpy
from arcpy import env
env.overwriteOutput = 1
env.workspace = r"C:\GIS\LandValue.gdb"
fc = "LandValue3"
dict = {}
with arcpy.da.SearchCursor(fc, ["Acct", "group_code"]) as cursor:
for row in cursor:
dict[row[0]] = row[1]
del cursor
Solved! Go to Solution.
I haven't tested the following, but I think it will work. I think using a Python set is the best approach since the lookups are quicker than lists, although you could use a dictionary as well. I added the extra "01" check because it seemed like you only want to look at records with that group code. Also, I assumed the group code are strings.
import arcpy
from arcpy import env
env.overwriteOutput = 1
env.workspace = r"C:\GIS\LandValue.gdb"
fc = "LandValue3"
accts = set()
with arcpy.da.UpdateCursor(fc, ["Acct", "group_code"]) as cursor:
for acct, group_code in cursor:
if group_code == "01":
if acct in accts:
cursor.deleteRow()
else:
accts.add(acct)
del cursor
I haven't tested the following, but I think it will work. I think using a Python set is the best approach since the lookups are quicker than lists, although you could use a dictionary as well. I added the extra "01" check because it seemed like you only want to look at records with that group code. Also, I assumed the group code are strings.
import arcpy
from arcpy import env
env.overwriteOutput = 1
env.workspace = r"C:\GIS\LandValue.gdb"
fc = "LandValue3"
accts = set()
with arcpy.da.UpdateCursor(fc, ["Acct", "group_code"]) as cursor:
for acct, group_code in cursor:
if group_code == "01":
if acct in accts:
cursor.deleteRow()
else:
accts.add(acct)
del cursor
Nice approach Joshua. Cool use of a set.
Although what you want can be done purely in Python, sometimes mixing Python and SQL makes sense. I have updated the code to add the third column and sort the cursor before processing it. Sorting value in descending order should ensure the first record found, which will be the one that is kept, will have the highest value.
import arcpy
from arcpy import env
env.overwriteOutput = 1
env.workspace = r"C:\GIS\LandValue.gdb"
fc = "LandValue3"
accts = set()
postfix = "ORDER BY group_code, acct, value desc"
with arcpy.da.UpdateCursor(fc, ["Acct", "group_code", "value"],
sql_clause=(None, postfix)) as cursor:
for acct, group_code, value in cursor:
if group_code == "01":
if acct in accts:
cursor.deleteRow()
else:
accts.add(acct)
del cursor
Joshua thanks for the replay and updated code, but when i run the code it runs fun i don't get any error's but nothing happens to the table, no records are delete.
I will have to do some tinkering later today and see what is going on.
I am not sure what is going on. I just recreated the table from your screenshot and ran my code snippet against it, and everything worked as expected. The table I recreated was pruned from 29 records to 13 records.
What version of ArcGIS are you running? Are you running the script in the interactive Python window or from an IDE or standalone interpreter? What is the backend storage format, file geodatabase?
I am running 10.5 PreRelease. When I run the code in 2.7.10 IDLE using the attached/uploaded data, it works, removes about 50 or so records (honestly, I wasn't sure it was going to work with the data in DBF files but it did). The only thing I can think about is a bug in 10.3.1 that has been fixed since.
What bug are you referring to, is there a patch for this bug? any other way around this?