Find duplicate by comparing two fields.

CCWeedcontrol · ‎03-23-2017

I need to find duplicates by comparing to fields. I am not sure how to accomplish this so any help would be appreciated.

I have two fields that i need to compare, Account and Acres1 fields. if there is not duplicate then put a "0" in the duplicate field for that record.

The end scenario i need something like this.

Account    Acres   Duplicate
D54963     12.43    0
D54963     12.43    1 (duplicate)
D54963     12.43    2 (duplicate)

D45879     26.32    0 (no duplicate)
D45879     17.32    0 (no duplicate)

D36540     17.51    0
D36540     17.51    1 (duplicate)
D36540     17.51    2 (duplicate)
D36540     17.51    3‍‍‍‍‍‍‍‍‍ (duplicate)‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍

I have the following code but it only checks one field (Account) and it add a number to all attributes if there is a double not exactly what i need.

import arcpy


infeature ="C:/Temp/Test_FINAL_Erase"
field_in ="Account"
field_out ="COUNT_"+field_in

#create the field for the count values
arcpy.AddField_management(infeature,field_out,"SHORT")

#creating the list with all the values in the field, including duplicates
lista=[]
cursor1=arcpy.SearchCursor(infeature)
for row in cursor1:
    i=row.getValue(field_in)    
    lista.append(i)
del cursor1, row

#updating the count field with the number on occurrences of field_in values
#in the previously created list
cursor2=arcpy.UpdateCursor(infeature)
for row in cursor2:
    i=row.getValue(field_in)
    occ=lista.count(i)   
    row.setValue(field_out,occ)
    cursor2.updateRow(row)
del cursor2, row
print ("Done.")
‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍

JoshuaBixby · ‎03-24-2017

Does the following work:

from collections import defaultdict

fc = # path to feature class

dups = defaultdict(int)
with arcpy.da.UpdateCursor(
    fc,
    ["Account", "Acres", "Duplicate"],
    sql_clause=(None, "ORDER BY Account, Acres")
) as cur:
    for acct, acres, _ in cur:
        dups[(acct, acres)] += 1
        cnt = dups[(acct, acres)] - 1
        cur.updateRow([acct, acres, cnt])‍‍‍‍‍‍‍‍‍‍‍‍‍‍

View solution in original post

JoshuaBixby · ‎03-23-2017

Is it possible for a given account number to have multiple, different acre values? For example, could D54963 have 12.43 acres and some other acre value?

CCWeedcontrol · ‎03-23-2017

Joshua yes that is a possibility. Sorry i should have mentioned that.

JayantaPoddar · ‎03-23-2017

Not an answer to your question, but sometimes the floating values might to be difficult to select using "=" operator, because

ArcGIS might interpret a floating value of 12.43 as 12.429999 or 12.430001.

Think Location

JamesCrandall · ‎03-23-2017

fc = r'H:\Documents\ArcGIS\Default.gdb\MyFeatureClass'
fields= ["OBJECTID", "Field1", "Field2"]
with arcpy.da.SearchCursor(fc, fields) as cursor:
 for row in cursor:
  if row[1]==row[2]:
   sql = "OBJECTID=" + str(row[0])
   with arcpy.da.UpdateCursor(fc, "Field3", sql) as ucur:
    for urow in ucur:
     urow[0]=0
     ucur.updateRow(urow)‍‍‍‍‍‍‍‍‍‍

CCWeedcontrol · ‎03-23-2017

Actually i thought it was your code that that Count_Acres1 but it wasn't. The code you posted ran but did not don't do anything.

What i have

# Import the arcpy module  
import arcpy  
  

fc = r'C:Temp/Test_Erase'
fields= ["OBJECTID", "ACCOUNT", "Acres1"]
with arcpy.da.SearchCursor(fc, fields) as cursor:
 for row in cursor:
  if row[1]==row[2]:
   sql = "OBJECTID=" + str(row[0])
   with arcpy.da.UpdateCursor(fc, "Count", sql) as ucur:
    for urow in ucur:
     urow[0]=0
    ucur.updateRow(urow)
print ("Done.")‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍

RandyBurton · ‎03-23-2017

In your code in line 11:

   with arcpy.da.UpdateCursor(fc, "Count", sql) as ucur:
‍‍‍

"Count" is an SQL reserved word and should not be used as a field/column name.

JamesCrandall · ‎03-24-2017

Not a complete solution but might be something to look into. numpy and pandas have much better grouping functions. Challenge is to get the result back into your arcpy.da cursor and update the desired column.

nparr = arcpy.da.FeatureClassToNumPyArray(fc,["Account", "Acres"])
pdarr = pd.DataFrame(nparr)
counts = pdarr.groupby(["Account", "Acres"]).size()
pddups = pd.DataFrame(counts, columns = ['size'])
pddups = pddups[pddups.size>1]
print pddups‍‍‍‍‍‍

JoshuaBixby · ‎03-24-2017

Does the following work:

from collections import defaultdict

fc = # path to feature class

dups = defaultdict(int)
with arcpy.da.UpdateCursor(
    fc,
    ["Account", "Acres", "Duplicate"],
    sql_clause=(None, "ORDER BY Account, Acres")
) as cur:
    for acct, acres, _ in cur:
        dups[(acct, acres)] += 1
        cnt = dups[(acct, acres)] - 1
        cur.updateRow([acct, acres, cnt])‍‍‍‍‍‍‍‍‍‍‍‍‍‍

CCWeedcontrol · ‎03-24-2017

Joshua,

Ya i think it worked. Thank you, you saved me a lot of time!

Thanks!