How To: Identify duplicate or unique values in ArcGIS Pro-- Update

3024
18
Jump to solution
02-25-2021 09:17 AM
JaredPilbeam2
MVP Regular Contributor

https://support.esri.com/en/technical-article/000023355

I want to improve on the above script from ESRI that finds duplicate values and writes the values to a new field in a table. It's using an old cursor and it is very slow with large datasets. But, I'm running into problems due to lack of experience.

Their code verbatim for my purposes:

import arcpy

'''
This script will count the number of occurences of a value in a field ("field_in") and write them to a new field ("field_out")
'''
#path to GDB goes here
arcpy.env.workspace = r"C:\pathto\DuplicateTesting.gdb\DuplicateTesting" 
#name of feature class goes here
infeature = "backup_02232021" 
field_in = "name" 
field_out = "COUNT_"+field_in
arcpy.AddField_management(infeature, field_out,"SHORT")

lista= []
cursor1=arcpy.SearchCursor(infeature)
for row in cursor1:
    i=row.getValue(field_in) 
    lista.append(i)
del cursor1, row

cursor2=arcpy.UpdateCursor(infeature)
for row in cursor2:
    i=row.getValue(field_in)
    occ=lista.count(i)
    row.setValue(field_out, occ)
    cursor2.updateRow(row)
del cursor2, row

 

The following is as far as I got with my improvement. It's throwing an error on the line with the i variable. How can I continue with the UpdateCursor method to put the count of each item in the fields list into its corresponding new field?

infeature = r"C:pathto\DuplicateTesting.gdb\DuplicateTesting
fields = ["name", "location_string_output", "email_address", "cell_phone_number", 
          "home_phone_number"] 
for field_out in fields: #add new fields
    arcpy.AddField_management(infeature, "COUNT_"+field_out,"SHORT")
    
list= []
with arcpy.da.SearchCursor(infeature, fields) as cursor1:
    for row in cursor1:
        list.append(row)
    del cursor1, row
with arcpy.da.UpdateCursor(infeature, fields) as cursor2:
    for row in cursor2:
        i = row.getValue(row) #<-- throws AttributeError here
        occ = list.count(i)
        print(occ)
    del cursor2, row

 

Error:

Traceback (most recent call last):
  File "\\gisfile\GISstaff\Jared\Python Scripts\ArcGISPro\DuplicateFields_updated.py", line 21, in <module>
    i = row.getValue(row)
AttributeError: 'list' object has no attribute 'getValue'

 

Tags (3)
0 Kudos
18 Replies
DanPatterson
MVP Esteemed Contributor
0 Kudos
JaredPilbeam2
MVP Regular Contributor

So I guess where I'm still stuck is getting the count back into the new fields. I left the cursor2 block at the occ variable.

infeature = r"C:\pathto\backup_02232021_TESTDELETE" #name of feature class goes here
fields = ["name", "location_string_output", "email_address", "cell_phone_number", 
          "home_phone_number"] 
for field_out in fields: #add new fields
    arcpy.AddField_management(infeature, "COUNT_"+field_out,"SHORT")
    
values_list= []
with arcpy.da.SearchCursor(infeature, fields) as cursor1:
    for row in cursor1:
        #access each field value by the index i then append
        for i in range(0,len(fields)):
            #you access a value by a row index, this starts at 0 from the
            #first field input - i.e. "name" field values are at row[0]
            values_list.append(row[i])

with arcpy.da.SearchCursor(infeature, fields) as cursor2:
    for row in cursor2:
        for i in range(0,len(fields)):
            #get the value of the field in the row
            value = row[i] 
            #count how often it appears in values_list
            occ = values_list.count(value)

 

I'm not sure how to use the below code you provided to update the new fields with the value count?

fields = ["name", "location_string_output", "email_address", "cell_phone_number", 
          "home_phone_number"]

#list to contain the new fields
new_fields = []
for field in fields:
     field_out = "COUNT_" + field
     #add the fields to all_fields
     new_fields.append(field_out)

#list to contain all fields, this is used as fields list
#in the update cursor - to do this we just add the lists together!
all_fields = fields + new_fields

 

0 Kudos
DavidPike
MVP Frequent Contributor

I might in-fact be tempted to create a dictionary of values to the number of occurrences, then just map those across into the update cursor - rather than having to run .count() each time.

0 Kudos
DanPatterson
MVP Esteemed Contributor

if that is what you got when clicking a link, then you some serious malware on your computer 

 

https:// quit spamming others

 

 is the link from above


... sort of retired...
0 Kudos
by Anonymous User
Not applicable

getValue needs an indexed position, but you are giving it the whole list. 

If it's just one field: i = row.getValue(row[0]), but you have a list of fields so you need to iterate over them to get all the name and get the count.

Since you are not updating anything you can use a searchcursor instead of the update.

with arcpy.da.SearchCursor(infeature, fields) as cursor2:
    for row in cursor2:
        for fld in row:
            fldname = row.getValue(fld) # needs indexed position here
            occ = list.count(fldname)
by Anonymous User
Not applicable

Instead of trying to do the whole list of fields at once, how about iterating over each field and passing it into the cursors?

edited for refractoring the cursor into list comprehension and removing the r in the string.

 

excluded = ['OBJECTID', 'Shape', 'Shape_Length', 'Shape_Area'] # exclude fields if you want

# set the fieldnames
infeatureflds = [f.name for f in arcpy.ListFields(infeature) if f.name not in excluded]

for fld in infeatureflds:
    outfld = f"{fld}_cnts"
    arcpy.AddField_management(infeature, outfld, "SHORT")

    listValues = [row[0] for row in arcpy.da.SearchCursor(infeature, fld)]

    with arcpy.da.UpdateCursor(infeature, [fld, outfld]) as uCur:
        for row in uCur:
            row[1] = listValues.count(row[0])
            uCur.updateRow(row)

print("Done.")

 

 

DavidPike
MVP Frequent Contributor

Perfect idea.

0 Kudos
JaredPilbeam2
MVP Regular Contributor

I'm at the mercy of you experts. I tried that and it worked flawlessly at first glance. I have a bunch of fields that don't need counted, so that excluded list is great.

Is this a type of f-string?

fr"{fld}_cnts"

 

I'm giving you guys all the credit for updating this little program. Thank you very much!

0 Kudos
by Anonymous User
Not applicable

The 'f' is python 3 shorthand for .format().  I have a habit of always adding the 'r' in front of strings and it isn't necessary in this case.

Also, you can collapse that first searchcursor that is appending the values to the list to just:

listValues = [row[0] for row in arcpy.da.SearchCursor(infeature, fld)]