Find Duplicates records

2818
8
05-09-2017 02:54 PM
CCWeedcontrol
Occasional Contributor III

 I need to be able to add increments to duplicates in the P_ID field like below with arcpy. No sure how to though.

P_ID 23560 -->0

P_ID 23560 -->1

P_ID 23560 -->2

P_ID 23560 -->3

P_ID 23560 -->4

P_ID 23571 -->0

P_ID 23571 -->1

P_ID 23571 -->2.

 I am working with the following code that adds 'Y' or 'No'  in search of duplicates. Duplicates get a 'Y' and none Duplicates get 'N'.

import arcpy

#find Duplicate records, add Y or N in updateFeild

inShapefile = 'VacantLots'
checkField = "P_ID"
updateField = "Count"

with arcpy.da.SearchCursor(inShapefile, [checkField]) as rows:
    values = [r[0] for r in rows]

d = {}
for item in set(values):
    if values.count(item) > 1:
        d[item] = 'Y'
    else:
        d[item] = 'N'

with arcpy.da.UpdateCursor(inShapefile, [checkField, updateField]) as rows:
    for row in rows:
        if row[0] in d:
            row[1] = d[row[0]]
            rows.updateRow(row)
0 Kudos
8 Replies
JoshuaBixby
MVP Esteemed Contributor

Several variations of this question have been asked in the past 6, maybe 12, months.  I can't recall all of those other questions/discussions, but https://community.esri.com/thread/192311-find-duplicate-by-comparing-two-fields is one of them.

A quick question though.  Since a group of records all share the same P_ID, how do you know which one is the original and which are the duplicates?  Is the lowest ObjectID/FID the original?

CCWeedcontrol
Occasional Contributor III

Good point about which one is the original ID but for this case i don't need to actually know which is the original ID. It's a notification list to mail out notices and there is numerous double parcels. I need to remove the duplicates so when staff mails out letters their not sending out 3-7 letters to the same address.

I have ran into the case where people have requested every road name in our county. Our road center line is broken into numerous segments so i need to identify duplicates so this would be something i could use a number of ways.

0 Kudos
RandyBurton
MVP Alum

An old thread that might help: Identify duplicate records & increment

RandyBurton
MVP Alum

This uses Arkadiusz Matoszka‌'s suggestion (from the link I posted above).  It uses a dictionary in a manner similar to what you were trying.  But it can run inside the UpdateCursor block, so you don't need the SearchCursor code.  (Also, "Count" might not be a good field name as it may be an SQL reserved word.)

import arcpy

# find duplicate records and count them in dictionary

inShapefile = 'VacantLots'
checkField = "P_ID"
updateField = "fieldCount" # renamed field

d = {} # dictionary for counting

with arcpy.da.UpdateCursor(inShapefile, [checkField, updateField]) as rows:
    for row in rows:

        if row[0] not in d.keys():
            d[row[0]] = 0 # insert key into dictionary and set value to 0
            row[1] = 0 # value into updateField

        else:
            d[row[0]] += 1 #increment value in dictionary
            row[1] = str(d[row[0]]) # save value in updateField (str for text field)

        rows.updateRow(row)
‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍
0 Kudos
CCWeedcontrol
Occasional Contributor III

I tried the code you posted but everything in the filed was just '0'. I did look at the link you posted and I had something similar to this but i had the same results

0 Kudos
RandyBurton
MVP Alum

Are you working with a shape file in ArcMap's python window?  If so, I think you need to be in an edit session for the shape file to be updated.  Or are you working with a feature in a file geodatabase?

0 Kudos
CCWeedcontrol
Occasional Contributor III

I tried it both ways i get the '0's'.

0 Kudos
RandyBurton
MVP Alum

Can you post your revised code?

0 Kudos