Find Duplicates and increment

04-10-2011 08:25 PM
DavidBrett
New Contributor III
Hello,

I'm attempting to find the duplicates in a field, then number the results sequentially within each duplicate set.  I've managed to increment the duplicates, but the count does not start over when the code hits the next duplicate value.

At 9.3, I was able to use the field_Mark_Duplicates_2.cal from Easy Calculate 5.0 (http://www.ian-ko.com/free/free_arcgis.htm).  Now at 10, I need to do it in python and I'm having little success.

example of desired result:
a==>1
b==>1
c==>1
b==>2
b==>3
a==>2


Here is a snippet of code that increments all the duplicates in the table:


    rows = arcpy.UpdateCursor("Junk1.dbf")
    fields = arcpy.ListFields("Junk1.dbf" )
    
    #Create an empty list
    myList = []
       
    i = 0
    for row in rows:
        for field in fields:            
            if field.name == 'ID':
                value = row.getValue(field.name)                
                if value in myList:       
                    i += 1
                    row.TEST = i
                    rows.updateRow(row)
                if value not in myList:
                    myList.append(value)


Any help would be greatly appreciated!!
0 Kudos
10 Replies
TerrySilveus
Occasional Contributor III
I'm just learning Python, but I think you might try something like this. You'll probably have to fix the syntax. Your list may be the problem: it won't cut it because you are incrementing regardless of what the "value" is, so your results probably look more like this than what you want:
a==>1
b==>1
c==>1
b==>2
b==>3
a==>4

Try a matrix (a list of lists) instead of a one-dimensional list, storing [['a',1], ['b',1], ['c',1]]. When a value is found, increment its second element (to save the count), then set the field in that row to that count. This is just air code, so you'll have to test it out:

import arcpy

rows = arcpy.UpdateCursor("Filename")
mylistarray = []  # [value, count] pairs, one per distinct ID seen so far

for row in rows:
    value = row.getValue("ID")
    found = False  # reset for every row
    for x in mylistarray:
        if x[0] == value:
            found = True
            x[1] += 1  # one more occurrence of this ID
            row.TEST = x[1]
            break
    if not found:
        # first occurrence of this ID
        mylistarray.append([value, 1])
        row.TEST = 1
    rows.updateRow(row)
        
0 Kudos
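The list-of-pairs idea above maps naturally onto a dictionary keyed by the ID value. The following is a minimal sketch in plain Python, without arcpy, so the counting logic can be checked on its own. The function name `number_within_duplicates` and its `start` parameter are hypothetical and only for illustration.

```python
def number_within_duplicates(values, start=1):
    """Return one sequence number per value, counted separately per distinct value."""
    counts = {}   # value -> last number handed out for that value
    numbers = []
    for value in values:
        # first occurrence gets `start`, each later occurrence gets one more
        counts[value] = counts.get(value, start - 1) + 1
        numbers.append(counts[value])
    return numbers


ids = ["a", "b", "c", "b", "b", "a"]
for value, n in zip(ids, number_within_duplicates(ids)):
    print("{0}==>{1}".format(value, n))
```

With the sample values from the original question this prints a==>1, b==>1, c==>1, b==>2, b==>3, a==>2. Inside a cursor loop, the same dictionary update would replace the inner search over the list.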
DarrenWiens2
MVP Honored Contributor
I usually use the approach of sorting by values, then comparing the previous value to the current value:

    rows = arcpy.UpdateCursor("Junk1.dbf","","","","ID A") # sort by ID, ascending order

    #Create an empty list
    myList = []

    i = 0
    for row in rows:
        if i == 0: #first time around
            value = row.ID
            myList.append(value)
        i += 1
        if row.ID != value: #if a new ID
            value = row.ID
            myList.append(value)
            i = 1
        row.TEST = i
        rows.updateRow(row)
   
0 Kudos
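The sort-then-compare approach can be tried outside ArcGIS on a plain list. This is a rough sketch under that assumption. The name `number_sorted_runs`, the `start` parameter, and the sentinel object are illustrative choices, not part of the posted code.

```python
def number_sorted_runs(values, start=1):
    """Sort the values, then number each run of equal values beginning at `start`."""
    numbered = []
    previous = object()  # sentinel that never equals a real value
    count = start
    for value in sorted(values):
        if value == previous:
            count += 1          # same ID as the previous row
        else:
            count = start       # new ID: restart the numbering
            previous = value
        numbered.append((value, count))
    return numbered


print(number_sorted_runs(["a", "b", "c", "b", "b", "a"]))
```

Because the values are sorted first, the numbers follow sorted order rather than the original row order. Passing `start=0` makes the first member of each set 0.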
TerrySilveus
Occasional Contributor III
Yeah :) ... that's probably how I would have approached it as well if I had known how to sort. Being new to Python put me at a disadvantage, though.
0 Kudos
DavidBrett
New Contributor III
Thanks guys!! I was able to get these both to work with a little tweaking. 

I've settled on this version for now:
    rows = arcpy.UpdateCursor("Junk1.dbf","","","","ID A") # sort by ID, ascending order
    #Create an empty list
    myList = []

    i = 0
    for row in rows:
        if i == 0: #first time around
            value = row.ID
            myList.append(value)
        i += 1
        print value, i
        if row.ID != value: #if a new ID
            value = row.ID
            myList.append(value)
            i = 1
        row.TEST = i - 1
        rows.updateRow(row)


Any suggestions on getting row.TEST to start at 0 for the first instance in each duplicate set and count up from there, rather than subtracting 1 as in row.TEST = i - 1?
0 Kudos
TerrySilveus
Occasional Contributor III
start i at -1

    rows = arcpy.UpdateCursor("Junk1.dbf","","","","ID A") # sort by ID, ascending order
    #Create an empty list
    myList = []

    i = -1
    for row in rows:
        if i == -1: #first time around
            value = row.ID
            myList.append(value)
        i += 1
        print value, i
        if row.ID != value: #if a new ID
            value = row.ID
            myList.append(value)
            i = 0
        row.TEST = i 
        rows.updateRow(row)


NOTE: unless you use myList elsewhere in your code, doing it this way has eliminated the need for myList
0 Kudos
DavidBrett
New Contributor III
Great! Not really sure why I was using an array to be honest.  It seemed like that's what the examples were showing so I just went with it.

Everything seems to be working.  Next step is to turn it into a tool...
0 Kudos
DavidBrett
New Contributor III
Any ideas on getting this to run in the Field Calculator?
0 Kudos
RichardFairhurst
MVP Honored Contributor
It won't work with the Field Calculator.  The Field Calculator will not let you control the record order by sorting it, which this code needs to do.
0 Kudos
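An order-independent variant, in the spirit of the list-of-pairs idea from earlier in the thread, might sidestep the sorting requirement by remembering a running count per value. The sketch below is plain Python and has not been tested in the Field Calculator. It assumes the Python parser evaluates the code block once and then calls the function once per row, so that the module-level dictionary persists between rows. The function name `number_duplicate` is hypothetical, and the expression would presumably be along the lines of `number_duplicate(!ID!)`.

```python
# Assumption: this block is evaluated once, then number_duplicate() is
# called once per row, so `counts` survives from one row to the next.
counts = {}  # ID value -> number of rows seen so far with that value


def number_duplicate(value):
    """Return 1 for the first row with this value, 2 for the second, and so on."""
    counts[value] = counts.get(value, 0) + 1
    return counts[value]


# Stand-in for the per-row calls, in table order.
results = [number_duplicate(v) for v in ["a", "b", "c", "b", "b", "a"]]
print(results)
```

The numbering then follows whatever order the rows are processed in, not a sorted order, which is the trade-off compared with the cursor script.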
BobbiLay
New Contributor III
This has been extremely helpful.  Quick question, as I am new to arcpy: if I want to flag ALL repeated IDs, not just the second, third, and fourth occurrences, how might I accomplish this?  For instance, if I had the following list of values:

a
b
c
d
e
e

With the above scripts, I would get

a-0
b-0
c-0
d-0
e-0
e-1

What I want is

a-0
b-0
c-0
d-0
e-1
e-1 or 2 (Doesn't matter as long as I can see that this value is not equal to zero).

My goal is to flag the values that are duplicates, identify the value and then assign a new value to those records.  If I split a polygon, for instance, I don't want to maintain the ID of the old one, I want to seek out the value and replace it with two new unique IDs.

Thanks in advance.
0 Kudos
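One possible way to flag every member of a duplicate set, including the first occurrence, is to make two passes: count all values first, then flag each row whose value was counted more than once. This is a minimal sketch on a plain list, assuming the values fit in memory. The name `flag_duplicates` is hypothetical. With arcpy, the first pass could presumably be a read-only cursor and the second an update cursor.

```python
from collections import Counter


def flag_duplicates(values):
    """Return 1 for every value that occurs more than once, otherwise 0."""
    counts = Counter(values)  # first pass: total occurrences per value
    # second pass: flag each row based on the total for its value
    return [1 if counts[value] > 1 else 0 for value in values]


ids = ["a", "b", "c", "d", "e", "e"]
for value, flag in zip(ids, flag_duplicates(ids)):
    print("{0}-{1}".format(value, flag))
```

For the sample list in the question this prints a-0, b-0, c-0, d-0, e-1, e-1, so both copies of e carry a nonzero flag and can be selected for new IDs.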