I would like to only populate one of the duplicate record but with number of duplicates like below.
How can do this or what is the best way to achieve this with python?
well_id Dup
D36528 3
D36528
D36528
D36532 2
D36532
D36521
D36522
D36525 2
D36525
Solved! Go to Solution.
You could use a search cursor to read in the ids and then an update cursor the write the count out the feature class.
import arcpy
fc_path = r'C:\my_temp\junk3\geo\fgdb.gdb\wells'
ids = []
with arcpy.da.SearchCursor(fc_path, 'ID') as cursor:
for row in cursor:
ids.append(row[0])
# make empty dict
ids_dict = {}
# get a list of unique ids, e.g. remove duplicates, set does this
# then just convert it back to a list
uids = list(set(ids))
# for each uid make a dict item with the key the uid and an empty list as the value
for uid in uids:
ids_dict[uid] = []
# this is what sets you up for counting. it finds the matching well id keys
# and adds the key to the list value so you get in the dict {'w1': [w1, w1, w1]}
# then all you need to do count the length of the value list.
for id in ids:
ids_dict[id] = ids_dict[id] + [id]
with arcpy.da.UpdateCursor(fc_path, ['ID', 'DUP']) as cursor:
for row in cursor:
for k, v in ids_dict.items():
if row[0] == k and v is not None:
row[1] = len(v)
ids_dict[k] = None
cursor.updateRow(row)
You could use dictionaries to count the number of repeating ids. Maybe something like this...
ids = ['D36528', 'D36528', 'D36528', 'D36532', 'D36532', 'D36521', 'D36522', 'D36525', 'D36525', ] ids_dict = {} uids = list(set(ids)) for uid in uids: ids_dict[uid] = [] for id in ids: ids_dict[id] = ids_dict[id] + [id] for k, v in ids_dict.items(): print '{k}, {n}'.format(k=k, n=len(v))
Output:
D36532, 2
D36528, 3
D36521, 1
D36522, 1
D36525, 2
How do reading the valuse from a dictionary use dict for with arcpy.da.UpdateCursor?
I am not sure I understand your question. Maybe it would be helpful if we had a little more info...
I would use an update cursor to write the info from the dictionary back to a feature class in a fgdb for sure.
Data is in a feature class in a fgdb.
I want to be able to see how many times an attribute is duplicated in a the field well_Id but I don't need all of them to have a number in the Dup filed but if that is the only way then that is ok. How do I take k & n and write to the feature class thought update cursor?
You could use a search cursor to read in the ids and then an update cursor the write the count out the feature class.
import arcpy
fc_path = r'C:\my_temp\junk3\geo\fgdb.gdb\wells'
ids = []
with arcpy.da.SearchCursor(fc_path, 'ID') as cursor:
for row in cursor:
ids.append(row[0])
# make empty dict
ids_dict = {}
# get a list of unique ids, e.g. remove duplicates, set does this
# then just convert it back to a list
uids = list(set(ids))
# for each uid make a dict item with the key the uid and an empty list as the value
for uid in uids:
ids_dict[uid] = []
# this is what sets you up for counting. it finds the matching well id keys
# and adds the key to the list value so you get in the dict {'w1': [w1, w1, w1]}
# then all you need to do count the length of the value list.
for id in ids:
ids_dict[id] = ids_dict[id] + [id]
with arcpy.da.UpdateCursor(fc_path, ['ID', 'DUP']) as cursor:
for row in cursor:
for k, v in ids_dict.items():
if row[0] == k and v is not None:
row[1] = len(v)
ids_dict[k] = None
cursor.updateRow(row)
Thank for sharing.
Can you explain what lines 11-19 are doing please?
I added comments to the code above. I would get it into a python IDE and step through it with a debugger. Doing this when I am taking on something new speeds up my learning... it takes away the "back box" effect.
Or for something completely different... out to *.dbf or *.csv
import numpy as np
from arcpy.da import NumPyArrayToTable
#
ids = ['D36528', 'D36528', 'D36528','D36532', 'D36532',
'D36521', 'D36522','D36525', 'D36525', ]
# ---- numpy magic
u, c = np.unique(ids, return_counts=True)
dt = np.dtype([("Class", "<U7"), ("Count", "i4")])
uc = np.asarray(list(zip(u, c)), dtype=dt)
# ---- output to dbf and csv
NumPyArrayToTable(uc, "c:/temp/out.dbf")
np.savetxt("c:/temp/out.csv", uniq_counts, fmt="%7s, %3s",
comments="", header="Class, Counts")
The csv as an example
Class, Counts
D36521, 1
D36522, 1
D36525, 2
D36528, 3
D36532, 2
You can add the *.csv or *.dbf to arcmap or Pro, or join to an existing table
In terms of semantics, you are calling the field you want to populate "Dup", but you are putting the total count of records in that field unless there is only 1 record. As Dan and Forest have already implied, it is probably better to go with a field name like "Count" and then put the total count of records for all records. If you want to know which records have a duplicate, triplicate, etc...; you simply query on Count being greater than 1.
Along the lines of Forest's answer, but using slightly more compact code without having to create two cursors:
from arcpy.da import UpdateCursor
from collections import Counter
fc = # path to feature class
with UpdateCursor(fc, ["well_id", "Dup"]) as cur:
cnt = Counter(row[0] for row in cur)
cur.reset()
for row in cur:
cur.updateRow([row[0], cnt.pop(row[0], None)])