
How to check for duplicates in two fields of multiple rows then change a value

06-23-2020 01:45 AM
MatthewHowe1
New Contributor

I have a feature class with many rows representing wells. Each well record has two fields I am interested in: FIELD and TOT_REC_RES_MMBOE. In some cases the field name and volume are the same across records, e.g. Record 1: Well A, 500 and Record 2: Well A, 500. I would like to find duplicates across these two fields together and set the volume field of only the duplicate records to 0, keeping the first record's volume as it is (500 in the example). If Well A, 500 were unique, the volume would stay at 500. In the code below I'm using an update cursor to pull the volume values from a fields feature class, so I wondered if I could handle the duplicates in the same place.

import arcpy

# field_lyr and wells_temp are defined earlier in the script
src_field_list = ["FIELD", "IN_OIL_EQUIV_MMB"]
value_dict = {r[0]: r[1:] for r in arcpy.da.SearchCursor(field_lyr, src_field_list)}

tar_field_list = ["FIELD", "TOT_REC_RES_MMBOE", "DISCOVERY_WELL_FLG"]
with arcpy.da.UpdateCursor(wells_temp, tar_field_list) as cursor:
    for row in cursor:
        key_val = row[0]
        if key_val in value_dict:
            row[1] = value_dict[key_val][0]
            cursor.updateRow(row)
        if row[1] is None:  # set null volumes to 0 so they're not skipped in the well orders
            row[1] = 0
            cursor.updateRow(row)
        if row[2] is None:
            row[1] = 0
            cursor.updateRow(row)
del value_dict
del cursor
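
What I'm imagining for the duplicate step is something along these lines (an untested sketch; it assumes the cursor returns rows in a consistent order and that FIELD plus the volume together define a duplicate):

seen = set()
with arcpy.da.UpdateCursor(wells_temp, ["FIELD", "TOT_REC_RES_MMBOE"]) as cursor:
    for row in cursor:
        key = (row[0], row[1])
        if key in seen:  # this FIELD/volume pair was already seen, zero this one
            row[1] = 0
            cursor.updateRow(row)
        else:
            seen.add(key)  # first occurrence keeps its volume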
2 Replies
DanPatterson
MVP Esteemed Contributor

If you have a lot of duplicates, or your duplicates aren't sequential, it would be easier to process that data separately.

If you have the appropriate license level, this tool would provide the necessary information:

Find Identical—Data Management toolbox | Documentation 
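
Roughly along these lines (a sketch, not tested; parameter names are from memory, so check the tool documentation, and wells_temp is the feature class name from your code):

import arcpy

# Find records that share the same FIELD and volume; ONLY_DUPLICATES keeps
# just the records that have at least one match.
arcpy.management.FindIdentical(
    wells_temp, r"in_memory\dups",
    ["FIELD", "TOT_REC_RES_MMBOE"],
    output_record_option="ONLY_DUPLICATES")

# FEAT_SEQ groups identical records; keep the lowest IN_FID in each group
# and collect the rest so their volume can be zeroed.
groups = {}
for in_fid, seq in arcpy.da.SearchCursor(r"in_memory\dups", ["IN_FID", "FEAT_SEQ"]):
    groups.setdefault(seq, []).append(in_fid)
dup_oids = {fid for fids in groups.values() for fid in sorted(fids)[1:]}

with arcpy.da.UpdateCursor(wells_temp, ["OID@", "TOT_REC_RES_MMBOE"]) as cur:
    for oid, vol in cur:
        if oid in dup_oids:
            cur.updateRow([oid, 0])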


... sort of retired...
JoshuaBixby
MVP Esteemed Contributor

At a minimum, you will want to sort the records by ObjectID before processing them and updating values for the duplicates; otherwise, you run the risk of having inconsistent results if you happen to run the code on a copy of the original data set in the future.

This type of question is asked quite frequently on GeoNet, although how people are grouping records and what they are doing with the groups varies.  I haven't tested the following code, but I adapted it from one of my replies to a similar question on GeoNet:  Search Cursor to sort, group data with itertools, then sort on groups in list on lowest OBID 

import arcpy
from itertools import groupby
from operator import itemgetter

fc = r""  # path to feature class or table
case_fields = ["FIELD", "TOT_REC_RES_MMBOE"]
sort_field, sort_order = "OBJECTID", "ASC"

fields = case_fields
# Sort by the case fields so groupby sees duplicates together,
# then by ObjectID so the first record in each group is consistent.
sql_orderby = "ORDER BY {}, {} {}".format(
    ", ".join(case_fields), sort_field, sort_order
)

with arcpy.da.UpdateCursor(fc, fields, sql_clause=(None, sql_orderby)) as cur:
    case_func = itemgetter(*range(len(case_fields)))
    for key, group in groupby(cur, case_func):
        next(group)  # keep the first record in each group unchanged
        for dup in group:
            dup[-1] = 0  # zero the volume on every subsequent duplicate
            cur.updateRow(dup)

del cur