Script overwrites Unique IDs. What am I missing?

JCGuarneri · ‎04-06-2021

I'm having an odd issue with a script that I use to update the "last inspected" field in a feature class based on the related inspection records. It uses a search cursor to populate a dictionary with the most recent inspection date (values) for a given unique ID (keys). Then, it uses an update cursor, uses the unique ID field to look up the inspection date in the dictionary, and updates the inspection date if applicable. What I've noticed happening is that the script will overwrite the unique ids in chunks so that there are sets of duplicates. I'd love to say it was working perfectly before, but there was an index error that I recently fixed. It was only after I fixed that error and the script was able to run that I noticed this problem. I do have several other scripts that use the same template on other FCs that don't have this issue. Fortunately, I have archiving enabled and I was able to restore the proper ids.

If I disable the line that overwrites the row and have it print out the existing Unique ID and what the script wants to write in place, I get output like this:

(u'STV033632', (u'STV033586', datetime.datetime(2021, 3, 24, 4, 0)))
(u'STV033633', (u'STV033586', datetime.datetime(2021, 3, 24, 4, 0)))
(u'STV033634', (u'STV033586', datetime.datetime(2021, 3, 24, 4, 0)))
(u'STV033635', (u'STV033586', datetime.datetime(2021, 3, 24, 4, 0)))
(u'STV033636', (u'STV033586', datetime.datetime(2021, 3, 24, 4, 0)))
(u'STV033637', (u'STV033586', datetime.datetime(2021, 3, 24, 4, 0)))
(u'STV033638', (u'STV033586', datetime.datetime(2021, 3, 24, 4, 0)))
(u'STV033639', (u'STV033586', datetime.datetime(2021, 3, 24, 4, 0)))
(u'STV033641', (u'STV033640', datetime.datetime(2021, 1, 6, 5, 0)))
(u'STV033642', (u'STV033640', datetime.datetime(2021, 1, 6, 5, 0)))
(u'STV033643', (u'STV033640', datetime.datetime(2021, 1, 6, 5, 0)))
(u'STV033644', (u'STV033640', datetime.datetime(2021, 1, 6, 5, 0)))
(u'STV033645', (u'STV033640', datetime.datetime(2021, 1, 6, 5, 0)))
(u'STV033646', (u'STV033640', datetime.datetime(2021, 1, 6, 5, 0)))

Here's the meat of the script. I've excised the except statements and all of the edit session formalities for the sake of brevity. Is there something obvious I'm missing here?

#specify fields from target dataset to be updated
oldFields = ("WAM_ID","LastServiceDate")
#specify fields from source dataset
newFields = ("wam_id","date")
#create empty dictionary to store values
valDict = {}

#use search cursor to collect values from source dataset
newValues = arcpy.da.SearchCursor(gateInspections,newFields,"wam_id is not Null")

for row in newValues:
	try:
		key = row[0]
		if key not in valDict:
			valDict[key] = row[1:]
			
		elif key in valDict:
			if valDict[key][0] < row[1]:
				valDict[key] = row[1:]
			else:
				pass
		else:
			pass
	except:
		logFile.write('No date for valve {0}.\n'.format(row[0]))


fcRows = arcpy.da.UpdateCursor(inFC,oldFields, "WAM_ID is not Null")

try:
    for row in fcRows:
        try:
            if valDict.has_key(row[0]):
                Vals = valDict[row[0]]
                if Vals[0] is None:
                    operDate = row[1]
                else:
                    operDate = Vals[0]
                newRow = (row[0],operDate)
            if tuple(row) != newRow:
                    fcRows.updateRow(newRow)
            else:
                pass

JohannesLindner · ‎04-07-2021

Does this do what you want?

#specify fields from target dataset to be updated
oldFields = ("WAM_ID","LastServiceDate")
#specify fields from source dataset
newFields = ("wam_id","date")


#use search cursor to collect values from source dataset
newValues = arcpy.da.SearchCursor(gateInspections,newFields,"wam_id is not Null")

inspections = [row for row in newValues]
unique_wam_ids = set([i[0] for i in inspections])
valDict = {uid: [i[1] for i in inspections if i[0] == uid] for uid in unique_wam_ids}
# valDict contains a list of datetime.datetime objects for each wam_id
# valDict = {"wam_id_1": [datetime(), datetime(), datetime()],
#            "wam_id_2": [datetime(), datetime()],
#            "wam_id_5": []
#           }

fcRows = arcpy.da.UpdateCursor(inFC,oldFields, "WAM_ID is not Null")
try:
    for row in fcRows:
        wam_id = row[0]
        try:
            # find the inspection dates for the given wam_id and sort them
            inspection_dates = sorted(valDict[wam_id])
        except KeyError:
            # no inspection found -> skip rest of the for block
            logFile.write('No inspection for valve {0}.\n'.format(wam_id))
            continue
        # somehow, there isn't a valid date -> skip the rest of the for block
        if len(inspection_dates) == 0 or inspection_dates[-1] is None:
            continue
        # write the most current inspection
        last_inspection = inspection_dates[-1]  # we sorted above, [-1] is the most current date
        fcRows.updateRow([wam_id, last_inspection])
except:
    # your outer try-except block
    pass

Have a great day!
Johannes

View solution in original post

DougBrowning · ‎04-06-2021

Why are you updating the ID at all. Why not just the date?

oldFields = ("WAM_ID","LastServiceDate") could be just oldFields = ("LastServiceDate") right?

Then i would add a whereclause like ID=ID and newdate > olddate

I tend to not use update cursor and instead calculate field

SelectbyAttribute(ID)

CalculateField(newdate)

You could also prob add the date check to the selectby

SelectbyAttribute(ID=ID and newdate > currentdate)

The 3rd way depends on where you show people the value. You could use Arcade to always grab the date dynamically in a web map, pro, etc.

See

https://community.esri.com/t5/arcgis-survey123-questions/how-to-create-an-arcade-expression-to-show-...

https://community.esri.com/t5/arcgis-collector-questions/collector-use-related-layer-to-symbolise-fe...

There are more that I cannot find.

Hope that helps.

JCGuarneri · ‎04-06-2021

@DougBrowning unless I'm mistaken, I need to pull the WAM_ID field to match up the records between the FC and related records. I see what you're saying about the whereclause on the updateCursor, and creating a new cursor for each record in the dict. I believe that would incur a huge performance penalty over iterating through rows in a single cursor, as would using the field calculator. My current script format processes a similar sized table in ~30 seconds, including loading the arcpy libraries. If I recall correctly, Field Calculator took 5-10 min or more.

I like the idea of using Arcade or a join in a view layer. I know we had a good reason for updating the FC instead of just doing a dynamic display, but I'll have to revisit if that reason is still valid.

Thank you,

Jay

DavidPike · ‎04-06-2021

As Doug say's, I don't know why you're updating the entire tuple rather than just the date. There's probably no avoiding it but the index slicing makes it hard to read through (for me at least).

is the most recent date always going to be at index 0 of the list you're comparing with the new date? As I assume valDict[key] is the list of dates.

elif key in valDict:
  if valDict[key][0] < row[1]:

JohannesLindner · ‎04-07-2021

Does this do what you want?

#specify fields from target dataset to be updated
oldFields = ("WAM_ID","LastServiceDate")
#specify fields from source dataset
newFields = ("wam_id","date")


#use search cursor to collect values from source dataset
newValues = arcpy.da.SearchCursor(gateInspections,newFields,"wam_id is not Null")

inspections = [row for row in newValues]
unique_wam_ids = set([i[0] for i in inspections])
valDict = {uid: [i[1] for i in inspections if i[0] == uid] for uid in unique_wam_ids}
# valDict contains a list of datetime.datetime objects for each wam_id
# valDict = {"wam_id_1": [datetime(), datetime(), datetime()],
#            "wam_id_2": [datetime(), datetime()],
#            "wam_id_5": []
#           }

fcRows = arcpy.da.UpdateCursor(inFC,oldFields, "WAM_ID is not Null")
try:
    for row in fcRows:
        wam_id = row[0]
        try:
            # find the inspection dates for the given wam_id and sort them
            inspection_dates = sorted(valDict[wam_id])
        except KeyError:
            # no inspection found -> skip rest of the for block
            logFile.write('No inspection for valve {0}.\n'.format(wam_id))
            continue
        # somehow, there isn't a valid date -> skip the rest of the for block
        if len(inspection_dates) == 0 or inspection_dates[-1] is None:
            continue
        # write the most current inspection
        last_inspection = inspection_dates[-1]  # we sorted above, [-1] is the most current date
        fcRows.updateRow([wam_id, last_inspection])
except:
    # your outer try-except block
    pass

Have a great day!
Johannes

JCGuarneri · ‎04-07-2021

@JohannesLindner that does the trick! I like your sorting approach, and explicitly setting storing the WAM_ID is a smart move. Learn something new every day.

Thank you!