Need help improving an UpdateCursor script

DominicRobergeIADOT · ‎01-24-2023

Hello,

I have to update a feature class (16K records) based on four other feature classes. I am using the following code, which is working, but quite slow (about 35 min to complete). So, I was wondering if you guys could suggest some improvement to speed this up.

Thanks!

    fieldsUpdate = ['OBJECTID','dr_Milepost','DR_distance_MP','Dr_Route','DR_DistanceFromRoute','DR_LANDOWNER_NAME','DR_PARCEL_NUMBER','DR_CountyName','DR_StateName']
    fieldsM = ['OBJECTID', 'mile_post']
    fieldsRoute = ['OBJECTID', 'id']
    fieldsParcels = ['FID_Structures_NAD83_Update', 'LANDOWNER_NAME_1', 'PARCEL_NUMBER_1']
    fieldsCounty = ['FID_Structures_NAD83_Update', 'CountyName_1', 'stateAbbr']

    with arcpy.da.UpdateCursor(flStructure, fieldsUpdate) as cursorU:
        for rowU in cursorU:

            #Section to update the MilePost
            strMatch = "OBJECTID =" + str(rowU[1])
            #print(strMatch)
            print("Mile post")
            with arcpy.da.SearchCursor(flMilePost, fieldsM,where_clause=strMatch) as cursorM:
                for rowM in cursorM:
                    print(u'{0}'.format(rowM[1]))
                    rowU[1] = rowM[1]
                    rowU[2] = rowU[2] * 3.28084
            # Section to update the Route
            print("Route")
            strMatchRoute = "OBJECTID =" + str(rowU[3])
            with arcpy.da.SearchCursor(flRouteLYR, fieldsRoute,where_clause=strMatchRoute) as cursorR:
                for rowR in cursorR:
                    #print(u'{0}'.format(rowR[0]))
                    rowU[3] = rowR[1]
                    rowU[4] = rowU[4] * 3.28084
            ## Update using the intersect Parcels data
            print("Parcels")
            strMatchP = "FID_Structures_NAD83_Update ="+str(rowU[0])
            with arcpy.da.SearchCursor(flParcels+"int", fieldsParcels,where_clause=strMatchP) as cursorP:
                for rowP in cursorP:
                    rowU[5] = rowP[1]
                    rowU[6] = rowP[2]
            ## Update using the intersect County data
            print("County")
            strMatchC = "FID_Structures_NAD83_Update =" + str(rowU[0])
            with arcpy.da.SearchCursor(flCounty+"int", fieldsCounty,where_clause=strMatchC) as cursorC:
                for rowC in cursorC:
                    rowU[7] = rowC[1]
                    if rowC[2] == 'SD':
                        rowU[8] = 'South Dakota'
                    elif rowC[2]== 'ND':
                        rowU[8] = 'North Dakota'
                    elif rowC[2]== 'NE':
                        rowU[8] = 'Nebraska'
                    elif rowC[2]== 'MN':
                        rowU[8] = 'Minnesota'
                    elif rowC[2]== 'IA':
                        rowU[8] = 'Iowa'
                    else:
                        rowU[8] = ''
            cursorU.updateRow(rowU)

DominicRobergeIADOT · ‎01-25-2023

Hey Jeff

Your solution saved me a bunch if time, thank you. However, I have an issue with only ONE dictionary section (the Route data). The code returns None all the records, but the dictionary has the value related to the specific record

ONLY THE ROUTE section return None for every records

value returned by Dictionary

({10299: {'OBJECTID': 10299, 'id': 'IAL-331'}, 10300: {'OBJECTID': 10300, 'id': 'NEL-309'}, 10301: {'OBJECTID': 10301, 'id': 'NEL-312B'}, 10302: {'OBJECTID': 10302, 'id': 'NEL-314'}, 10303: {'OBJECTID': 10303, 'id': 'NEL-313'}, 10304: {'OBJECTID': 10304, 'id': 'SDT-208'}, 10305: {'OBJECTID': 10305, 'id': 'SDT-207'}, 10306: {'OBJECTID': 10306, 'id': 'SDL-336'}, 10307: {'OBJECTID': 10307, 'id': 'SDT-209'}, 10308: {'OBJECTID': 10308, 'id': 'SDL-335'}, 10309: {'OBJECTID': 10309, 'id': 'SDT-210'}, 10310: {'OBJECTID': 10310, 'id': 'NDT-211'}, 10311: {'OBJECTID': 10311, 'id': 'MNL-321A'}, 10312: {'OBJECTID': 10312, 'id': 'IAT-205'}, 10313: {'OBJECTID': 10313, 'id': 'SDL-320'}, 10314: {'OBJECTID': 10314, 'id': 'SDT-206'}, 10315: {'OBJECTID': 10315, 'id': 'NEL-333'}, 10316: {'OBJECTID': 10316, 'id': 'NEL-310'}, 10317: {'OBJECTID': 10317, 'id': 'NEL-312A'}, 10318: {'OBJECTID': 10318, 'id': 'NDL-324'}, 10319: {'OBJECTID': 10319, 'id': 'NDL-325B'}, 10320: {'OBJECTID': 10320, 'id': 'NDL-327A'}, 10321: {'OBJECTID': 10321, 'id': 'NDL-323'}, 10322: {'OBJECTID': 10322, 'id': 'NDL-326'}, 10323: {'OBJECTID': 10323, 'id': 'MNL-321B'}, 10324: {'OBJECTID': 10324, 'id': 'NDL-325A'}, 10325: {'OBJECTID': 10325, 'id': 'NDL-327B'}, 10326: {'OBJECTID': 10326, 'id': 'NEL-316B'}, 10327: {'OBJECTID': 10327, 'id': 'NEL-316A'}, 10328: {'OBJECTID': 10328, 'id': 'NEL-315'}, 10329: {'OBJECTID': 10329, 'id': 'IAL-318A'}, 10330: {'OBJECTID': 10330, 'id': 'SDM-104B'}, 10331: {'OBJECTID': 10331, 'id': 'SDM-105B'}, 10332: {'OBJECTID': 10332, 'id': 'NDM-106'}, 10333: {'OBJECTID': 10333, 'id': 'SDM-105A'}, 10334: {'OBJECTID': 10334, 'id': 'NDL-328'}, 10335: {'OBJECTID': 10335, 'id': 'SDM-104A'}, 10336: {'OBJECTID': 10336, 'id': 'IAL-318B'}, 10337: {'OBJECTID': 10337, 'id': 'IAL-329'}, 10338: {'OBJECTID': 10338, 'id': 'IAT-201'}, 10339: {'OBJECTID': 10339, 'id': 'IAT-202'}, 10340: {'OBJECTID': 10340, 'id': 'IAL-306'}, 10341: {'OBJECTID': 10341, 'id': 'IAM-102'}, 10342: {'OBJECTID': 10342, 'id': 'IAM-103A'}, 10343: {'OBJECTID': 10343, 'id': 'IAT-203'}, 10344: {'OBJECTID': 10344, 'id': 'MNL-305'}, 10345: {'OBJECTID': 10345, 'id': 'MNL-303B'}, 10346: {'OBJECTID': 10346, 'id': 'IAL-308A'}, 10347: {'OBJECTID': 10347, 'id': 'IAT-204B'}, 10348: {'OBJECTID': 10348, 'id': 'IAM-101B'}, 10349: {'OBJECTID': 10349, 'id': 'IAL-301B'}, 10350: {'OBJECTID': 10350, 'id': 'IAM-103B'}, 10351: {'OBJECTID': 10351, 'id': 'MNL-303A'}, 10352: {'OBJECTID': 10352, 'id': 'IAL-301A'}, 10353: {'OBJECTID': 10353, 'id': 'LITTLE SIOUX CAPFAC'}, 10354: {'OBJECTID': 10354, 'id': 'IAL-340'}, 10355: {'OBJECTID': 10355, 'id': 'IAM-101A1'}, 10356: {'OBJECTID': 10356, 'id': 'IAM-101A2'}, 10357: {'OBJECTID': 10357, 'id': 'IAL-302B'}, 10358: {'OBJECTID': 10358, 'id': 'IAL-308B'}, 10359: {'OBJECTID': 10359, 'id': 'MNL-304'}, 10360: {'OBJECTID': 10360, 'id': 'IAT-204A'}, 10361: {'OBJECTID': 10361, 'id': 'IAL-302A'}, 10362: {'OBJECTID': 10362, 'id': 'MNL-337'}})

this is what that UpdateCursor returns:

before 30833, 58150, 617.5505629486523, 10346, 610.630249122789, 0, 0, 0, 0
MP dict value: {'OBJECTID': 58150, 'mile_post': 73.6}
Route dict value: None
Parcels dict value: None
County dict value: {'FID_Structures_NAD83_Update': 30833, 'CountyName_1': 'Crawford', 'stateAbbr': 'IA'}
after 30833, 73.6, 2026.0845889444565, 10346, 610.630249122789, 0, 0, Crawford, Iowa

    with arcpy.da.UpdateCursor(flStructure, fieldsUpdate) as cursorU:
        for rowU in cursorU:
            print(u'before {0}, {1}, {2}, {3}, {4}, {5}, {6}, {7}, {8}'.format(rowU[0],rowU[1],rowU[2],rowU[3],rowU[4],rowU[5],rowU[6],rowU[7],rowU[8]))
            strMatch = milepostDct.get(rowU[1],None)
            print("MP dict value: " + str(strMatch))
            if strMatch:
                rowU[1] = strMatch['mile_post']
                rowU[2] = rowU[2] * 3.28084
            strRoute1 = rteDct.get(rowU[3], None)
            print("Route dict value: " +str(strRoute1))
            if strRoute1:
                rowU[3] = strRoute1['id']
                rowU[4] = rowU[4] * 3.28084
            strParcel = parcelDct.get(rowU[0], None)
            print("Parcels dict value: " +str(strParcel))
            if strParcel:
                rowU[5] = strParcel['LANDOWNER_NAME_1']
                rowU[6] = strParcel['PARCEL_NUMBER_1']

            strCounty = cntyDct.get(rowU[0], None)
            print("County dict value: " +str(strCounty))
            if strCounty:
                rowU[7] = strCounty['CountyName_1']
                rowU[8] = stateDict.get(strCounty['stateAbbr'])

            print(u'after {0}, {1}, {2}, {3}, {4}, {5}, {6}, {7}, {8}'.format(rowU[0], rowU[1], rowU[2], rowU[3],rowU[4], rowU[5], rowU[6], rowU[7],rowU[8]))
            #cursorU.updateRow(rowU)

One I figure this issue, the script will run within 2 minutes

Thanks!

View solution in original post

mody_buchbinder · ‎01-24-2023

Hi Dominic

You are running four search cursors for each record.

The most important thing is to understand where the problem is.

Try to run the program four times each run when only one search cursor is used.

Two suggestions:

1) Index FID_Structures_NAD83_Update

2) If the four layers are not very big you can search them once and put all records into a python dictionary, then instead of search cursor you just gets the values from the dictionary.

Have Fun

dslamb2022 · ‎01-25-2023

Instead of nesting search cursors within the update. I would run those separately before the update and store the values as a dictionary. Since you are looking up by Object ID, use that as your key, and the value as the number you are looking for. For example:

with arcpy.da.SearchCursor(flMilePost, fieldsM,where_clause=strMatch) as cursorM:
    milesDict = {}
    for rowM in cursorM:
        #print(u'{0}'.format(rowM[1]))
        milesDict[rowM[0]] = rowM[1]

And in the update:

with arcpy.da.UpdateCursor(flStructure, fieldsUpdate) as cursorU:

            strMatch = "OBJECTID =" + str(rowU[1]) 
            #print(strMatch)
             rowU[1] = milesDict[rowU[0]]
             rowU[2] = rowU[2] * 3.28084

It's not clear why you have Mile Post set to the OBJECTID in "OBJECTID =" + str(rowU[1]). Should the index be 0?

Anonymous User · ‎01-25-2023

I second the dictionary approach. Do that on all four of the 'source' featureclasses first- outside of the UpdateCursor- something like this:

# Create a dictionary of fields and values, using the objectid as key
mCur = arcpy.da.SearchCursor(flMilePost, fieldsM)
milepostDct = {row[fieldsM.index('OBJECTID')]: {k: v for k, v in zip(mCur.fields, row)} for row in mCur}

rCur = arcpy.da.SearchCursor(flRouteLYR, fieldsRoute)
routeDct = {row[fieldsRoute.index('OBJECTID')]: {k: v for k, v in zip(rCur.fields, row)} for row in rCur}

pCur = arcpy.da.SearchCursor(flParcels+"int", fieldsParcels)
parcelDct = {row[fieldsParcels.index('OBJECTID')]: {k: v for k, v in zip(pCur.fields, row)} for row in pCur}

cCur = arcpy.da.SearchCursor(flCounty + "int", fieldsCounty)
cntyDct = {row[fieldsCounty.index('OBJECTID')]: {k: v for k, v in zip(cCur.fields, row)} for row in cCur}

del mCur, rCur, pCur, cCur

stateDict = {'SD': 'South Dakota', 'ND': 'North Dakota', 'NE': 'Nebraska', 'MN': 'Minnesota', 'IA': 'Iowa'}

with arcpy.da.UpdateCursor(flStructure, fieldsUpdate) as uCursor:
    for rowU in uCursor:
        strMatch = milepostDct.get(rowU[1], None) #<- don't really need None here since .get() will return a default None, but you can replace that wih another value if you want a default return val.
        if strMatch:
            rowU[1] = strMatch['mile_post']
            rowU[2] = rowU[2] * 3.28084

        strRoute = routeDct.get(rowU[3], None)
        if strRoute:
            rowU[3] = strRoute['id']
            rowU[4] = rowU[4] * 3.28084

        strParcel = parcelDct.get(rowU[0], None)
        if strParcel:
            rowU[5] = strParcel['LANDOWNER_NAME_1']
            rowU[6] = strParcel['PARCEL_NUMBER_1']

        strCounty = cntyDct.get(rowU[0], None)
        if strCounty:
            rowU[7] = strCounty['CountyName_1']
            rowU[8] = stateDict.get(strCounty['stateAbbr'])
        uCursor.updateRow(rowU)

Then you can iterate over each dictionary within the UpdateCursor:

DominicRobergeIADOT · ‎01-25-2023

Hey Jeff

Your solution saved me a bunch if time, thank you. However, I have an issue with only ONE dictionary section (the Route data). The code returns None all the records, but the dictionary has the value related to the specific record

ONLY THE ROUTE section return None for every records

value returned by Dictionary

({10299: {'OBJECTID': 10299, 'id': 'IAL-331'}, 10300: {'OBJECTID': 10300, 'id': 'NEL-309'}, 10301: {'OBJECTID': 10301, 'id': 'NEL-312B'}, 10302: {'OBJECTID': 10302, 'id': 'NEL-314'}, 10303: {'OBJECTID': 10303, 'id': 'NEL-313'}, 10304: {'OBJECTID': 10304, 'id': 'SDT-208'}, 10305: {'OBJECTID': 10305, 'id': 'SDT-207'}, 10306: {'OBJECTID': 10306, 'id': 'SDL-336'}, 10307: {'OBJECTID': 10307, 'id': 'SDT-209'}, 10308: {'OBJECTID': 10308, 'id': 'SDL-335'}, 10309: {'OBJECTID': 10309, 'id': 'SDT-210'}, 10310: {'OBJECTID': 10310, 'id': 'NDT-211'}, 10311: {'OBJECTID': 10311, 'id': 'MNL-321A'}, 10312: {'OBJECTID': 10312, 'id': 'IAT-205'}, 10313: {'OBJECTID': 10313, 'id': 'SDL-320'}, 10314: {'OBJECTID': 10314, 'id': 'SDT-206'}, 10315: {'OBJECTID': 10315, 'id': 'NEL-333'}, 10316: {'OBJECTID': 10316, 'id': 'NEL-310'}, 10317: {'OBJECTID': 10317, 'id': 'NEL-312A'}, 10318: {'OBJECTID': 10318, 'id': 'NDL-324'}, 10319: {'OBJECTID': 10319, 'id': 'NDL-325B'}, 10320: {'OBJECTID': 10320, 'id': 'NDL-327A'}, 10321: {'OBJECTID': 10321, 'id': 'NDL-323'}, 10322: {'OBJECTID': 10322, 'id': 'NDL-326'}, 10323: {'OBJECTID': 10323, 'id': 'MNL-321B'}, 10324: {'OBJECTID': 10324, 'id': 'NDL-325A'}, 10325: {'OBJECTID': 10325, 'id': 'NDL-327B'}, 10326: {'OBJECTID': 10326, 'id': 'NEL-316B'}, 10327: {'OBJECTID': 10327, 'id': 'NEL-316A'}, 10328: {'OBJECTID': 10328, 'id': 'NEL-315'}, 10329: {'OBJECTID': 10329, 'id': 'IAL-318A'}, 10330: {'OBJECTID': 10330, 'id': 'SDM-104B'}, 10331: {'OBJECTID': 10331, 'id': 'SDM-105B'}, 10332: {'OBJECTID': 10332, 'id': 'NDM-106'}, 10333: {'OBJECTID': 10333, 'id': 'SDM-105A'}, 10334: {'OBJECTID': 10334, 'id': 'NDL-328'}, 10335: {'OBJECTID': 10335, 'id': 'SDM-104A'}, 10336: {'OBJECTID': 10336, 'id': 'IAL-318B'}, 10337: {'OBJECTID': 10337, 'id': 'IAL-329'}, 10338: {'OBJECTID': 10338, 'id': 'IAT-201'}, 10339: {'OBJECTID': 10339, 'id': 'IAT-202'}, 10340: {'OBJECTID': 10340, 'id': 'IAL-306'}, 10341: {'OBJECTID': 10341, 'id': 'IAM-102'}, 10342: {'OBJECTID': 10342, 'id': 'IAM-103A'}, 10343: {'OBJECTID': 10343, 'id': 'IAT-203'}, 10344: {'OBJECTID': 10344, 'id': 'MNL-305'}, 10345: {'OBJECTID': 10345, 'id': 'MNL-303B'}, 10346: {'OBJECTID': 10346, 'id': 'IAL-308A'}, 10347: {'OBJECTID': 10347, 'id': 'IAT-204B'}, 10348: {'OBJECTID': 10348, 'id': 'IAM-101B'}, 10349: {'OBJECTID': 10349, 'id': 'IAL-301B'}, 10350: {'OBJECTID': 10350, 'id': 'IAM-103B'}, 10351: {'OBJECTID': 10351, 'id': 'MNL-303A'}, 10352: {'OBJECTID': 10352, 'id': 'IAL-301A'}, 10353: {'OBJECTID': 10353, 'id': 'LITTLE SIOUX CAPFAC'}, 10354: {'OBJECTID': 10354, 'id': 'IAL-340'}, 10355: {'OBJECTID': 10355, 'id': 'IAM-101A1'}, 10356: {'OBJECTID': 10356, 'id': 'IAM-101A2'}, 10357: {'OBJECTID': 10357, 'id': 'IAL-302B'}, 10358: {'OBJECTID': 10358, 'id': 'IAL-308B'}, 10359: {'OBJECTID': 10359, 'id': 'MNL-304'}, 10360: {'OBJECTID': 10360, 'id': 'IAT-204A'}, 10361: {'OBJECTID': 10361, 'id': 'IAL-302A'}, 10362: {'OBJECTID': 10362, 'id': 'MNL-337'}})

this is what that UpdateCursor returns:

before 30833, 58150, 617.5505629486523, 10346, 610.630249122789, 0, 0, 0, 0
MP dict value: {'OBJECTID': 58150, 'mile_post': 73.6}
Route dict value: None
Parcels dict value: None
County dict value: {'FID_Structures_NAD83_Update': 30833, 'CountyName_1': 'Crawford', 'stateAbbr': 'IA'}
after 30833, 73.6, 2026.0845889444565, 10346, 610.630249122789, 0, 0, Crawford, Iowa

    with arcpy.da.UpdateCursor(flStructure, fieldsUpdate) as cursorU:
        for rowU in cursorU:
            print(u'before {0}, {1}, {2}, {3}, {4}, {5}, {6}, {7}, {8}'.format(rowU[0],rowU[1],rowU[2],rowU[3],rowU[4],rowU[5],rowU[6],rowU[7],rowU[8]))
            strMatch = milepostDct.get(rowU[1],None)
            print("MP dict value: " + str(strMatch))
            if strMatch:
                rowU[1] = strMatch['mile_post']
                rowU[2] = rowU[2] * 3.28084
            strRoute1 = rteDct.get(rowU[3], None)
            print("Route dict value: " +str(strRoute1))
            if strRoute1:
                rowU[3] = strRoute1['id']
                rowU[4] = rowU[4] * 3.28084
            strParcel = parcelDct.get(rowU[0], None)
            print("Parcels dict value: " +str(strParcel))
            if strParcel:
                rowU[5] = strParcel['LANDOWNER_NAME_1']
                rowU[6] = strParcel['PARCEL_NUMBER_1']

            strCounty = cntyDct.get(rowU[0], None)
            print("County dict value: " +str(strCounty))
            if strCounty:
                rowU[7] = strCounty['CountyName_1']
                rowU[8] = stateDict.get(strCounty['stateAbbr'])

            print(u'after {0}, {1}, {2}, {3}, {4}, {5}, {6}, {7}, {8}'.format(rowU[0], rowU[1], rowU[2], rowU[3],rowU[4], rowU[5], rowU[6], rowU[7],rowU[8]))
            #cursorU.updateRow(rowU)

One I figure this issue, the script will run within 2 minutes

Thanks!

Anonymous User · ‎01-25-2023

That's a good improvement! If you still want to save some time, you can use multiprocessing and create all 4 dicts simultaneously...

I'd start with checking the data type of rowU[3]. It looks like the keys in that dict are ints, so if rowU[3] is a string, there won't be a key match. You can cast it to int() if its Text, or cast the key to a str when you make the dictionary.

PS, u can use the f decorator in your prints and make it a little more concise and readable:

print(f'after: {rowU[0]}, {rowU[1]}, {rowU[2]}, {rowU[3]}, {rowU[4]}, {rowU[5]}, {rowU[6]}, {rowU[7]}, {rowU[8]}')

DominicRobergeIADOT · ‎01-25-2023

that was it:

I add a cast to int and everything started working as expected. Thank you so much for all your help

D

DominicRobergeIADOT · ‎01-25-2023

Thank you all for the information and quick response. I will try the dictionary approach and let you know how it goes.

just a little bit of background on that project, I am running the Near geoprocessing on my Route and Mile Markers (and the RouteID is updated with the OBJECTID of the route, that's why I have to go back and update that RouteID with the RouteName... same with my Mile Markers)

Then I do two intersects (structures with Parcels and Structures with Counties)... that's why I have to go back and update the structure fields I need from the parcels and counties data.

Thanks!

RhettZufelt · ‎01-25-2023

Another thing that can really slow it down, though have not tested in Python 3.x, but in 2.7, the more print statements, the slower it went.

I set a variable at the first:

ShowPrint = True

then all my print statements are:

if ShowPrint:print("print value")

I could run scripts taking 15+ minutes to finish, set debug = False and the script would fininsh in under two minutes, so can make a huge difference.

R_