CalculateFields (multiple) vs. da.UpdateCursor

JakeJohnson · ‎06-08-2020

Hello,

I've been trying to improve the scripts I have and stumbled across this legendary post:

/blogs/richard_fairhurst/2014/11/08/turbo-charging-data-manipulation-with-python-cursors-and-diction...

However, I was wondering if CalculateFields as opposed to CalculateField would be even better. I don't know how it works behind the scenes -- is it just multiple CalculateField calls or more than that?

JoeBorgione · ‎06-08-2020

Calculate fields is a new one for me, but after looking at the help docs it appears to be as you suggest, it appears to be calculate field but with mulitple fields and code blocks. In other words, each record will have the same value in the given field(s).

I have been using Richard Fairhurst‌'s approach for a couple of years; it's a mainstay for me.

That should just about do it....

JoshuaBixby · ‎06-08-2020

I don't know how it works behind the scenes

Outside of Esri's development team, no one know exactly how it works.

Whether Calculate Field—Data Management toolbox | Documentation or Calculate Fields (multiple)—Data Management toolbox | Documentation, a cursor in some way, shape, or form is being used to iterate over and update the data since cursors are a fundamental database/datastore control structure.

My guess is that Esri introduced the multiple version of Calculate Field for people who have a need for basic updates to multiple fields but cursors are more in the weeds than they want to go. Also, since the multiple version is a Geoprocessing tool, it will integrate into Model Builder and streaming models that involved updating several fields.

I am sort of surprised it took this long, but then again I am not surprised at all....

JakeJohnson · ‎06-08-2020

That makes it difficult to judge which one is more efficient for those willing to venture outside of the provide GP tools. I could always watch it run and compare but it would be quicker if I knew I'm not just replicating what they already have lol.

JoshuaBixby · ‎06-08-2020

If you are calculating on a single field, I would expect them to perform quite close in terms of performance.

JakeJohnson · ‎06-08-2020

I have multiple instances in the script that does 4 fields at once, sadly.

JoshuaBixby · ‎06-08-2020

And why can't you use the new tool for that situation? I only mention a single field because the original tool can only do a single field, so you can only compare the two tools using a single field. In cases with multiple fields, running the new tool once against 4 fields will definitely be faster than running the old tool 4 times.

JakeJohnson · ‎06-09-2020

I am, actually! (Also, correction 3 fields not 4.)

This is the code I have had for months.

arcpy.CalculateFields_management("asset_temp", "PYTHON3", [["SPATAILSTART", spatial_start],
 ["SPATAILEND", spatial_end],
 ["SPATIALID", spatial_id_line_sewer]])‍‍‍

It works perfectly fine but I was wondering if I could make my script faster. It interacts with probably 100k+ features multiple times. I saw that cursor post and wanted to experiment.

field_list = ["SPATIALSTART", "SPATIALEND", "SPATIALID"]

with arcpy.da.UpdateCursor("asset_temp", field_list) as update_cursor:
 for row in update_cursor:
 row[1] = spatial_start
 row[2] = spatial_end
 row[3] = spatial_id_line_sewer
 update_cursor.updateRow(row)‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍

I wanted to see if this was faster.

JoeBorgione · ‎06-09-2020

Not sure if your code is meant to emulate the turbo approach, but I think a straight calculate fields is going to be faster than an update cursor plowing through a 100K + records. Below is an example of how I've used the Fairhurst procedure to fix a mistake I made: in the target fc, I had incorrectly updated the UNINCCOM_L field, so I went back to the source data, created a dictionary of the original values and using the JOINID field as the key, updated my target feature class, correcting my earlier error. Much faster than a join and subsequent calculate field...

import arcpy
targetFC = r'C:\GIS\AddressDataManagement\Address Data Management\MSDData.gdb\MSD_Centerlines'
targetFields = ['JOINID', 'UNINCCOM_L']

sourceFC =r'C:\GIS\AddressDataManagement\Address Data Management\MSDData.gdb\AllCenterlines'
sourceFields = ['JOINID', 'UNINCCOM_L']
sourceDict ={r[0]:(r[1:]) for r in arcpy.da.SearchCursor(sourceFC,sourceFields)}

with arcpy.da.UpdateCursor(targetFC,targetFields) as updateRows:
    for updateRow in updateRows:
       keyValue = updateRow[0]
       if keyValue in sourceDict:
           updateRow[1] = sourceDict[keyValue][0]
           updateRows.updateRow(updateRow)
       else:
            pass
‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍

That should just about do it....

BlakeTerhune · ‎06-10-2020

Joe Borgione wrote:
...the Fairhurst procedure...

The first thing I thought of was the Riker Maneuver.