topic Re: Optimizing UpdateCursor Script in Python Questions

Optimizing UpdateCursor Script

AlexanderDyer — Thu, 04 Jan 2024 17:26:28 GMT

Hello everyone,

I'm building a script to calculate offset values for linearly referenced data based on how many times distance values occur in a dataset. This is the script I've come up with and it works how I'd like it to, though once applied to the 6,000+ piece dataset it takes hours to complete.

Any thoughts on how I might be able to speed it up?

#pt table pt_tbl = 'ArcPRO_MasterQuery_EXT_pt' with arcpy.da.SearchCursor(pt_tbl, "NUM_SEC") as cursor2: traillist = [] tns = set(cursor2) for thing in tns: for d in thing: traillist.append(d) traillistsortpt = sorted(traillist) print("Calculating Point Table Offsets...") traillistsortpt = ["11-1", "11-2"] i = 0 for num in traillistsortpt: arcpy.management.SelectLayerByAttribute(pt_tbl, "NEW_SELECTION", "NUM_SEC = '%s'" %(num)) oor = {} oo = {} with arcpy.da.UpdateCursor(pt_tbl, ["OID@", "DISTANCE", "DESCRIPTION_1", "DESCRIPTION_2", "DESCRIPTION_3", "Offset"]) as cursor1: for row in cursor1: distance_value = row[1] oo[distance_value] = oo.get(distance_value, 0) +1 if "RHS" in str(row[2]) or "RHS" in str(row[3]) or "RHS" in str(row[4]): oor[distance_value] = oor.get(distance_value, 0) + 1 orh = (oo[distance_value] - 1) * -1 row[5] = orh elif row[1] in oor: o = (oo[distance_value] - (oor[distance_value]+1)) row[5] = o else: o = (oo[distance_value] - 1) row[5] = o cursor1.updateRow(row) i+=1 print(i, " of ", len(traillistsortpt)) arcpy.management.SelectLayerByAttribute(pt_tbl, "CLEAR_SELECTION") print("Point Table Offsets Calculated")

Re: Optimizing UpdateCursor Script

DavidPike — Thu, 04 Jan 2024 17:27:11 GMT

I won't pretend to know what the script is doing but instead of iterating over your list of numbers, can you just run the select by attributes based on the list? Also are you sure the selection is being honoured as you just seem to reference the same table in the cursor rather than:

selected_features = arcpy.SelectlayerByAttribute(pt_tbl, ...) with arcpy.da.UpdateCursor(selected_features, fields) as cursor:

As an aside I'd recommend some code comments and more understandable variable names if someone ever has to read or edit your code in the future.

arcpy.management.SelectLayerByAttribute(pt_tbl, "NEW_SELECTION", "NUM_SEC IN '%s'" %(traillistsortpt))

Re: Optimizing UpdateCursor Script

JoshuaBixby — Thu, 04 Jan 2024 20:59:41 GMT

I only have time to comment on the SearchCursor structure, the current code can be condensed down to two lines:

with arcpy.da.SearchCursor(pt_tbl, "NUM_SEC") as cursor2: traillistsortpt = sorted(set(row[0] for row in cursor2))

Re: Optimizing UpdateCursor Script

JoshuaBixby — Thu, 04 Jan 2024 21:09:06 GMT

When you say " 6,000+ piece dataset", do you mean the traillistsortpt list is 6,000+ items or the pt_tbl layer has 6,000+ records?

Re: Optimizing UpdateCursor Script

BlakeTerhune — Thu, 04 Jan 2024 23:01:13 GMT

As @DavidPike mentioned, your variable naming makes your code unnecessarily hard to follow.
The SearchCursor portion that builds the traillistsortpt list is immediately overwritten with hardcoded values on line 14. Why?
You can use enumerate() to count iterations of a thing instead of counting it yourself with i
At the very least, use the string format() method instead of the old %s operator. If you're working in ArcGIS Pro (Python 3), use the f-string
Declaring cursors outside the loop is more performant because it won't get recreated with every iteration of the loop.
Sequence unpacking can sometimes be a convenient way to name the cursor's individual values instead of referring to them by their row index. It gets a bit much if you have more than a handful of fields though, in which case you could make each row a dictionary with field names.
You should utilize the where_clause parameter of the UpdateCursor instead of making a selection.
Your logic for setting the Offset field value (row[5]) seems flawed. You're checking for a value in oor before you put the value in oor.

Re: Optimizing UpdateCursor Script

BlakeTerhune — Thu, 04 Jan 2024 23:18:29 GMT

I'm not going to touch the logic you have that calculates offset. You'll need to review that on your own.

pt_tbl = "ArcPRO_MasterQuery_EXT_pt" pt_tbl_fields = ["DISTANCE", "DESCRIPTION_1", "DESCRIPTION_2", "DESCRIPTION_3", "Offset"] trail_num_sec = ["11-1", "11-2"] comma_sep_str = ",".join([f"'{i}'" for i in trail_num_sec]) expression = f"NUM_SEC in({comma_sep_str})" oor = {} oo = {} with arcpy.da.UpdateCursor(pt_tbl, pt_tbl_fields, expression) as u_cursor: for distance, desc_1, desc_2, desc_3, offset in u_cursor: oo[distance] = oo.get(distance, 0) + 1 if "RHS" in [desc_1, desc_2, desc_3]: oor[distance] = oor.get(distance, 0) + 1 offset = (oo[distance] - 1) * -1 elif distance in oor: offset = (oo[distance] - (oor[distance]+1)) else: offset (oo[distance] - 1) u_cursor.updateRow([distance, desc_1, desc_2, desc_3, offset]) print("Point Table Offsets Calculated")

Re: Optimizing UpdateCursor Script

MErikReedAugusta — Mon, 08 Jan 2024 17:28:38 GMT

I feel like this post solves all of the main issues I saw really well, but I want to emphasize a couple points that burned me early on, when I couldn't figure out why my code was taking forever to run:

1. The Data Access Cursors have a ton of work to do and can be more computationally intensive than you might think. Avoid calling them inside loops like this unless you want to go through your whole dataset multiple times. (Especially avoid calling Cursor loops inside other Cursor loops. That way madness lies.) It's often much better for the Cursor to be your outermost loop, and do whatever other operations you have to do inside that cursor loop.

2. @JoshuaBixby 's comment up above condensing your first Cursor is a great example of using the speed of list comprehension to your advantage. To avoid getting technical, even though his script looks like it's just a compression of yours into a single line, it's more efficient than even just that. List comprehension does a bunch of stuff behind the scenes in more efficient form, and you can often save processing time there.