Hello everyone,
I'm building a script to calculate offset values for linearly referenced data based on how many times distance values occur in a dataset. This is the script I've come up with and it works how I'd like it to, though once applied to the 6,000+ piece dataset it takes hours to complete.
Any thoughts on how I might be able to speed it up?
#pt table
pt_tbl = 'ArcPRO_MasterQuery_EXT_pt'
with arcpy.da.SearchCursor(pt_tbl, "NUM_SEC") as cursor2:
traillist = []
tns = set(cursor2)
for thing in tns:
for d in thing:
traillist.append(d)
traillistsortpt = sorted(traillist)
print("Calculating Point Table Offsets...")
traillistsortpt = ["11-1", "11-2"]
i = 0
for num in traillistsortpt:
arcpy.management.SelectLayerByAttribute(pt_tbl, "NEW_SELECTION", "NUM_SEC = '%s'" %(num))
oor = {}
oo = {}
with arcpy.da.UpdateCursor(pt_tbl, ["OID@", "DISTANCE", "DESCRIPTION_1", "DESCRIPTION_2", "DESCRIPTION_3", "Offset"]) as cursor1:
for row in cursor1:
distance_value = row[1]
oo[distance_value] = oo.get(distance_value, 0) +1
if "RHS" in str(row[2]) or "RHS" in str(row[3]) or "RHS" in str(row[4]):
oor[distance_value] = oor.get(distance_value, 0) + 1
orh = (oo[distance_value] - 1) * -1
row[5] = orh
elif row[1] in oor:
o = (oo[distance_value] - (oor[distance_value]+1))
row[5] = o
else:
o = (oo[distance_value] - 1)
row[5] = o
cursor1.updateRow(row)
i+=1
print(i, " of ", len(traillistsortpt))
arcpy.management.SelectLayerByAttribute(pt_tbl, "CLEAR_SELECTION")
print("Point Table Offsets Calculated")
I won't pretend to know what the script is doing but instead of iterating over your list of numbers, can you just run the select by attributes based on the list? Also are you sure the selection is being honoured as you just seem to reference the same table in the cursor rather than:
selected_features = arcpy.SelectlayerByAttribute(pt_tbl, ...)
with arcpy.da.UpdateCursor(selected_features, fields) as cursor:
As an aside I'd recommend some code comments and more understandable variable names if someone ever has to read or edit your code in the future.
arcpy.management.SelectLayerByAttribute(pt_tbl, "NEW_SELECTION", "NUM_SEC IN '%s'" %(traillistsortpt))
I only have time to comment on the SearchCursor structure, the current code can be condensed down to two lines:
with arcpy.da.SearchCursor(pt_tbl, "NUM_SEC") as cursor2:
traillistsortpt = sorted(set(row[0] for row in cursor2))
When you say " 6,000+ piece dataset", do you mean the traillistsortpt list is 6,000+ items or the pt_tbl layer has 6,000+ records?
I'm not going to touch the logic you have that calculates offset. You'll need to review that on your own.
pt_tbl = "ArcPRO_MasterQuery_EXT_pt"
pt_tbl_fields = ["DISTANCE", "DESCRIPTION_1", "DESCRIPTION_2", "DESCRIPTION_3", "Offset"]
trail_num_sec = ["11-1", "11-2"]
comma_sep_str = ",".join([f"'{i}'" for i in trail_num_sec])
expression = f"NUM_SEC in({comma_sep_str})"
oor = {}
oo = {}
with arcpy.da.UpdateCursor(pt_tbl, pt_tbl_fields, expression) as u_cursor:
for distance, desc_1, desc_2, desc_3, offset in u_cursor:
oo[distance] = oo.get(distance, 0) + 1
if "RHS" in [desc_1, desc_2, desc_3]:
oor[distance] = oor.get(distance, 0) + 1
offset = (oo[distance] - 1) * -1
elif distance in oor:
offset = (oo[distance] - (oor[distance]+1))
else:
offset (oo[distance] - 1)
u_cursor.updateRow([distance, desc_1, desc_2, desc_3, offset])
print("Point Table Offsets Calculated")
I feel like this post solves all of the main issues I saw really well, but I want to emphasize a couple points that burned me early on, when I couldn't figure out why my code was taking forever to run:
1. The Data Access Cursors have a ton of work to do and can be more computationally intensive than you might think. Avoid calling them inside loops like this unless you want to go through your whole dataset multiple times. (Especially avoid calling Cursor loops inside other Cursor loops. That way madness lies.) It's often much better for the Cursor to be your outermost loop, and do whatever other operations you have to do inside that cursor loop.
2. @JoshuaBixby 's comment up above condensing your first Cursor is a great example of using the speed of list comprehension to your advantage. To avoid getting technical, even though his script looks like it's just a compression of yours into a single line, it's more efficient than even just that. List comprehension does a bunch of stuff behind the scenes in more efficient form, and you can often save processing time there.