How to improve arcpy.da.UpdateCursor / InsertCursor performance?

EdX1 · ‎04-15-2024

Hello,

I am running a script via the ArcPro python terminal, with the layers I am using in my script open in the project. sTabName is a string which contains the name of the table I have opened in ArcPro.

I have a table which has 200k records, I wish to update and insert <20 items within this table.
I need to firstly use an UpdateCursor to update a few rows, then a InsertCursor to insert some rows to replace those I updated.

My code currently:

 # Create a table view to use within the edit session
            arcpy.SelectLayerByAttribute_management(sTabName, "NEW_SELECTION", where_clause=sOldWhere) 
            TableViewCount = int(arcpy.GetCount_management(sTabName).getOutput(0))
            print(f"sTabName with sql querey: {sOldWhere}, applied. TableViewCount contains: {TableViewCount} rows")

            # Start editing session
            edit = None  # Initialize edit variable
            edit = arcpy.da.Editor(SDE_Workspace_Path)
            edit.startEditing(False, True) 
            edit.startOperation()

            # Retire the old assessments
            print("Retiring old assessment(s)...")
            count = 0
            with arcpy.da.UpdateCursor(sTabName, ("RecordStatus", "Retired")) as oRows:
                for oRow in oRows:
                    oRow[0] = "Retired"
                    sretired = "Title retired & replaced by {0}".format(",".join(aNewSIDs))
                    if len(sretired) > 250:
                        oRow[1] = 'Title retired & replaced by Multiple parcels'
                    else:
                        oRow[1] = sretired
                    oRows.updateRow(oRow)
                count += 1
                print(f"retiring item {count} of {TableViewCount}")

The Problem:
This seems to take a very long time to open up the UpdateCursor, >1-2 minutes, and the search cursor which comes after it (not included in code sample).
How can I improve efficiency given that I am only updating the specific items from where_clause=sOldWhere?

I have tried using:
arcpy.MakeTableView_management(sTable_path, "TableView", where_clause=sOldWhere)
instead of what is at Line 2, which does seem to run faster, but this creates a tableview item which I then have to remove (to clean up) which adds more processing time once again.

I wish to reduce the processing time of this script as much as possible as it is manually run multiple times over a day to complete a task.

EdX1 · ‎04-17-2024

@RhettZufelt wrote:
Have not tested this with 3.x, but with 2.7, if I commented out any print statements, my scripts will run in a fraction of the time.
So, in most all my scripts, I set a debugPrint variable at the top, then I have replaced the print statements with:

debugPrint = True
.
.
.
if debugPrint:print('What ever you want')

So, normally, debugPrint = False so that the script runs much faster, but If I'm having issues, I toggle it to True so I can get the printouts.

R_

Thanks! This is a very elegant way to add debug messages rather than commenting them out like I do at the moment!

TonyAlmeida · ‎04-19-2024

Try this, change line 33 to the common field and line 9 & 12.

I haven't tested it yet.

import arcpy
import os
from openpyxl import load_workbook

# Set the workspace (folder or geodatabase) where the Excel file is stored
arcpy.env.workspace = "C:/Temp"

# Path to the Excel file
excel_file = "C:/Temp/file.xlsx"

# Path to the feature class
feature_class = r"feature class"

# Load the Excel workbook
workbook = load_workbook(excel_file)

# Get the list of worksheet names
worksheet_names = workbook.sheetnames

# Loop through each worksheet
for worksheet_name in worksheet_names:
    # Create a unique name for the table view by replacing problematic characters
    table_view_name = f"View_{worksheet_name.replace('$', '_')}"

    # Create a full path to the Excel worksheet
    worksheet_path = os.path.join(excel_file, f"${worksheet_name}")

    # Create a table view from the Excel sheet
    arcpy.MakeTableView_management(in_table=worksheet_path, out_view=table_view_name)
    print(f"Table view created for {worksheet_name}")

    # Define the join field name, common between the feature class and the Excel sheet
    join_field = "Field" # change field

    # Add a join between the feature class and the table view
    arcpy.AddJoin_management(in_layer_or_view=feature_class, in_field=join_field, 
                             join_table=table_view_name, join_field=join_field, 
                             join_type="KEEP_COMMON")

    # Optionally, you can export the joined data to a new feature class
    output_feature_class = f"Joined_{worksheet_name.replace('$', '_')}"
    arcpy.CopyFeatures_management(in_features=feature_class, out_feature_class=output_feature_class)
    print(f"Joined feature class created: {output_feature_class}")

    # Remove the join to clean up before the next iteration
    #arcpy.RemoveJoin_management(feature_class)

    # Delete the table view to avoid locks and clean up
    #arcpy.Delete_management(table_view_name)

BrandonMcAlister · ‎04-16-2024

Hi @EdX1,

Have you tried saving the script to a toolbox? I write my scripts with ArcGIS notebooks and I've noticed that when I run them in Notebooks they take 4x longer to complete, I haven't tried the python window, but from my understanding it shouldn't bog down the scripts unless there is something weird going on.

Anyway I have a feature class that has 3,000 plus records which searches through all of them and updates all of them and when I run it from a tool box it takes ~30 minutes.

I would first try saving your code to a tool box and run it from there then try adapting my code below.

MP.clearSelection() #Clears any selected features

arcpy.SetProgressorLabel("Updating attribute information") #Changes tool progress label
arcpy.SetProgressorPosition(1) #Resets progressor to beginning

arcpy.management.SelectLayerByAttribute(sTabName, where_clause = sOldWhere) #Select features for count
count = arcpy.management.GetCount(sTabName) #Get count of selected features

print("What you want" + str(count))

if int(str(count)) > 0:

i = 0

while i < int(str(count)):

if int((i/int(str(count))*100) > 1:

arcpy.SetProgressorPosition(int((i/int(str(count))*100)) #updates tool progress label

now = datetime.datetime.now() #gets time process started

sTabName.definitionQuery = "Your SQL selection here" #Make it so the search cursor will only have to search the 20 records

Add your edit line here

upcursor = arcpy.da.UpdateCursor(sTabName, ["RecordStatus", "Retired"])

for row in upcursor:

row[0] = "Retired"

sretired = "Title retired & replaced by {0}". format(",".join(aNewSIDs))

if len(sretired)>250:

row[1] = "Title retired & replaced by multiple parcels"

else:

row[1] = sretired

upcursor.updaterow(row)

edit.stopEditing(True) #saves edits and closes edit session

end = datetime.datetime.now() #gets time code ended

elapsed = end - now

print("Updating: " + str(count) + " records from " + str(sTabName.name) + ". Elapse time: " + str(elapsed)) #or something similar

print(What ever you want)

I slightly adapted that for you, but when I run the code above I get about .5 seconds per record. Hope it works for you!

Thanks,
Brandon

RhettZufelt · ‎04-16-2024

have you tried putting the where clause in the cursors themselves? Might help limit what is being loaded.

R_

EdX1 · ‎04-16-2024

Good suggestion! Also allows me to remove the first 4 lines, and does seem to speed it up a small amount.
Thanks.