Select to view content in your preferred language

How to improve arcpy.da.UpdateCursor / InsertCursor performance?

660
14
04-15-2024 05:48 PM
EdX1
by
New Contributor III

Hello,

I am running a script via the ArcPro python terminal, with the layers I am using in my script open in the project. sTabName is a string which contains the name of the table I have opened in ArcPro. 

I have a table which has 200k records, I wish to update and insert <20 items within this table.
I need to firstly use an UpdateCursor to update a few rows, then a InsertCursor to insert some rows to replace those I updated.

My code currently:

 

 

 

 # Create a table view to use within the edit session
            arcpy.SelectLayerByAttribute_management(sTabName, "NEW_SELECTION", where_clause=sOldWhere) 
            TableViewCount = int(arcpy.GetCount_management(sTabName).getOutput(0))
            print(f"sTabName with sql querey: {sOldWhere}, applied. TableViewCount contains: {TableViewCount} rows")

            # Start editing session
            edit = None  # Initialize edit variable
            edit = arcpy.da.Editor(SDE_Workspace_Path)
            edit.startEditing(False, True) 
            edit.startOperation()

            # Retire the old assessments
            print("Retiring old assessment(s)...")
            count = 0
            with arcpy.da.UpdateCursor(sTabName, ("RecordStatus", "Retired")) as oRows:
                for oRow in oRows:
                    oRow[0] = "Retired"
                    sretired = "Title retired & replaced by {0}".format(",".join(aNewSIDs))
                    if len(sretired) > 250:
                        oRow[1] = 'Title retired & replaced by Multiple parcels'
                    else:
                        oRow[1] = sretired
                    oRows.updateRow(oRow)
                count += 1
                print(f"retiring item {count} of {TableViewCount}")

 

 

 

 

The Problem:
This seems to take a very long time to open up the UpdateCursor, >1-2 minutes, and the search cursor which comes after it (not included in code sample). 
How can I improve efficiency given that I am only updating the specific items from where_clause=sOldWhere?

I have tried using:
arcpy.MakeTableView_management(sTable_path, "TableView", where_clause=sOldWhere)
instead of what is at Line 2, which does seem to run faster, but this creates a tableview item which I then have to remove (to clean up) which adds more processing time once again.

I wish to reduce the processing time of this script as much as possible as it is manually run multiple times over a day to complete a task. 

 
0 Kudos
14 Replies
EdX1
by
New Contributor III

@RhettZufelt wrote:

Have not tested this with 3.x, but with 2.7, if I commented out any print statements, my scripts will run in a fraction of the time.

So, in most all my scripts, I set a debugPrint variable at the top, then I have replaced the print statements with:

 

 

debugPrint = True
.
.
.
if debugPrint:print('What ever you want')

 

 

So, normally, debugPrint = False so that the script runs much faster, but If I'm having issues, I toggle it to True so I can get the printouts.

R_


Thanks! This is a very elegant way to add debug messages rather than commenting them out like I do at the moment!

0 Kudos
TonyAlmeida
Occasional Contributor II

Try this, change line 33 to the common field and line 9 & 12.

I haven't tested it yet.

import arcpy
import os
from openpyxl import load_workbook

# Set the workspace (folder or geodatabase) where the Excel file is stored
arcpy.env.workspace = "C:/Temp"

# Path to the Excel file
excel_file = "C:/Temp/file.xlsx"

# Path to the feature class
feature_class = r"feature class"

# Load the Excel workbook
workbook = load_workbook(excel_file)

# Get the list of worksheet names
worksheet_names = workbook.sheetnames

# Loop through each worksheet
for worksheet_name in worksheet_names:
    # Create a unique name for the table view by replacing problematic characters
    table_view_name = f"View_{worksheet_name.replace('$', '_')}"

    # Create a full path to the Excel worksheet
    worksheet_path = os.path.join(excel_file, f"${worksheet_name}")

    # Create a table view from the Excel sheet
    arcpy.MakeTableView_management(in_table=worksheet_path, out_view=table_view_name)
    print(f"Table view created for {worksheet_name}")

    # Define the join field name, common between the feature class and the Excel sheet
    join_field = "Field" # change field

    # Add a join between the feature class and the table view
    arcpy.AddJoin_management(in_layer_or_view=feature_class, in_field=join_field, 
                             join_table=table_view_name, join_field=join_field, 
                             join_type="KEEP_COMMON")

    # Optionally, you can export the joined data to a new feature class
    output_feature_class = f"Joined_{worksheet_name.replace('$', '_')}"
    arcpy.CopyFeatures_management(in_features=feature_class, out_feature_class=output_feature_class)
    print(f"Joined feature class created: {output_feature_class}")

    # Remove the join to clean up before the next iteration
    #arcpy.RemoveJoin_management(feature_class)

    # Delete the table view to avoid locks and clean up
    #arcpy.Delete_management(table_view_name)

 

0 Kudos
BrandonMcAlister
Occasional Contributor

Hi @EdX1,

Have you tried saving the script to a toolbox? I write my scripts with ArcGIS notebooks and I've noticed that when I run them in Notebooks they take 4x longer to complete, I haven't tried the python window, but from my understanding it shouldn't bog down the scripts unless there is something weird going on.

 

Anyway I have a feature class that has 3,000 plus records which searches through all of them and updates all of them and when I run it from a tool box it takes ~30 minutes. 

I would first try saving your code to a tool box and run it from there then try adapting my code below. 

MP.clearSelection() #Clears any selected features

arcpy.SetProgressorLabel("Updating attribute information") #Changes tool progress label
arcpy.SetProgressorPosition(1) #Resets progressor to beginning

arcpy.management.SelectLayerByAttribute(sTabName, where_clause = sOldWhere) #Select features for count
count = arcpy.management.GetCount(sTabName) #Get count of selected features

print("What you want" + str(count))

if int(str(count)) > 0:

    i = 0

    while i < int(str(count)):

        if int((i/int(str(count))*100) > 1:

            arcpy.SetProgressorPosition(int((i/int(str(count))*100)) #updates tool progress label

        now = datetime.datetime.now() #gets time process started

        sTabName.definitionQuery = "Your SQL selection here" #Make it so the search cursor will only have to search the 20 records

        Add your edit line here

        upcursor = arcpy.da.UpdateCursor(sTabName, ["RecordStatus", "Retired"])

        for row in upcursor:

            row[0] = "Retired"

            sretired = "Title retired & replaced by {0}". format(",".join(aNewSIDs))

             if len(sretired)>250:

                    row[1] = "Title retired & replaced by multiple parcels"

             else:

                    row[1] = sretired

         upcursor.updaterow(row)

          edit.stopEditing(True) #saves edits and closes edit session

          end = datetime.datetime.now() #gets time code ended

          elapsed = end - now

    print("Updating: " + str(count) + " records from " + str(sTabName.name) + ". Elapse time: " + str(elapsed)) #or something similar

   print(What ever you want)

 

I slightly adapted that for you, but when I run the code above I get about .5 seconds per record. Hope it works for you!

Thanks,
Brandon
RhettZufelt
MVP Notable Contributor

have you tried putting the where clause in the cursors themselves?  Might help limit what is being loaded.

R_

EdX1
by
New Contributor III

Good suggestion! Also allows me to remove the first 4 lines, and does seem to speed it up a small amount. 
Thanks. 

0 Kudos