Performance of arcpy.da.updatecursor against versioned data

3700
3
09-25-2015 07:28 AM
JohnFannon
Occasional Contributor III

Hi

I've written a python script that updates a number of versioned feature classes in an SDE geodatabase (SQL Server) with a single attribute sourced from a CSV file.

The script basically does the following:

  1. starts an edit session
  2. reads the rows from the CSV file and uses the arcpy.da.updatecursor to return the correct row (using an ID in the where clause)
  3. then updates the value for that row in the feature class.
  4. stops the edit session and saves edits

This performs ok for a small number of records (<1000) but when scaled up to 100k+ records, the script seems to slow down almost to a crawl beyond a few thousand records and took 10+ days to complete. The feature classes involved are not overly large (10-50k) records each, so I'm surprised by the poor performance.

I have a feeling this is due to a build up of records in the delta tables in SDE, so as the data is updated, the query that the updatecursor is executing takes longer each time, because it's querying across the delta tables.

Can anyone else offer a better explanation of what's going on and perhaps how to improve the performance?

The only thing I can think of currently is to change the script to perform the edits in smaller batches (e.g. 1000).

Any thoughts much appreciated.

Regards

John

0 Kudos
3 Replies
JoshuaBixby
MVP Esteemed Contributor

Using the SQL WHERE clause in the fashion you describe is inefficient for a couple or more reasons.  If the CSV file isn't huge and memory isn't tight, reading the CSV file and loading it into a native Python data structure in-memory would allow you to drop the WHERE clause and iterate over the cursor once doing lookups on the CSV contents that are in memory.  Looking up data in Python data structures in memory is much less expensive than re-initializing cursors and passing WHERE clauses for single records.

If you must keep your general workflow, then there are some items to experiment with.  Have you tried disabling with_undo when starting an edit session?  Have you tried multiple edit operations within the same multiple edit session?

Can you provide any specific code?  It is easier for people to comment on specific code than give statements about general situations.

0 Kudos
BlakeTerhune
MVP Regular Contributor

Side topic:

I'm interested in seeing your code for the edit session. Is it arcpy.da.Editor?

0 Kudos
JohnFannon
Occasional Contributor III

Unfortunately, I don't think using the method described by Joshua would be sensible in this case, as there are multiple feature class involved probably totalling about 250k records. For example, if the input CSV file has only a few hundred records (which is also possible), then the process would then read through 250k records in the feature classes, just to update a few thousand, which would also not be very performant.

As I've said, using the cursor with where clause performs well with a few thousand records, but something is slowing it down as the number of records increases. See graph below taken from timings in the log file:

performance.png

I've attached a cut down version of the code below to demonstrate what I'm doing. All edits are performed within a single edit session (because if the process fails it needs to roll back all the edits) and each individual edit is done within an edit operation.

import arcpy, os, sys, csv

#get configuration from a file here

try:

    #open the csv
    with open(inputFileAbsPath,'rb') as inCSVFile:

        csvReader = csv.DictReader(inCSVFile, fieldnames=[config.inputBusinessIDField,config.inputIFSidField], delimiter=config.inputSeparator)
               
        #start edit session
        editor = arcpy.da.Editor(targetConFileAbsPath)
        editor.startEditing(False,True)
               
        try:
            #loop through rows in CSV
            hasErrors = False
            totalRecordCount = 0
            successRecordCount = 0
            for csvRow in csvReader:
                businessID = csvRow[config.inputBusinessIDField]
                businessIDPrefix = businessID[0:2]
                IFSid = csvRow[config.inputIFSidField]
                                         
                #loop through configured feature classes and determine which feature class requires updating based on the business ID prefix
                updatedOK = False
                for fcName in config.targetFCDict.keys():

                    #check if the business ID prefix is associated with this feature class
                    fcBusinessIDPrefixes = config.targetFCDict[fcName]
                    if businessIDPrefix in fcBusinessIDPrefixes:

                        fullFCPath = targetConFileAbsPath + "\\" + fcName
                        shortFCName = utilityFuncs.ParseFCName(fcName)

                        editor.startOperation()
                        try:
                            #get cursor using query on business id
                            cur = arcpy.da.UpdateCursor(fullFCPath,config.targetIFSidField,config.targetBusinessIDField + "='" + businessID + "'")
                            fcRow = cur.next()
                            if fcRow != None:
                                #update the row, save edits and stop searching other feature classes
                                fcRow[0] = IFSid
                                cur.updateRow(fcRow)
                                editor.stopOperation()
                                updatedOK = True
                                break
                            else:
                                #no record found with the supplied business id in this feature class
                                editor.abortOperation()
                            #end if

                            del fcRow
                            del cur
                        except StopIteration:
                            #no record found with the supplied business id in this feature class (the cursor throws this exception when no records found)
                            editor.abortOperation()
                        except Exception as ex:
                            editor.abortOperation()
                            raise Exception ("Error updating row with ID: " + businessID + " [" + str(ex) + "]")
                        #end try

                    #end if

                #end for
                       
                #check if updated successfully
                if updatedOK:
                    successRecordCount = successRecordCount + 1
                #end if

                totalRecordCount = totalRecordCount + 1

            #end for

        except Exception as ex:
            editor.stopEditing(False)
            raise
        #end try

    #end with

    editor.stopEditing(True)
                       
except Exception as ex:
    errMsg = "Error updating from " + inputFileAbsPath + "[" + unicode(ex) + "]"
    raise Exception(errMsg)
#end try