I've written a Python script that updates a single attribute on a number of versioned feature classes in an SDE geodatabase (SQL Server), with the values sourced from a CSV file.
The script basically does the following:
- starts an edit session
- reads the rows from the CSV file and, for each row, opens an `arcpy.da.UpdateCursor` with the record's ID in the where clause to fetch the matching feature class row
- updates the value for that row in the feature class
- stops the edit session and saves edits
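A simplified sketch of what the script does (the field names `ASSET_ID` and `STATUS` are illustrative, not my actual schema):

```python
import csv

def read_updates(csv_path):
    """Yield (feature_id, new_value) pairs from the CSV.
    Column names are illustrative -- substitute your own."""
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            yield row["ASSET_ID"], row["STATUS"]

def update_from_csv(workspace, feature_class, csv_path):
    import arcpy  # requires an ArcGIS Python environment

    edit = arcpy.da.Editor(workspace)
    edit.startEditing(False, True)   # versioned workspace, no undo stack
    edit.startOperation()
    for feature_id, new_value in read_updates(csv_path):
        # one cursor (and one query against the versioned view) per CSV row
        where = "ASSET_ID = '{0}'".format(feature_id)
        with arcpy.da.UpdateCursor(feature_class, ["STATUS"], where) as cursor:
            for fc_row in cursor:
                fc_row[0] = new_value
                cursor.updateRow(fc_row)
    edit.stopOperation()
    edit.stopEditing(True)           # stop editing and save edits
```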
This performs fine for a small number of records (<1000), but when scaled up to 100k+ records the script slows almost to a crawl beyond the first few thousand and took 10+ days to complete. The feature classes involved are not overly large (10-50k records each), so I'm surprised by the poor performance.
I have a feeling this is due to a build-up of records in the SDE delta tables: as the data is updated, the query that the `UpdateCursor` executes takes longer each time, because it has to resolve the versioned view across the delta tables.
Can anyone else offer a better explanation of what's going on and perhaps how to improve the performance?
The only thing I can think of currently is to change the script to perform the edits in smaller batches (e.g. 1,000 records per edit session), so edits get saved and the deltas don't accumulate within one session.
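For what it's worth, a minimal sketch of that batching idea, assuming the same hypothetical `ASSET_ID`/`STATUS` fields as above and committing an edit session per chunk:

```python
import csv
from itertools import islice

def batched(iterable, size):
    """Yield lists of up to `size` items from `iterable`."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

def update_in_batches(workspace, feature_class, csv_path, batch_size=1000):
    import arcpy  # requires an ArcGIS Python environment
    with open(csv_path, newline="") as f:
        rows = [(r["ASSET_ID"], r["STATUS"]) for r in csv.DictReader(f)]
    for chunk in batched(rows, batch_size):
        # open, apply, and save a separate edit session per batch
        edit = arcpy.da.Editor(workspace)
        edit.startEditing(False, True)
        edit.startOperation()
        for feature_id, new_value in chunk:
            where = "ASSET_ID = '{0}'".format(feature_id)
            with arcpy.da.UpdateCursor(feature_class, ["STATUS"], where) as cur:
                for row in cur:
                    row[0] = new_value
                    cur.updateRow(row)
        edit.stopOperation()
        edit.stopEditing(True)  # save edits at the end of each batch
```

I'm not sure whether saving per batch actually keeps the delta-table queries fast, though, which is partly why I'm asking.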
Any thoughts much appreciated.