I'm new to Python and ArcGIS.
I've written a script (Python 3.9.16, installed with ArcGIS Pro 3.1.2) that updates a single numeric field in a Utility Networks Feature Service using an UpdateCursor, but at some point the script hangs. It can be after 5000 updates, or 200,000 - I can't see a pattern, even when repeating on the same data set.
There are around 600,000 rows in the UpdateCursor. It should take about 22 hours, but I haven't seen it run for more than 7 hours before it hangs.
I assume it's necessary to commit every so often, so I call this after each 1000 updates:
def flush():
    global edit
    if edit is not None:
        edit.stopOperation()
        edit.stopEditing(True)          # save edits
        edit.startEditing(False, True)  # no undo stack, multiuser mode
        edit.startOperation()
The first three operations take around 1 second in total, startOperation takes around 120 seconds after each 1000 updateRow operations (the 1000 cursor.updateRow(row) executions take around 10 seconds).
Time rises linearly when increasing the interval. I also tried calling stopEditing and startEditing less frequently - for example after every 2000, or 5000 updates, but the time per 1000 features was unchanged and it didn't seem to affect when the script started to hang.
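Stripped of the arcpy calls, the commit-every-N pattern described above looks like this (a minimal sketch; in the real script `update` would wrap cursor.updateRow(row) and `flush` the stop/start sequence shown earlier):

```python
def process_in_batches(rows, batch_size, update, flush):
    """Call update(row) for each row and flush() after every batch_size rows."""
    for i, row in enumerate(rows, start=1):
        update(row)
        if i % batch_size == 0:
            flush()
    flush()  # commit the final, possibly partial batch
```
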
When it hangs, it hangs during edit.startOperation() or edit.stopOperation().
It's running in the test environment and nothing else is running here. When doing such updates in the production environment it has to be outside of working hours, otherwise ArcGIS Pro is extremely slow.
Does anyone have any suggestions - apart from splitting the update task into smaller batches? If this was a database, setting a column value for 600,000 rows would take no time at all.
import os
import arcpy

arcpy.SignInToPortal(urlTest, usernameTest, passwordTest)
fc = serviceUrlTest
workspace = os.path.dirname(fc)
edit = arcpy.da.Editor(workspace, True)  # versioned
edit.startEditing(False, True)
edit.startOperation()
I don't think there is a better answer than what @GeeteshSingh07 has provided. Using the Field Calculator and setting the expression_type to SQL should be a rapid update.
Another avenue is using the ArcGIS API for Python to update feature layer records, but I would stick with the above. Let us know how you get on and how fast the 600k records update.
Geetesh, thanks, I will give it a try (when time permits) and write how I get on. This was the simplest case, where all features need to be given a fixed numerical value (and where it's not possible to do it in ArcGIS Pro, as it times out). There are, however, other things I will eventually need to do, because some data was not correctly migrated from Oracle in our G/Technology-to-ArcGIS project. I have since extracted the data from Oracle and built a Python dictionary from it, and each ArcGIS row needs to be individually updated with data from a lookup in this dictionary. I will try the simple updates first!
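The dictionary-driven update described here can be sketched without arcpy; the field names and dictionary shape below are illustrative, not from the original script (in practice the rows would come from an UpdateCursor):

```python
def apply_lookup(rows, lookup, key_field, target_field):
    """Set target_field from lookup for every row whose key has an entry.

    Returns the number of rows actually changed, so unchanged rows
    can be skipped instead of written back.
    """
    changed = 0
    for row in rows:
        new_value = lookup.get(row[key_field])
        if new_value is not None and row[target_field] != new_value:
            row[target_field] = new_value
            changed += 1
    return changed
```

Counting only genuine changes matters here: skipping the updateRow call for rows that already hold the correct value avoids needless edit operations against the service.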
Have you considered checking for memory leaks or optimizing your code for efficiency? It might also help to break down your updates into smaller batches and commit changes more frequently to avoid hanging. Keep troubleshooting, and you'll find a solution soon.
Is this a Hosted Feature Service?
I am doing something similar, (updating certain columns in a Hosted Feature Service from python dictionary) using the python API .calculate function.
Since it's hosted on AGOL, it will more often than not fail to finish, timing out with an error.
So, in my python script, I make a list of all the unique id's that I want to update, then as I go through the dictionary updating my columns, I append the successful id's to that list.
Then, I call the function(s) like so:
def CreateSignDict():
    # search cursor to populate my dictionary
    ...

def UpdateSigns():
    # get variables/datasets/workspaces
    ...
    if mc not in donelist:
        # do my updates
        donelist.append(mc)  # add successful id's to the done list

CreateSignDict()
while True:
    try:
        UpdateSigns()
        break  # stop the loop if the function completes successfully
    except Exception as e:
        print("Function errored out!", e)
        print("Retrying ... ")
print("Done with inner try")
This will keep running the UpdateSigns() function until all the values have been updated.
Not sure you can incorporate this into your setup or UpdateCursor, but has been working for me for a few years now.
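One refinement worth considering: the bare while True loop above retries forever if the failure is permanent rather than transient. The same pattern with a cap on attempts (a generic sketch, not tied to UpdateSigns):

```python
def retry(func, max_attempts=5):
    """Call func() until it succeeds or max_attempts is reached.

    Re-raises the last exception if every attempt fails, so a
    permanent error surfaces instead of looping forever.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception as e:
            print(f"Attempt {attempt} failed: {e}")
            if attempt == max_attempts:
                raise
```

Because the done list persists across attempts, each retry only reprocesses the ids that have not yet succeeded, so a cap costs nothing when the errors really are transient.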
R_
It's not a Hosted Feature Service; the data is in a SQL Server database, though direct access to the database is not allowed since it uses archiving. I can't realistically do one update per UpdateCursor, as it takes 3 seconds to create an UpdateCursor and select a particular feature.
I will do some experiments with CalculateField and get back later.
Thanks for all the replies. CalculateField works really well when it works: I was able to update around 78,000 features in 8 minutes, and repeating this in the test environment worked each time. Changing the where clause so it tries to update 85,000 consistently hangs; when I eventually stop the script, no features have been updated.
We (the Danish national natural gas utility) have a number of performance issues - ArcGIS is the first system to be moved to Azure and we don't have much experience in that - and the consultants the company has hired have no knowledge of GIS. The client running the python script is also in the cloud, so there shouldn't be network problems. There may well be memory leaks, but it's not something I can do anything about if that's the case - it's hanging inside CalculateField. Hopefully when we fix all the performance problems users are complaining about, CalculateField will be able to manage larger queries. Until then I'll have to split them up.
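Splitting the work into smaller CalculateField runs can be done by generating where clauses over OBJECTID ranges (a sketch under assumptions: OBJECTID is the key field, and the batch size is kept below the observed ~85,000 hang threshold; gaps in the OBJECTID sequence just make some batches smaller):

```python
def batch_where_clauses(min_oid, max_oid, batch_size, base_clause=None):
    """Yield where clauses covering [min_oid, max_oid] in batch_size chunks,
    optionally ANDed with an existing selection clause."""
    start = min_oid
    while start <= max_oid:
        end = min(start + batch_size - 1, max_oid)
        clause = f"OBJECTID >= {start} AND OBJECTID <= {end}"
        if base_clause:
            clause = f"({base_clause}) AND {clause}"
        yield clause
        start = end + 1
```

Each generated clause would then go into MakeFeatureLayer before the CalculateField call, as in the snippet below.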
arcpy.management.MakeFeatureLayer(servUrlProd_num, "stik", "ASSETGROUP = 1 And ASSETTYPE > 0 and installdate < timestamp '2002-06-30 00:00:00'")
arcpy.management.CalculateField("stik", "validationstatus", "5", "PYTHON3")