Dealing with memory leaks in Python scripts in 10.0

03-07-2014 02:52 AM
MikeTree
New Contributor III
I'm looking for strategies to deal with memory leaks in ArcGIS v10.0.

My script does some geoprocessing (mostly intersections and selections) on about 3000 polygons, using search cursors as well. The script is pretty simple, but as it runs I watch the memory usage in Task Manager creep up to about 1.3 GB, and then ArcGIS crashes with no warning.

So I amended my script to keep track of how far it got on the last attempt, and to carry on from there. But this means manually restarting the procedure each time. I'd much rather devise some means to restart the script periodically, say every 500 polygons, but I can't figure out how to 'reset all' in Python.

What are the options for dealing with and working around ArcGIS's memory leaks? What are the overall strategies? I currently run scripts from the Python window, but it would be fairly easy to get them working as standalone scripts. I haven't figured out how, in a standalone script, to periodically release all memory or resources, so if anyone has tips on how to kill the ArcMap.exe process and start afresh, that would be really helpful. That way I could potentially scale my script to bigger datasets without having to crash and restart it several times.

Thanks
8 Replies
JamesCrandall
MVP Frequent Contributor
What is your data source?
What is the offending Python code that you think is the problem?
markdenil
Occasional Contributor III
You could set up the processing script so it only processes 5000 records at a time.
At the end of the run, have it save the highest record number processed,
either in a text file or as a pickled Python object (they are sooo nifty...)

Then, use a batch file to call the script over and over, as needed.
Each time it starts, it retrieves the new starting-point recno from the record file.

That effectively resets Python for each run....
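For the pickle variant, the load/save bookkeeping might look something like this (the checkpoint file name and default value are illustrative assumptions, not part of the suggestion above):

import os
import pickle

CHECKPOINT = r"C:\scripts\python\lastrec.pkl"  # hypothetical checkpoint file

def load_start(default=0):
    # return the record number saved by the previous run, or the default on the first run
    if not os.path.exists(CHECKPOINT):
        return default
    with open(CHECKPOINT, "rb") as f:
        return pickle.load(f)

def save_start(recno):
    # overwrite the checkpoint with the highest record number processed
    with open(CHECKPOINT, "wb") as f:
        pickle.dump(recno, f)

Each run of the script calls load_start() first and save_start() last, so the state survives the process exiting.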
MikeTree
New Contributor III
You could set up the processing script so it only processes 5000 records at a time.
At the end of the run, have it save the highest record number processed,
either in a text file or as a pickled Python object (they are sooo nifty...)

Then, use a batch file to call the script over and over, as needed.
Each time it starts, it retrieves the new starting-point recno from the record file.

That effectively resets Python for each run....


Yes, that's the approach I'm after. How do I do that via batch? Can you give me an example, please?

Many thanks
MichaelVolz
Esteemed Contributor
Can you provide your script? Then additional code could be suggested to work with what you already have.
markdenil
Occasional Contributor III
The batch side is pretty easy:
@echo off
rem kickoffTheScript.bat
echo START
rem 1 - 5000
C:\Python27\ArcGIS10.1\python.exe C:\scripts\python\theScript.py
rem 5000 - 10000
C:\Python27\ArcGIS10.1\python.exe C:\scripts\python\theScript.py
rem 10000 - 15000
C:\Python27\ArcGIS10.1\python.exe C:\scripts\python\theScript.py
rem 15000 - 20000
C:\Python27\ArcGIS10.1\python.exe C:\scripts\python\theScript.py
echo DONE


In addition to your Python script (theScript.py),
you will also have a text file (startFID.txt).
To begin, startFID.txt will have one entry: 0

The first thing theScript.py does is find and read startFID.txt into a variable (say, fid1),
then add 5000 to fid1 to get fid2.

Run your loop:
for x in range(fid1, fid2):
or use the two values in a cursor or MakeFeatureLayer where clause:
'"FID" >= {0} AND "FID" < {1}'.format(fid1, fid2)

At the end of the run, overwrite the number in startFID.txt with fid2.

The next iteration will pick up the new start point from the file again and go on.
Adding the 5000 to fid1 (which is now 5000) will now give you fid2 = 10000.....

This is very crude; you will likely find enhancements to fit your situation.
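For concreteness, theScript.py's bookkeeping might look something like the sketch below. The dataset path and the loop body are placeholders for the actual geoprocessing, and the old-style arcpy.SearchCursor is used since the thread concerns 10.0:

import arcpy

START_FILE = r"C:\scripts\python\startFID.txt"
BATCH_SIZE = 5000
FC = r"C:\data\polygons.shp"  # hypothetical input featureclass

# read the start point saved by the previous run (the file initially holds 0)
with open(START_FILE) as f:
    fid1 = int(f.read().strip())
fid2 = fid1 + BATCH_SIZE

# restrict this run to one batch of records
where = '"FID" >= {0} AND "FID" < {1}'.format(fid1, fid2)
rows = arcpy.SearchCursor(FC, where)
for row in rows:
    pass  # ... the actual geoprocessing goes here ...
del rows  # release the cursor and its locks

# save the new start point for the next run launched by the batch file
with open(START_FILE, "w") as f:
    f.write(str(fid2))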
ClintDow
Occasional Contributor
This approach works; I have had to resort to a similar solution a couple of times. If your scripts are verbose, I would advise redirecting your output to a text file, or greatly increasing the number of lines of history kept by your command prompt.
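For example, redirecting both stdout and stderr when launching the batch file (the log file name is just an illustration):

kickoffTheScript.bat > runlog.txt 2>&1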
KimOllivier
Occasional Contributor III
A memory leak with only 3000 polygons? I have never experienced that. I cannot think of any process that would not hold all the data in memory and process it in a few seconds. You have said that the script is simple, but your description has a bad feel mentioning a cursor and a tool in the same sentence.

If you are running a tool inside a cursor for each feature, then I am not surprised you are running out of memory, because the tools are not designed to be used that way. Think of the tools like an SQL query: operate on all the features in one operation. You do not loop around a table and run a spatial operation on the featureclass for each record. You should be able to redesign your process to run a tool once. Put it up and let us try to refactor it.
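To illustrate the difference (the tool choice and dataset paths here are hypothetical, not taken from the original script):

import arcpy

polys = r"C:\data\polys.shp"      # hypothetical inputs
parcels = r"C:\data\parcels.shp"
out_fc = r"C:\data\result.shp"

# Anti-pattern: one tool call per feature inside a cursor; every call pays
# tool start-up overhead, and memory climbs with each iteration.
rows = arcpy.SearchCursor(polys)
for row in rows:
    arcpy.MakeFeatureLayer_management(polys, "one_poly", '"FID" = {0}'.format(row.FID))
    arcpy.Intersect_analysis(["one_poly", parcels], "in_memory/x{0}".format(row.FID))
del rows

# Set-based alternative: one tool call over the whole featureclass.
arcpy.Intersect_analysis([polys, parcels], out_fc)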
markdenil
Occasional Contributor III
All good points, kimo.
My suggestion did not address the underlying problem, but just offered a work around.
I myself only have to resort to such tricks when dealing with hundreds of millions of features.