
Python performance...

12-13-2011 08:29 AM
TomSellsted
MVP Alum
One of the improvements I had heard about at this year's UC was the speed of Python at 10.1.

I have a Python process that I run to import data from our county assessor's office (they still use coverages...) into our geodatabase for our own evil purposes.  This takes about 6 hours to process on our old GIS server, so I was very interested to see how fast this would go on a new server with a faster processor, more memory, an enterprise database and a much faster Python.

It is actually slower, and it bails out of the process in the middle.  The Python memory footprint is about 1.7 GB, so it looks like some sort of memory leak.  I reviewed my Python code and it appears to be OK.  It is identical to what is running on my production server, except that it writes to a different geodatabase.

The process reads the parcel shapefile and associated datasets and uses an InsertCursor to put them into the geodatabase.
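In rough outline the load loop is the classic 10.0-style cursor pattern below (paths and field names here are just placeholders, not the real schema):

import arcpy

parcels_shp = r"C:\data\assessor\parcels.shp"
target_fc = r"C:\connections\gis.sde\Assessor\Parcels"

rows = arcpy.SearchCursor(parcels_shp)
inserts = arcpy.InsertCursor(target_fc)
for row in rows:
    new_row = inserts.newRow()
    new_row.shape = row.shape                        # copy the geometry
    new_row.setValue("PIN", row.getValue("PIN"))     # copy attributes field by field
    inserts.insertRow(new_row)
del inserts, rows                                    # release the cursor locks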

Have I found a bug or do I have some bad code?
4 Replies
DavidWynne
Esri Contributor
Hi Tom,
What was the error message you saw when the tool failed?

One thing I spotted in your showGpMessage method: just convert that last call to arcpy.GetMessages().
def showGpMessage():
    arcpy.AddMessage(arcpy.GetMessages())
    print >> open(logFile, 'a'), arcpy.GetMessages()    # append to the log file
    print arcpy.GetMessages()                           # was: print gp.GetMessages()


A couple of things that would give you a time boost ...
1. Remove the UpdateCursor and replace it with TruncateTable_management.
2. Update the script to use cursors in the arcpy.da module (this is not a direct swap; the signatures and techniques vary somewhat). A rough sketch of both changes follows.
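Untested, but with placeholder paths and field names the combination would look something like this:

import arcpy

target_fc = r"C:\connections\gis.sde\Assessor\Parcels"        # placeholder path

# 1. Empty the target in one call instead of deleting row by row
arcpy.TruncateTable_management(target_fc)

# 2. Reload with an arcpy.da cursor; note the explicit field list and tuple rows
fields = ["SHAPE@", "PIN", "OWNER"]                           # placeholder field names
with arcpy.da.InsertCursor(target_fc, fields) as icur:
    for row in arcpy.da.SearchCursor(r"C:\data\assessor\parcels.shp", fields):
        icur.insertRow(row)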

-Dave
TomSellsted
MVP Alum
David,

Thanks very much for the information.  Interesting too.  I had made a bad assumption that the cursors would just be faster.  I will check into the DA module and see if it will help.

I ran a couple of experiments with my existing code and it still looks like there is some sort of memory leak.  I get an error message stating:

Runtime Error!
Program: c:\Python27\ArcGIS10.1\pythonw.exe
abnormal program termination
DavidWynne
Esri Contributor
Quoting Tom: "Thanks very much for the information. Interesting too. I had made a bad assumption that the cursors would just be faster. I will check into the DA module and see if it will help."

Hi Tom,
There will be some improvement in the speed of the 'classic' cursors, but it is more a matter of degree. The new arcpy.da cursors, however, will typically show multiple factors of improvement in performance over the classic cursors.
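Roughly, the difference looks like this (the feature class path and field name are placeholders):

import arcpy

fc = r"C:\data\parcels.gdb\Parcels"

# classic cursor: returns row objects and carries every field along
for row in arcpy.SearchCursor(fc):
    pin = row.getValue("PIN")

# arcpy.da cursor: you list only the fields you need and get plain tuples back
with arcpy.da.SearchCursor(fc, ["PIN"]) as cursor:
    for (pin,) in cursor:
        pass

Listing only the fields you actually need also helps.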

-Dave
KimOllivier
Honored Contributor
Six hours! :rolleyes: I hope you found the bottleneck.

That would break my personal "Cup of Coffee Rule":

"If any single process takes longer than a cup of coffee, interrupt it and find a better way".
Since you are also running out of memory, finding what is causing that will probably speed things up enormously as well.

I like TruncateTable_management. I note that there is no help in the Beta for this new utility, Esri ...

You don't say how many records you have, but I would expect a couple of million records to take less than half an hour.
Suggestions to find the problem; don't give up until you can have that cup of coffee while it is still hot:

I immediately see a red flag where you open SDE, which will inevitably be across a sloooow network, or at least will use sloow handshaking. Could you try loading into a file geodatabase on a local drive and then copying that to SDE in a single step?
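Something along these lines, say (all paths here are made up):

import os
import arcpy

staging_folder = r"C:\temp"
gdb = arcpy.CreateFileGDB_management(staging_folder, "staging").getOutput(0)

# do all the heavy cursor work against the local file geodatabase
local_fc = os.path.join(gdb, "Parcels")
arcpy.CopyFeatures_management(r"C:\data\assessor\parcels.shp", local_fc)

# then push the finished data up to SDE in one call
sde_fc = r"C:\connections\gis.sde\Assessor\Parcels"
arcpy.TruncateTable_management(sde_fc)
arcpy.Append_management(local_fc, sde_fc, "NO_TEST")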

Maybe FME (or the Data Interoperability extension) would be faster?

Maybe load the shapefiles directly into a file geodatabase and then use some SQL queries to do the selection and editing, instead of doing it all with the cursor. Very large records with many fields will be slow to load into a database. You could use MakeQueryTable to create a subset and then write out the view to SDE.
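Roughly like this (the table names, key field and where clause are invented for illustration):

import arcpy

arcpy.env.overwriteOutput = True
gdb = r"C:\temp\staging.gdb"
parcels = gdb + r"\Parcels"
owners = gdb + r"\Owners"

# build an in-memory view that does the join and selection in one SQL-style step
arcpy.MakeQueryTable_management([parcels, owners], "parcel_view",
                                "USE_KEY_FIELDS", "Parcels.OBJECTID", "",
                                "Parcels.PIN = Owners.PIN AND Owners.ACTIVE = 1")

# then write the view out to SDE as a single copy
arcpy.CopyFeatures_management("parcel_view",
                              r"C:\connections\gis.sde\Assessor\ParcelsSubset")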

Does the database have any indexes? You should drop the indexes when you truncate, otherwise every insert will trigger a re-index.
Does it get slower with more data? Try loading the first 10% to see if that is faster in proportion.
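For example, assuming an attribute index named something like PIN_IDX on a PIN field (both names invented):

import arcpy

sde_fc = r"C:\connections\gis.sde\Assessor\Parcels"

arcpy.RemoveIndex_management(sde_fc, "PIN_IDX")      # drop the attribute index first
arcpy.TruncateTable_management(sde_fc)
# ... reload the rows with the insert cursor here ...
arcpy.AddIndex_management(sde_fc, "PIN", "PIN_IDX")  # rebuild the index once at the end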

Even though you are careful to trigger garbage collection with del statements, it clearly isn't working. Have a look at your memory usage in Task Manager.
If it keeps going up, then fixing that might be the solution. If you can restructure the script to use functions, sometimes that garbage collects better.
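For example, something like this (the dataset list is hypothetical), keeping each load inside a function and forcing a collection between datasets:

import gc
import arcpy

# hypothetical list of (source shapefile, target feature class) pairs
datasets = [
    (r"C:\data\assessor\parcels.shp", r"C:\temp\staging.gdb\Parcels"),
    (r"C:\data\assessor\owners.shp", r"C:\temp\staging.gdb\Owners"),
]

def load_one(source, target):
    # all cursor objects are local, so they go out of scope when the function returns
    with arcpy.da.InsertCursor(target, ["SHAPE@"]) as icur:
        for row in arcpy.da.SearchCursor(source, ["SHAPE@"]):
            icur.insertRow(row)

for shp, fc in datasets:
    load_one(shp, fc)
    gc.collect()     # force a collection between datasets and watch memory in Task Manager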

You might be more successful if you could batch the transactions. The cursors are a bit simplistic here, but FME can do this better.
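One way to approximate batching in arcpy (I have not timed this; the paths and batch size are illustrative) is to wrap the inserts in an arcpy.da.Editor session and close an edit operation every so many rows:

import arcpy

workspace = r"C:\connections\gis.sde"
target_fc = workspace + r"\Assessor\Parcels"
source = r"C:\data\assessor\parcels.shp"

edit = arcpy.da.Editor(workspace)
edit.startEditing(False, True)          # no undo stack, multiuser workspace
edit.startOperation()

count = 0
with arcpy.da.InsertCursor(target_fc, ["SHAPE@"]) as icur:
    for row in arcpy.da.SearchCursor(source, ["SHAPE@"]):
        icur.insertRow(row)
        count += 1
        if count % 10000 == 0:          # close the operation every 10,000 rows
            edit.stopOperation()
            edit.startOperation()

edit.stopOperation()
edit.stopEditing(True)                  # save the edits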

I find that using Python and the tools with more than a million records hits some sort of limit, even with my 8-CPU workstation. So I partition the work to use less than 1M records, and it completes in a few minutes instead of never, even for aspatial SQL queries.
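For example, walking the data in ObjectID ranges (field names and chunk size are illustrative, and it assumes the ObjectIDs are roughly sequential):

import arcpy

fc = r"C:\temp\staging.gdb\Parcels"
chunk = 500000                           # keep each pass well under 1M records

total = int(arcpy.GetCount_management(fc).getOutput(0))
for start in range(0, total, chunk):
    where = "OBJECTID > {0} AND OBJECTID <= {1}".format(start, start + chunk)
    with arcpy.da.SearchCursor(fc, ["OID@", "PIN"], where) as rows:
        for oid, pin in rows:
            pass                         # process just this partition, then move on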