Python memory error (how to pin it down)

Discussion created by jamessample on Nov 8, 2010
Latest reply on Nov 11, 2010 by jamessample
Hi all,

If anyone can point me in the right direction I'd be extremely grateful - I'm running out of ideas.

I have a fairly lengthy geoprocessing script (attached) that calculates a "water balance" for my study region. I have rasters representing monthly rainfall and evapo-transpiration going back over 4 decades, together with information on soil properties etc.

My script loops over all of my time-series data and writes the output rasters to a folder on my hard disk. It works fine when I use only a subset of the data, but with the whole lot I get a "Memory error" (nothing more helpful, unfortunately). I can't easily run my code in chunks because each iteration's output is the next iteration's input; it would be nice to do it all in one go.

When I watch my script run in Task Manager, it clearly releases most of the memory it has used at the end of each loop, but a small amount is never released and the memory usage grows over time. I'm trying to pin this down, but I'm fairly new to all this, and the deeper I dig the more confused I get! As far as I can tell from the forums, there are three main possibilities (feel free to add more!):
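One thing I've been trying, to separate possibility 1 from possibility 2, is logging how many live Python objects exist after each iteration: if that count stays flat while Task Manager still shows growth, the leak is presumably below the Python level (e.g. inside the gp). A minimal sketch, using only the standard library - process_one_month here is just a stand-in for my real loop body, not the actual calculation:

```python
import gc

def process_one_month(i):
    # Stand-in for the real loop body: build and discard some objects.
    data = [float(x) * i for x in range(10000)]
    return sum(data)

counts = []
for i in range(5):
    process_one_month(i)
    gc.collect()                          # finish any pending collection first
    counts.append(len(gc.get_objects())) # live Python objects right now
    print("iteration %d: %d live objects" % (i, counts[-1]))

# If counts keeps climbing, Python-level references are accumulating
# somewhere; if it stays flat while Task Manager memory still grows,
# the leak is probably in C-level code rather than in my script.
```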

1. The geoprocessor is leaking memory,
2. My python code (lots of numpy algebra) is leaking memory,
3. I'm trying to write too many rasters into a workspace and there's some kind of limit I don't know about.

For 1, I've read on the forums that the geoprocessor can leak memory. Most of these posts relate to older versions of the gp - have all of these problems been solved in version 9.3? My code actually makes very little use of the geoprocessor anyway: within each loop I call gp.Resample_management twice and gp.AddMessage once. Is that enough to cause a serious memory leak over many iterations? Some of the forum posts give me the impression that gp memory leaks are characterised by continuously increasing memory consumption and decreasing processing speed - is this right? My code's memory usage oscillates while growing slowly, and the processing speed only decreases slightly during the run.

For 2, I don't really know where to start. My code uses the excellent MGET toolset (http://code.env.duke.edu/projects/mget) to convert ESRI grids into numpy masked arrays. I then perform lots of array algebra using numpy before using MGET to turn the output back into an ESRI grid. I'm trying this approach because it's much faster than using the geoprocessor and the syntax is more intuitive. I don't really understand how Python deals with memory allocation. I've read about something called the garbage collector, and if I explicitly delete all of my intermediate objects at the end of each loop (using del) and then force a collection, my code gets a bit further. It still grinds to a halt eventually, though. I naively assumed that Python would be able to loop more-or-less indefinitely, simply over-writing the previous set of objects on each loop. Is this not the case? If I delete all of my intermediates, what else is there that could be hogging memory?
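For what it's worth, the loop-hygiene pattern I've been experimenting with looks roughly like this. It's only a sketch - water_balance_step, store, rain and pet are toy stand-ins for my real masked-array algebra - but it shows the idea: del only removes the *name*, and the memory is freed once nothing else references the old array, which gc.collect() can help along:

```python
import gc
import numpy as np

def water_balance_step(prev_store, rain, pet):
    # One month of a toy balance: store + rainfall - evapo-transpiration,
    # clipped at zero (a stand-in for the real calculation).
    return np.clip(prev_store + rain - pet, 0.0, None)

store = np.zeros((500, 500))
for month in range(12):
    rain = np.random.rand(500, 500) * 100.0  # stand-in for a rainfall grid
    pet = np.random.rand(500, 500) * 80.0    # stand-in for a PET grid
    store = water_balance_step(store, rain, pet)

    # Drop every intermediate explicitly, then ask the collector to run.
    # Memory is only reclaimed once no reference to the old arrays is left.
    del rain, pet
    gc.collect()

print(store.shape)
```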

For 3, I've read that there might be a limit to the number of rasters that can exist in a particular workspace. My output rasters are grouped by year into folders, so I don't think this should be a problem, but is there some kind of memory expenditure involved in repeatedly writing to the same workspace (clutching at straws here)?

I've read about using spawnv/sub-process to force the gp to release its memory at the end of each iteration, but I'm not too sure how to go about doing this, and if my problem isn't with the geoprocessor it might not get me very far. Since the vast majority of my code is inside the loop, won't the sub-process just run out of memory instead? (Sorry if this is a daft question.)
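As I understand the suggestion, the trick isn't to put the whole loop inside one child process (which would indeed run out of memory the same way), but to start a fresh process per iteration, passing the previous iteration's output path in as an argument; when each child exits, the OS reclaims everything it allocated, gp leaks included. A hedged sketch of the driver side - worker.py is a hypothetical script holding one iteration's work, and here it's faked with a do-nothing "-c" command so the sketch runs on its own:

```python
import subprocess
import sys

# Hypothetical driver: one child process per iteration. Each child would
# read last month's output raster and write this month's, then exit, so
# any memory it leaked is reclaimed by the OS when it dies.
prev_output = "initial_store"  # stand-in for the first input raster path
for month in range(3):
    this_output = "store_%02d" % month
    # In the real script this would be:
    #   [sys.executable, "worker.py", prev_output, this_output]
    # The "-c" command below stands in for worker.py.
    ret = subprocess.call([sys.executable, "-c", "import sys; sys.exit(0)",
                           prev_output, this_output])
    if ret != 0:
        raise RuntimeError("iteration %d failed" % month)
    prev_output = this_output  # this month's output feeds the next month

print(prev_output)
```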

My code's attached below. I'm new to programming, so it's probably horrendous - apologies.

Thanks for reading this far. Any tips or suggestions very much appreciated!

For info, I'm using:
ArcInfo 9.3.1, Python 2.5,
Windows XP with 2 GB RAM.