Memory leak in loops

12-14-2010 12:31 AM
YICHUANSHI
New Contributor
Hi all,
I have recently written a Python script that involves some heavy computation. It first performs an intersection with an underlying base layer, writes the result to the in_memory workspace, then calculates the area of each resultant part of the intersection and summarizes them. I know the Python garbage collector doesn't clean these up on its own, so I explicitly deleted all intermediate variables after each loop. However, what I still see is ever-increasing memory consumption after each loop, which, given a large dataset, eventually leads to an "out of memory" error and a crash.
Any comment is appreciated!

Here is part of the code (the LOOP!):


for eachspecies in specieslist:
    counter += 1

    # For each species, make a feature layer and use that layer in the intersect
    layer = str(counter)
    try:
        whereclause = "\"" + speciesfield + "\" = '" + iucnlib.SingeQuoteToDoubleSQL(str(eachspecies)) + "'"
        gp.makefeaturelayer(shp, layer, whereclause)
    except:
        iucnlib.Printboth("Error: Making layer for species %s failed! Skipping this species..." % str(eachspecies))
        iucnlib.Printboth(gp.getmessages(2))
        del layer
        continue

    intersectshp = 'in_memory' + '\\tmp' + str(counter)

    # Spatial intersection
    try:
        gp.intersect_analysis(layer + ';' + ecobaselayer, intersectshp)
    except:
        iucnlib.Printboth("Error: Intersection for species %s failed! Skipping this species..." % str(eachspecies))
        iucnlib.Printboth(gp.getmessages(2))
        del intersectshp
        continue

    # Calculate the area of each geometry, using the coordinate system specified
    # by the user ('areadict' renamed from 'dict', which shadowed a built-in)
    areadict = iucnlib.CalcuateTotalArea(intersectshp, "REALM", spatialref)

    totalarea = 0
    for eachkey in areadict.keys():
        totalarea += areadict[eachkey] / 1000000

    # Output result text string; realmlist (originally named 'list', which
    # shadows a built-in) is defined earlier in the script
    msg = str(eachspecies)
    for each in realmlist:
        if each in areadict:
            # in square kilometres
            msg = msg + ',' + '%.2f' % (areadict[each] / 1000000) + ',' + '%.2f' % (100 * (areadict[each] / 1000000) / totalarea) + '%'
        else:
            # default no value: '0'
            msg = msg + ',0,0'
    msg = msg + '\n'
    iucnlib.Log_output(msg, result)
    iucnlib.Printboth("Species %s has finished" % str(eachspecies))

    # Delete intermediate variables
    del intersectshp, layer
10 Replies
ChrisMathers
Occasional Contributor III
Why make them in memory? Try writing to disk and see if that helps. It may be slower, but it also may complete.
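For example, something along these lines (just a sketch; scratchgdb is a placeholder for a file geodatabase you would create before the loop):

# Sketch only: send the intersect output to a scratch file geodatabase on
# disk instead of in_memory. 'scratchgdb' is a hypothetical path; create it
# once before the loop, e.g. with gp.CreateFileGDB_management.
scratchgdb = r"C:\temp\scratch.gdb"
intersectshp = scratchgdb + '\\tmp' + str(counter)
gp.intersect_analysis(layer + ';' + ecobaselayer, intersectshp)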
ChrisSnyder
Regular Contributor III
Or at least delete the in_memory output (after writing it to disk).

In my experience, there is only so much RAM available (about 1.7 GB) before the in_memory workspace fills up.
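Something like this at the end of each loop (rough sketch; the disk path is just a placeholder):

# Sketch: copy the in_memory result to disk, then delete the in_memory
# feature class to give the RAM back
gp.CopyFeatures_management(intersectshp, r"C:\temp\scratch.gdb\tmp" + str(counter))
gp.Delete_management(intersectshp)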
YICHUANSHI
New Contributor
Thanks very much for your advice.

To clm42: There was a huge performance improvement when I started dumping intermediate results to the in_memory workspace. I haven't yet run the whole dataset writing intermediate results to disk; I'll give it a try. The thing that bothers me is that the memory consumption seems to increase by about 1 MB after each loop, and I don't know whether that has anything to do with writing to memory versus disk.

To Chris: I did remove the variable referencing the in-memory data after each loop. I thought that would do it; otherwise I would have expected the memory to fill up much, much earlier.
LoganPugh
Occasional Contributor III
As you may have surmised, the Python garbage collector knows nothing about ArcGIS's in-memory workspace and will not delete data from it just because you deleted the variable referencing it. You need to actually delete the data using the geoprocessor (e.g. gp.Delete_management).

According to the help, you can either delete individual tables or feature classes in the in-memory workspace, or delete the entire in-memory workspace itself.

http://help.arcgis.com/en/arcgisdesktop/10.0/help/index.html#//002w0000005s000000.htm
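In your loop that would look something like this (sketch, using the variable names from your post):

# Delete the in_memory feature class itself, not just the Python
# variable that holds its path, at the end of each iteration
gp.Delete_management(intersectshp)
del intersectshp, layer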
ChrisSnyder
Regular Contributor III
I did remove the variable referencing the in-memory data after each loop. I thought that would do it.


Like Logan said (and what I meant in my first message), you have to explicitly delete the in_memory feature class (e.g. gp.Delete_management("in_memory\\output_4")). It takes up RAM just like an on-disk feature class takes up disk space. Deleting the variable:

del myFC

does nothing at all to free up the RAM...
YICHUANSHI
New Contributor
Thanks very much lpugh01 and Chris! As you said, the in-memory FC wasn't deleted when I dropped the reference. Problem solved; much appreciated. I think ESRI should document this Python caveat somewhere more explicitly. It took me a whole day to figure out and test (I didn't even suspect that this could be the problem).
ChrisSnyder
Regular Contributor III
Glad that did the trick!

FYI - From the "in_memory workspace" help topic http://help.arcgis.com/en/arcgisdesktop/10.0/help/index.html#//002w0000005s000000.htm:

When data is written to the in-memory workspace, the computer's physical memory (RAM) is consumed. If too much data is written to this workspace, all the computer's memory may be used up and additional data cannot be written to memory.
LyniseWearne
New Contributor
Hi all,

I am having a similar problem to the one described above and think it is related to memory leaks. The script runs successfully for about 8 hours, then shuts down in the middle of processing. Could someone please have a look at the attached script? Any suggestions would be appreciated.

Thanks
Lynise
ChrisSnyder
Regular Contributor III
Not sure what the issue is... It used to be, pre v10.0, that you could run Spatial Analyst tools in a loop for a year and never hit any memory issues.

Maybe don't delete the scratch files every loop; just let arcpy overwrite them and delete them at the very end (unindent all your arcpy.Delete statements one tab), and include this line of code before your loop:
arcpy.env.overwriteOutput = True


Take note of the memory usage in the Task Manager... Does it just build and build with each loop?

How about calling a separate process for each loop? See:

http://forums.arcgis.com/threads/33602-Arcpy-Multiprocessor-issues

You can look for other Python-based parallel processing examples in the forums using the keywords "os.spawnv", "subprocess", "multiprocessing", or "parallel python".
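For example, something like this in the parent script (untested sketch; worker.py is a hypothetical script that processes a single species and then exits, returning all of its memory to the OS):

# Sketch: run each species in its own short-lived Python process so any
# leaked memory is freed when the child process exits
import subprocess, sys
for eachspecies in specieslist:
    subprocess.call([sys.executable, r"C:\scripts\worker.py", str(eachspecies)])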