Use Subprocess to Alleviate Memory Usage Issues

02-07-2012 04:40 AM
MichaelVolz
Esteemed Contributor
To All Python Users:

I have a script that is updating mxds and then saving them.  After saving the mxd I am deleting the mxd object, but memory usage keeps increasing until the python script just hangs.

I have read that I can use a subprocess to fix this problem, but I am unsure of the syntax to use.  I will be drilling down through a directory and all of its subdirectories for mxd files (This will be my main python script).  Once I have an mxd file as the focus, I want to use a subprocess to start another python script that will open the mxd, replace data sources, save the mxd and then delete the mxd object from memory.  Then I will return to the main script to get another file which should free up memory as the subprocess has terminated.

Does anyone know the syntax I would use to call the secondary python script with an argument being the mxd file that has the focus?  Any help or hints are greatly appreciated.  Thank you.
17 Replies
ChrisFrost
Emerging Contributor
It seems the syntax I was using only works on Linux, not Windows.

For Windows:

cmd = [r"C:\Python26\ArcGIS10.0\python.exe", r"\\tgapparc01\c$\ReplaceSource.py", full_path]
proc = subprocess.Popen(cmd)
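
For completeness, the called script reads the path from sys.argv. A minimal sketch of what the receiving side might look like; the resourcing work would go where the placeholder comment is, since the actual body of ReplaceSource.py is not shown in the thread:

import sys
import arcpy

full_path = sys.argv[1]                     # mxd path passed by the parent
mxd = arcpy.mapping.MapDocument(full_path)  # open the map document

# ... replace data sources here ...

mxd.save()
del mxd  # the process exits next, so all of its memory is returned to the OS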
MichaelVolz
Esteemed Contributor
Thanks Chris

That was the syntax that was needed to call the subprocess python script in Windows.

I could not find any other examples of that syntax when searching the web.
ChrisFrost
Emerging Contributor
As an alternative to subprocess.Popen, you might consider using the multiprocessing.Pool class. With Popen you will need to manage the number of processes running at once yourself. With Pool you define the number of worker processes in the pool (for example, number of cores/processors minus one) and it will keep that many workers running; see the sketch below.
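
A hedged sketch of that pattern, assuming a process_mxd function that does the per-document work (the function name and body are placeholders, not code from this thread):

import multiprocessing

def process_mxd(full_path):
    # placeholder: open the mxd, replace data sources, save, clean up
    import arcpy
    mxd = arcpy.mapping.MapDocument(full_path)
    # ... replaceDataSource calls ...
    mxd.save()
    del mxd

if __name__ == '__main__':  # required on Windows for multiprocessing
    mxd_paths = []  # gather the .mxd paths with os.walk first
    pool = multiprocessing.Pool(processes=multiprocessing.cpu_count() - 1)
    pool.map(process_mxd, mxd_paths)
    pool.close()
    pool.join()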
MichaelVolz
Esteemed Contributor
Chris:

Why do I need to worry about the number of subprocesses?  I wrote my main python script to loop through a directory, its subdirectories and files looking for mxds.  Then I call the subprocess python script for the file that has focus and I manipulate that mxd.  Once that is done and the mxd gets saved, I thought the code goes back to the main python script and releases the subprocess python script from memory.  Then I loop to the next mxd and call the subprocess python script again.

The reason I went this route in the first place was due to memory usage always climbing while looping through a directory until the python script just hung.  Could it be that I just needed to release additional items from temporary memory when I was just using one script to obtain the results?

Thanks.
ChrisFrost
Emerging Contributor
(Quoting MichaelVolz's previous post.)

You need to worry about the number of processes because the script that fires off Popen does not wait for the child to finish. You will therefore likely end up with many processes running concurrently (more than the number of cores/processors), saturating the CPU and causing the machine to thrash.

It's possible that your original script is written in a manner that causes the high memory usage because temp data is not released for garbage collection. Post the script if you like and I'll take a look.
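
If you do want to stay with plain subprocess and run one mxd at a time, the parent can block until each child exits. A minimal sketch using the standard-library call (not code from the thread):

proc = subprocess.Popen(cmd)
proc.wait()  # block here until the child finishes, so only one runs at a time

Equivalently, subprocess.call(cmd) spawns the child and waits for it in a single step.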
MichaelVolz
Esteemed Contributor
Thanks Chris.  The script with the subprocess did exactly what you said it would and flooded the processor with parallel processes.

Here is my code with just one python script, where the memory usage keeps increasing until the python script freezes up. The script is meant to loop through a selected directory, get the mxds, and re-source the SDE layers from the original SDE connection to a different SDE connection. You can open Task Manager while running this script and watch the memory usage for the pythonw.exe process keep increasing. Please note that you will need to run this script on a directory with a good number of mxd files that have many SDE layers in order to see the performance hit.

import os, arcpy

mxd_match = ".mxd"

Directory_Search = r"\\server00\e$\restore5\Experiment\Test_10"
new_connPrefix = r"C:\Documents and Settings\Application Data\ESRI\Desktop10.0\ArcCatalog"

def Conn_Info(usr):
    # Map an SDE user name to the replacement connection file for that user
    if usr == "user01":
        return new_connPrefix + "\\" + usr + "_dir_conn@production.sde"
    elif usr == "user02":
        return new_connPrefix + "\\" + usr + "_dir_conn@development.sde"
    elif usr == "user03":
        return new_connPrefix + "\\" + usr + "_dir_conn@production.sde"
    return None  # unknown user: leave the layer's source alone

for root, dirs, files in os.walk(Directory_Search):
# for root, dirs, files in os.walk(arcpy.GetParameterAsText(0)):

    for f in files:
        if not f.endswith(mxd_match):
            continue

        full_path = os.path.join(root, f)
        mxd = arcpy.mapping.MapDocument(full_path)

        for lyr in arcpy.mapping.ListLayers(mxd):
            try:
                if lyr.supports("DATASOURCE") and lyr.supports("SERVICEPROPERTIES"):
                    servProp = lyr.serviceProperties
                    user = str(servProp.get('UserName', 'N/A'))

                    new_conn = Conn_Info(user)
                    if new_conn is not None:
                        lyr.replaceDataSource(new_conn, "SDE_WORKSPACE", lyr.datasetName)
            except:
                print arcpy.GetMessages(2)
            del lyr

        try:
            # mxd.saveACopy(root + "\\" + f[:-4] + "_New.mxd", "9.3")
            mxd.save()
        except:
            print arcpy.GetMessages(2)

        del mxd, full_path


Thanks for all your help!!
ChrisFrost
Emerging Contributor
The only suspect thing I can see is the arcpy.mapping.ListLayers(mxd) value, which might not get released. Maybe try setting it to a variable and then deleting the variable when done:

if f.endswith(mxd_match):

    mxd = arcpy.mapping.MapDocument(full_path)
    lyrs = arcpy.mapping.ListLayers(mxd)
    for lyr in lyrs:
        try:
            .....
    del lyrs
MichaelVolz
Esteemed Contributor
Chris:

I tried your approach of assigning the layer list to a lyrs variable and then deleting it after each mxd, but it did not help the memory issue. I also tried deleting the servProp variable, and that did not help either.
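
Since del alone did not cure it, one way to tie the earlier suggestions together is to keep the directory walk in the main script and hand each mxd to a child process that the parent waits on, so everything the child allocated is returned to the OS when it exits. A hedged sketch combining the paths already posted in this thread, assuming ReplaceSource.py does the per-mxd work shown earlier:

import os, subprocess

python_exe = r"C:\Python26\ArcGIS10.0\python.exe"
worker = r"\\tgapparc01\c$\ReplaceSource.py"

for root, dirs, files in os.walk(r"\\server00\e$\restore5\Experiment\Test_10"):
    for f in files:
        if f.endswith(".mxd"):
            full_path = os.path.join(root, f)
            # call() spawns the child and blocks until it exits, so only
            # one worker runs at a time and its memory is fully released
            subprocess.call([python_exe, worker, full_path])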