Use Subprocess to Alleviate Memory Usage Issues

02-07-2012 04:40 AM
MichaelVolz
Esteemed Contributor
To All Python Users:

I have a script that is updating mxds and then saving them.  After saving the mxd I am deleting the mxd object, but memory usage keeps increasing until the python script just hangs.

I have read that I can use a subprocess to fix this problem, but I am unsure of the syntax to use.  I will be drilling down through a directory and all of its subdirectories for mxd files (This will be my main python script).  Once I have an mxd file as the focus, I want to use a subprocess to start another python script that will open the mxd, replace data sources, save the mxd and then delete the mxd object from memory.  Then I will return to the main script to get another file which should free up memory as the subprocess has terminated.

Does anyone know the syntax I would use to call the secondary python script with an argument being the mxd file that has the focus?  Any help or hints are greatly appreciated.  Thank you.
17 Replies
ChrisFrost
Emerging Contributor
It seems the syntax I was using only works on Linux, not Windows.

For Windows:

cmd = [r"C:\Python26\ArcGIS10.0\python.exe", r"\\tgapparc01\c$\ReplaceSource.py", full_path]
proc = subprocess.Popen(cmd)
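
For completeness, the called script reads the path from sys.argv. A minimal sketch of what the receiving side might look like; the resourcing work would go where the placeholder comment is, since the actual body of ReplaceSource.py is not shown in the thread:

import sys
import arcpy

full_path = sys.argv[1]                     # mxd path passed by the parent
mxd = arcpy.mapping.MapDocument(full_path)  # open the map document

# ... replace data sources here ...

mxd.save()
del mxd  # the process exits next, so all of its memory is returned to the OS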
MichaelVolz
Esteemed Contributor
Thanks Chris

That was the syntax that was needed to call the subprocess python script in Windows.

I could not find any other examples of that syntax when searching the web.
ChrisFrost
Emerging Contributor
As an alternative to subprocess.Popen, you might consider using the multiprocessing.Pool class. With Popen you will need to manage the number of processes running at once yourself. With Pool you define the number of worker processes in the pool (for example, number of cores/processors minus one) and it will keep that many workers running; see the sketch below.
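
A hedged sketch of that pattern, assuming a process_mxd function that does the per-document work (the function name and body are placeholders, not code from this thread):

import multiprocessing

def process_mxd(full_path):
    # placeholder: open the mxd, replace data sources, save, clean up
    import arcpy
    mxd = arcpy.mapping.MapDocument(full_path)
    # ... replaceDataSource calls ...
    mxd.save()
    del mxd

if __name__ == '__main__':  # required on Windows for multiprocessing
    mxd_paths = []  # gather the .mxd paths with os.walk first
    pool = multiprocessing.Pool(processes=multiprocessing.cpu_count() - 1)
    pool.map(process_mxd, mxd_paths)
    pool.close()
    pool.join()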
MichaelVolz
Esteemed Contributor
Chris:

Why do I need to worry about the number of subprocesses?  I wrote my main python script to loop through a directory, its subdirectories and files looking for mxds.  Then I call the subprocess python script for the file that has focus and I manipulate that mxd.  Once that is done and the mxd gets saved, I thought the code goes back to the main python script and releases the subprocess python script from memory.  Then I loop to the next mxd and call the subprocess python script again.

The reason I went this route in the first place was due to memory usage always climbing while looping through a directory until the python script just hung.  Could it be that I just needed to release additional items from temporary memory when I was just using one script to obtain the results?

Thanks.
ChrisFrost
Emerging Contributor
(Quoting MichaelVolz's previous post.)

You need to worry about the number of processes because the script that fires off Popen does not wait for the child to finish. You will therefore likely end up with many processes running concurrently (more than the number of cores/processors), saturating the CPU and causing the machine to thrash.

It's possible that your original script is written in a manner that causes the high memory usage because temp data is not released for garbage collection. Post the script if you like and I'll take a look.
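
If you do want to stay with plain subprocess and run one mxd at a time, the parent can block until each child exits. A minimal sketch using the standard-library call (not code from the thread):

proc = subprocess.Popen(cmd)
proc.wait()  # block here until the child finishes, so only one runs at a time

Equivalently, subprocess.call(cmd) spawns the child and waits for it in a single step.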
MichaelVolz
Esteemed Contributor
Thanks Chris.  The script with the subprocess did exactly what you said it would and flooded the processor with parallel processes.

Here is my code with just one python script, where the memory usage keeps increasing until the python script freezes up. The script is meant to loop through a selected directory, get the mxds, and re-source the SDE layers from the original SDE connection to a different SDE connection. You can open Task Manager while running this script and watch the memory usage for the pythonw.exe process keep increasing. Please note that you will need to run this script on a directory with a good number of mxd files that have many SDE layers in order to see the performance hit.

import os, arcpy

mxd_match = ".mxd"

Directory_Search = r"\\server00\e$\restore5\Experiment\Test_10"
new_connPrefix = r"C:\Documents and Settings\Application Data\ESRI\Desktop10.0\ArcCatalog"

def Conn_Info(usr):
    # Map an SDE user name to the replacement connection file for that user
    if usr == "user01":
        return new_connPrefix + "\\" + usr + "_dir_conn@production.sde"
    elif usr == "user02":
        return new_connPrefix + "\\" + usr + "_dir_conn@development.sde"
    elif usr == "user03":
        return new_connPrefix + "\\" + usr + "_dir_conn@production.sde"
    return None  # unknown user: leave the layer's source alone

for root, dirs, files in os.walk(Directory_Search):
# for root, dirs, files in os.walk(arcpy.GetParameterAsText(0)):

    for f in files:
        if not f.endswith(mxd_match):
            continue

        full_path = os.path.join(root, f)
        mxd = arcpy.mapping.MapDocument(full_path)

        for lyr in arcpy.mapping.ListLayers(mxd):
            try:
                if lyr.supports("DATASOURCE") and lyr.supports("SERVICEPROPERTIES"):
                    servProp = lyr.serviceProperties
                    user = str(servProp.get('UserName', 'N/A'))

                    new_conn = Conn_Info(user)
                    if new_conn is not None:
                        lyr.replaceDataSource(new_conn, "SDE_WORKSPACE", lyr.datasetName)
            except:
                print arcpy.GetMessages(2)
            del lyr

        try:
            # mxd.saveACopy(root + "\\" + f[:-4] + "_New.mxd", "9.3")
            mxd.save()
        except:
            print arcpy.GetMessages(2)

        del mxd, full_path


Thanks for all your help!!
ChrisFrost
Emerging Contributor
The only suspect thing I can see is the arcpy.mapping.ListLayers(mxd) value, which might not get released. Maybe try setting it to a variable and then deleting the variable when done:

if f.endswith(mxd_match):

    mxd = arcpy.mapping.MapDocument(full_path)
    lyrs = arcpy.mapping.ListLayers(mxd)
    for lyr in lyrs:
        try:
            .....
    del lyrs
MichaelVolz
Esteemed Contributor
Chris:

I tried your approach of assigning the layer list to a lyrs variable and then deleting it after each mxd, but it did not help the memory issue. I also tried deleting the servProp variable, and that did not help either.
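
Since del alone did not cure it, one way to tie the earlier suggestions together is to keep the directory walk in the main script and hand each mxd to a child process that the parent waits on, so everything the child allocated is returned to the OS when it exits. A hedged sketch combining the paths already posted in this thread, assuming ReplaceSource.py does the per-mxd work shown earlier:

import os, subprocess

python_exe = r"C:\Python26\ArcGIS10.0\python.exe"
worker = r"\\tgapparc01\c$\ReplaceSource.py"

for root, dirs, files in os.walk(r"\\server00\e$\restore5\Experiment\Test_10"):
    for f in files:
        if f.endswith(".mxd"):
            full_path = os.path.join(root, f)
            # call() spawns the child and blocks until it exits, so only
            # one worker runs at a time and its memory is fully released
            subprocess.call([python_exe, worker, full_path])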