Passing objects into (and out of) a subprocess (Python)

11-10-2010 02:52 AM
jamessample
New Contributor II
This is essentially a repeat of Chris Snyder's 2006 post on the old forum (http://forums.esri.com/Thread.asp?c=93&f=1729&t=193431&mc=0#msgid575153). It didn't get any replies back then; I'm hoping for better luck 🙂

Does anyone have any examples of using the subprocess module to pass objects into a slave script and then get the results back to the parent script?

I've seen some examples which pass file paths (as strings) into the slave script, but I really need to have the slave script manipulate numpy arrays declared in the parent script, then pass the results back out again (again as numpy arrays). Is this possible?

Thanks!
10 Replies
ChrisMathers
Regular Contributor II
You should ask this on #python on irc.freenode.net. That's where I take most of my pure Python questions. Sharp bunch of folks there. Not that we are dull 😉

EDIT:
I asked in #python since I already had the chat open. After reading your post they said:

You need some sort of serialization model to pass information among processes. They can't just manipulate the same memory. There's a chance James wants multiprocessing, twisted, numpy's serialization stuff, or a few other things.


That stuff is beyond me, but they seem to know about it there.
jamessample
New Contributor II
Fantastic - thanks Chris!

I suspect that this is well beyond me too, but it's nice to know that it's possible if you know what you're doing. I'll do some research on "serialization models" and see how far I get.

Thanks for pointing out #python on irc.freenode.net too - looks very useful.

Cheers!
BradPosthumus
Occasional Contributor II
Regarding serialization, they are likely referring to using pickle or marshal to save the numpy object to a file. The two scripts below show how this works.

  1. The parent.py script creates a simple numpy array and "pickles" (serializes) it to a file.

  2. The file path is passed as an argument to the child script, which is launched as a subprocess.

  3. The child script "unpickles" (de-serializes) the array, modifies it, then re-pickles it.

  4. The parent script unpickles the modified array and prints the result.


# parent.py

import numpy, pickle, os, subprocess, sys

colArray = numpy.array([1,2,3])
strOutputFile = os.path.join(os.getenv('TEMP'), "array.pkl")
pickle.dump(colArray, open(strOutputFile, 'wb'))

print colArray

strChildScript = r"C:\Temp\child.py"
intReturnCode = subprocess.call([os.path.join(sys.prefix, "python.exe"), strChildScript, strOutputFile])

colNewArray = pickle.load(open(strOutputFile, 'rb'))

print colNewArray


# child.py

import numpy, pickle, sys

strOutputFile = sys.argv[1]
colArray = pickle.load(open(strOutputFile, 'rb'))

colArray2 = numpy.array([4,5,6])
colArray = numpy.vstack((colArray, colArray2))

pickle.dump(colArray, open(strOutputFile, 'wb'))
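
The same round trip can also be done without a temp file by streaming the pickled bytes through the child's standard input and output. A rough sketch in modern Python 3 syntax (the child script is written out to a temp file here only to keep the example self-contained; normally it would live in its own .py file):

```python
import os
import pickle
import subprocess
import sys
import tempfile

import numpy

# The child's source, kept inline so this example runs on its own.
child_code = """\
import pickle, sys
import numpy

arr = pickle.load(sys.stdin.buffer)                 # unpickle from stdin
arr = numpy.vstack((arr, numpy.array([4, 5, 6])))   # modify the array
pickle.dump(arr, sys.stdout.buffer)                 # re-pickle to stdout
"""

with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
    f.write(child_code)
    child_path = f.name

arr = numpy.array([1, 2, 3])
proc = subprocess.run(
    [sys.executable, child_path],
    input=pickle.dumps(arr),        # pickled bytes go to the child's stdin
    stdout=subprocess.PIPE,         # pickled bytes come back on stdout
)
result = pickle.loads(proc.stdout)  # unpickle the child's result
os.unlink(child_path)
print(result)
```

Nothing touches the disk for the array itself, so there is no race on a shared file.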
ChrisSnyder
Regular Contributor III
James,

Curious, but if you find a good method for doing this can you post it here?

I have struggled with finding a replacement for the 'os.P_NOWAIT' option of the os.spawnv method. I use os.P_NOWAIT (instead of os.P_WAIT) to launch multiple instances of slave scripts (that run in python.exe) to, in effect, create a pseudo method of parallel processing. Not being a "real" programmer, I simply rely on the child process writing out messages to a .txt file, and then have set up the master script/process to look for and read the message .txt file every 10 seconds or so. Anyway, I'd love to update my stuff to use the subprocess module, but I really haven't ever found any examples - specifically of launching a subprocess (or many subprocesses) and having the master script not "wait" (e.g. sit there like a dummy) until the subprocess finishes. I guess my os.spawnv system works pretty well for me, so I was kinda apathetic about putting too much time into finding a newer replacement. I'd imagine that os.spawnv will not be supported in Python 3.x...
ChrisMathers
Regular Contributor II
Oh, a pickle is a good idea. I wasn't thinking of serialization = pickles. Mental jam, I guess. That would work easily, actually 😛

Chris, I use subprocess.Popen(), not os.spawnv. Subprocess is meant for opening a process and os really isn't. This is a task manager I wrote for our server (once we go to 10 I will be able to do it all from Python 2.6, but note that some of it needs to be launched by 2.5). The first script to run has a clause in it of sys.exit(0) if something doesn't qualify, which returns a 0 value to .wait() when that script closes. You can use this to make your code continue after the subprocess runs. You can use .communicate() instead of .wait() to read the stdout from the child if you need to know where you are in the other script. Oh, and yes, spawn is going away according to http://docs.python.org/release/2.6.6/library/subprocess.html

from time import localtime
from subprocess import Popen
 
result=Popen([r'C:\Python26\python.exe', r'C:\GIS Projects\CRW\FTPhalf.py']).wait()
if result == 0:
    pass
else:
    Popen([r'C:\Python25\python.exe', r'C:\GIS Projects\CRW\GPhalf1.1.py']).wait()
    Popen([r'C:\Python25\python.exe', r'C:\GIS Projects\TaxChange\taxchange.py']).wait()
 
if localtime()[2] == 1:  # tm_mday == 1, i.e. the first day of the month
    Popen([r'C:\Python25\python.exe', r'C:\GIS Projects\GIS Projects\Parcels.gdb\parcelupdate.py']).wait()


Your script could include:

result = Popen([r'C:\Python25\python.exe', yourScript]).wait()  # yourScript: path to your script; at the end of the other script, call sys.exit(1) if it completes successfully
if result == 1:
    pass
else:
    print 'error info here'
    sys.exit()
BradPosthumus
Occasional Contributor II
To prevent the parent script from waiting for the child script (i.e. os.P_NOWAIT), use subprocess.Popen instead of subprocess.call:

subprocess.Popen([os.path.join(sys.prefix, "python.exe"), strChildScript, strOutputFile])


In fact, if you try this in the code above you'll find it re-reads the array before the child process can finish altering it.

If you use:

pid = subprocess.Popen([os.path.join(sys.prefix, "python.exe"), strChildScript, strOutputFile]).pid


...you get the process ID that can be used to track the process.
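
Putting those pieces together, here is a rough sketch (modern Python 3 syntax) of launching several children without waiting, then polling them. The inline `-c` commands are stand-ins for real slave script paths:

```python
import subprocess
import sys
import time

# Stand-ins for real slave scripts; substitute something like
# [os.path.join(sys.prefix, "python.exe"), r"C:\Temp\slave.py"].
commands = [
    [sys.executable, "-c", "print('slave 1 done')"],
    [sys.executable, "-c", "print('slave 2 done')"],
]

# Popen returns immediately, so the master keeps running (cf. os.P_NOWAIT).
procs = [subprocess.Popen(cmd) for cmd in commands]

# poll() returns None while a child is still running, and its return code
# once it has exited -- no blocking wait() needed.
while any(p.poll() is None for p in procs):
    time.sleep(0.1)  # the master could do useful work here instead

return_codes = [p.returncode for p in procs]
print(return_codes)  # [0, 0] once both children exit cleanly
```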

However it looks like os.spawnv is still part of Python 3.x, so if it's not broke....

In Python 2.6 there's also a new multiprocessing module that may be worth looking into, but I haven't tested it yet since I'm still mired in ArcGIS 9.3 with Python 2.5.
ChrisSnyder
Regular Contributor III
Very nice - Thanks Chris and Brad!
JasonScheirer
Regular Contributor II
I think you might do well by switching your NumPy array to use memmaps -- this minimizes the amount of IPC you're going to need to do and on Linux and Windows a memory-mapped file tends to be quite a bit more efficient than manually crafted sequential I/O with serialization/deserialization across the pipe.
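
For illustration, a minimal sketch of the memmap idea (both "sides" are shown in one process here for brevity; in practice the child would open the same file path after being launched):

```python
import os
import tempfile

import numpy

path = os.path.join(tempfile.mkdtemp(), "shared.dat")

# Parent creates the memory-mapped array backed by a file and fills it in.
parent_view = numpy.memmap(path, dtype="float64", mode="w+", shape=(3,))
parent_view[:] = [1.0, 2.0, 3.0]
parent_view.flush()  # push the data out to the file

# The child would map the same file read-write and modify it in place --
# the array bytes are shared through the OS, never pickled or copied.
child_view = numpy.memmap(path, dtype="float64", mode="r+", shape=(3,))
child_view[:] *= 10
child_view.flush()

print(parent_view[:])  # the parent's view sees the child's changes
```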
jamessample
New Contributor II
You guys are great - thanks to all! It's going to take a while for me to absorb all of the good info on this thread.

Using 'pickle' looks like a good option, and Brad's code snippet illustrating its use is very helpful. numpy.memmap sounds interesting too, but I'll have to go away and do some reading to decide if it's within my current (fairly meagre) coding abilities. Either way I reckon you've solved my problem 😄

In reply to Chris S: It looks like Brad and Chris M have answered your question about os.P_NOWAIT better than I ever could. If I do find anything else, I'll be sure to post back here.

Thanks again!