
Pool from multiprocessing issues

07-26-2011 04:48 PM
BurtMcAlpine
Regular Contributor
I am trying to use Pool from multiprocessing to speed up some operations, def worker(d), that happen once for each layer in the MXD. This one is hard-coded to D:\TEMP\Untitled4.mxd. It runs, but only one at a time. I can see it start the pool, but only one process is being used. Any help would be great. I am running it from ArcToolbox in ArcMap and have unchecked run as process. Like I said, it runs, but only one at a time...

import arcpy
import os
import multiprocessing

def worker(d):
    # buffer layer by the below values
    bfs = [101, 200, 201, 400, 401, 750, 751, 1001,
           1500, 1501, 2000, 2001, 2500]
    for bf in bfs:
        Output = os.path.basename(d)[:-4] + "_Buffer" + str(bf) + ".shp"
        print "Buffering " + d + " at " + str(bf) + " Feet"
        if arcpy.Exists(d):
            arcpy.Buffer_analysis(d, "D:\\Temp\\" + Output, str(bf) + " Feet")
            arcpy.Project_management("D:\\Temp\\" + Output, "D:\\Temp\\Test\\" + Output, "C:\Program Files (x86)\ArcGIS\Desktop10.0\Coordinate Systems\Geographic Coordinate Systems\North America\NAD 1983.prj")
            arcpy.Delete_management("D:\\Temp\\" + Output)
        else:
            print "No Data"


if __name__ == '__main__':
   
    # Sets MXD
    mxd = arcpy.mapping.MapDocument("D:\TEMP\Untitled4.mxd")
    #mxd = arcpy.mapping.MapDocument("CURRENT")

    #set some environments needed to get the correct outputs
    arcpy.env.overwriteOutput = True
    arcpy.env.workspace  = "D:\TEMP\Test"
    arcpy.env.outputCoordinateSystem = "C:\Program Files (x86)\ArcGIS\Desktop10.0\Coordinate Systems\Projected Coordinate Systems\UTM\NAD 1983\NAD 1983 UTM Zone 16N.prj"

    # of processors to use set for max minus 1
    prc = int(os.environ["NUMBER_OF_PROCESSORS"]) - 1

    # Create and start a pool of worker processes
    pool = multiprocessing.Pool(prc)

    # Gets all layer in the Current MXD
    lyrs = arcpy.mapping.ListLayers(mxd)

    #Loops through every layer and gets source data name and path
    for lyr in lyrs:
        d = lyr.dataSource
        print "Passing " + d + " to processing pool"
        arcpy.AddMessage("Passing " + d + " to processing pool")
        pool.apply_async(worker(d))
14 Replies
ChrisSnyder
Honored Contributor
Large Address Aware: On a second look, maybe there is another flag you can set just for Python that would make it so... It looked to me like it just wasn't linked to the global flag... That said, I don't have my stuff set up for it, but have just heard of others using it, so I was curious what others had experienced. I've never really seen exact ESRI-specific "how to" instructions for making all things ESRI GIS large address aware. Among other things, I would find the added memory limits very useful for the in_memory workspace as well as numpy-array type things.
StacyRendall1
Frequent Contributor
Burt, I found out why your Multiprocessing code didn't work: the line passing the worker to the pool was incorrect (sorry I didn't pick up on it earlier!). You had:
  # add worker to job server
  jobs.append(pool.apply_async(worker(d)))


It should be:
  # add worker to job server
  jobs.append(pool.apply_async(worker, (d,)))


The function and its arguments are passed separately. The comma after the d is needed because the arguments must be given as a tuple (if there is more than one argument you don't need a trailing comma)... It is a little weird that it worked at all!
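To illustrate the difference, here is a minimal sketch that does not use arcpy at all. The worker, the helper name submit_all and the layer names are made up for the example. With pool.apply_async(worker(d)), worker(d) is called right away in the parent process, and only its return value (None in the original script) is handed to the pool, which would explain why the layers were processed one at a time. Passing the callable and the argument tuple separately lets the pool make the call in a child process:

```python
import multiprocessing
import os


def worker(d):
    # Stand-in for the real arcpy work: report which process handled the item.
    return d, os.getpid()


def submit_all(pool, func, items):
    # Hypothetical helper: pass the callable and its argument tuple separately,
    # so the call happens inside a pool worker rather than in the parent.
    jobs = [pool.apply_async(func, (item,)) for item in items]
    # get() blocks until the job is done and re-raises any error from the worker.
    return [job.get() for job in jobs]


if __name__ == '__main__':
    layers = ["roads.shp", "rivers.shp", "parcels.shp"]  # hypothetical names
    pool = multiprocessing.Pool(2)
    try:
        for name, pid in submit_all(pool, worker, layers):
            print("%s handled by PID %d (parent PID %d)" % (name, pid, os.getpid()))
    finally:
        # Stop accepting new jobs and wait for the workers to exit.
        pool.close()
        pool.join()
```

Calling get() on each result is also worth doing in the real script: errors raised inside a worker are otherwise stored on the result object and never shown.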

Chris, in answer to your questions:
1)  Yes, they are individual processes in the task manager. If you start a pool with six workers, and all of them are being utilised, there will be seven instances of python.exe in the task manager - the six children and the parent process.
2)  I had never heard of Large Address Aware. Yes, I was running that on Windows 7 64-bit. Interestingly I have had a Python 64-bit code use more than 10GB of memory at one stage (not arcpy related though) - would be nice to use that much with Arcpy!
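For point 1, a quick way to see the child processes without opening the task manager is multiprocessing.active_children(). The sketch below is only illustrative (count_pool_children is a made-up helper name); it counts the child processes that appear when a pool is created:

```python
import multiprocessing


def count_pool_children(n_workers):
    # Hypothetical helper: count the child processes started for a new pool.
    before = set(p.pid for p in multiprocessing.active_children())
    pool = multiprocessing.Pool(n_workers)
    try:
        after = set(p.pid for p in multiprocessing.active_children())
    finally:
        pool.close()
        pool.join()
    return len(after - before)


if __name__ == '__main__':
    n = count_pool_children(6)
    # The parent itself is not listed, so the task manager shows one more.
    print("%d pool children, plus the parent process" % n)
```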
BurtMcAlpine
Regular Contributor
Stacy,

Thanks so much. Of course I feel like a moron now. I had the same issue with PP, but it threw an error when it was not a tuple. I will try that out and change my code to use mp, as it seems more arcpy friendly.

Chris,

You can make 32-bit Python use 4 GB of memory on a 64-bit OS. You need to change the flag in python.exe. I found a tool to change it for me; the link is below. I found it in another forum and I have not tested it, so use it with caution... http://www.techpowerup.com/forums/attachment.php?attachmentid=34392&d=1269231650.

FYI, ArcGIS is set to LargeAddressAware by default.
V_StuartFoote
MVP Alum
Burt,

Large Address Aware.exe, v2.0.4 by Lee Glasser--aka FordGT90Concept is an excellent find! Thank you for posting the download link, but for anyone interested here is the techPowerUP! Forum thread link http://www.techpowerup.com/forums/showthread.php?t=112556 where version release and configuration details are kept in the first posting.

I've modded a few executables using "EDITBIN.EXE /LARGEADDRESSAWARE" and "DUMPBIN.EXE" that unfortunately are only available from a Visual Studio install. But this .NET applet offering from the gaming world allows convenient bitwise toggle post install for any executable. It also provides enough of a GUI to be able to manage the settings should stability issues arise.

Toggling the LargeAddressAware bit on 32-bit Windows executables, like python.exe and pythonw.exe, allows full 4 GB addressing for each instance under a 64-bit OS. And when the OS /3GB boot flag is set on a 32-bit OS with a full 4 GB of physical RAM, setting the LargeAddressAware flag provides an extra ~1.2 GB of address space for the instance to expand into.
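For anyone who only wants to check whether the bit is already set, without modifying anything, a small read-only Python sketch follows. It assumes the standard PE/COFF layout: the offset of the PE signature is stored at 0x3C, and IMAGE_FILE_LARGE_ADDRESS_AWARE is bit 0x0020 of the Characteristics field. The function names are my own, and reading only the first 4096 bytes is an assumption that holds for typical executables:

```python
import struct

IMAGE_FILE_LARGE_ADDRESS_AWARE = 0x0020  # bit in the COFF "Characteristics" field


def is_large_address_aware(data):
    # data: raw bytes from the start of an .exe (at least its headers).
    if data[:2] != b"MZ":
        raise ValueError("not a DOS/PE executable")
    # The 4-byte value at 0x3C is the file offset of the "PE\0\0" signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    # Characteristics is the last 2 bytes of the 20-byte COFF header that
    # follows the 4-byte signature, i.e. at offset 4 + 18 = 22.
    (characteristics,) = struct.unpack_from("<H", data, pe_offset + 22)
    return bool(characteristics & IMAGE_FILE_LARGE_ADDRESS_AWARE)


def check_file(path):
    # Assumption: the headers fit in the first 4096 bytes of the file.
    with open(path, "rb") as f:
        return is_large_address_aware(f.read(4096))
```

On Windows, check_file(sys.executable) (after import sys) reports the state of the running python.exe; this only reads the file, so it is safe to try before and after using editbin or the utility above.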

With such a convenient tool, the question now becomes which 32-bit executables to mod, and how to keep track of any interdependencies: when to mod, and when not.

Stuart
deleted-user-4RbHy6ryQ4a8
Deactivated User
If you are wary of using that third-party utility, here are detailed instructions for using editbin.exe from Visual Studio 2010:
http://gisgeek.blogspot.com/2012/01/set-32bit-executable-largeaddressaware.html

This page shows that editbin.exe comes with Visual C++ Express 2010, so there should be a free do-it-yourself option for the security conscious:
http://msdn.microsoft.com/en-us/library/hs24szh9.aspx