Create a script tool that uses multiprocessing

04-02-2015 04:32 AM


I created a Python script that uses the multiprocessing module to take advantage of a multi-core computer. It was written with ArcMap 10.3. It ran fine in IDLE, but when I attempted to wire it into a Script Tool interface so I could expose it as a tool in ArcToolbox, I started to have problems...

With some great help from the community on GIS SE I finally got it working. The solution was rather obscure, so I am documenting it here for others. As you will see, the script does nothing amazing, but it provides a template to get you up and running, and hopefully you won't waste time as I did trying to work out why it was not working...

Scenario

I have a polyline dataset that I want to clip to a polygon dataset. Due to the nature of the polygon dataset the clip tool was taking a long time to process. So I decided to use multiprocessing to speed things up.

Solution

My script that is wired up to the Script Tool interface is very simple and is shown below. Note that it imports another module I called multicode; that Python file sits in the same folder as this script.

'''
    Title:
        multiexample
    Description:
        This simple script is the script that is wired up into the toolbox
'''
import arcpy
import multicode

# Get parameters (These are Layer objects)
clipper = arcpy.GetParameter(0)
tobeclipped = arcpy.GetParameter(1)

def main():
    arcpy.AddMessage("Calling multiprocessing code...")
    multicode.multi(clipper, tobeclipped)

if __name__ == '__main__':
    main()

Important: when wiring this script up to an interface, make sure Run Python script in process is UNTICKED under the Source tab!

My main multiprocessing code is shown below; take note of the limitations. I've tried to document it so that it is understandable.

"""

    Title:

        multicode

    Description:

        The module that does multicore clipping. It will take as input a polygon layer and another layer. For

        each polygon it will clip the dataset into a new separate shapefile.

    Limitations:

        This code expects the folder c:\temp\tc to exist, this is where the output ends up. As geoprocessing objects

        cannot be "pickled" the full path to the dataset is passed to the worker function. This means that any selection

        on the input clipper layer is ignored.

    Author:

        Duncan Hornby (ddh@geodata.soton.ac.uk)

    Created:

        2/4/15

"""

import os,sys

import arcpy

import multiprocessing

from functools import partial

def doWork(clipper,tobeclipped,oid):

    """

        Title:

            doWork

        Description:

            This is the function that gets called as does the work. The parameter oid comes from the idList when the

            function is mapped by pool.map(func,idList) in the multi function.

            Note that this function does not try to write to arcpy.AddMessage() as nothing is ever displayed.

            If the clip succeeds then it returns TRUE else FALSE.

    """

    try:

        # Each clipper layer needs a unique name, so use oid

        arcpy.MakeFeatureLayer_management(clipper,"clipper_" + str(oid))

        # Select the polygon in the layer, this means the clip tool will use only that polygon

        descObj = arcpy.Describe(clipper)

        field = descObj.OIDFieldName

        df = arcpy.AddFieldDelimiters(clipper,field)

        query = df + " = " + str(oid)

        arcpy.SelectLayerByAttribute_management("clipper_" + str(oid),"NEW_SELECTION",query)

        # Do the clip

        outFC = r"c:\temp\tc\clip_" + str(oid) + ".shp"

        arcpy.Clip_analysis(tobeclipped,"clipper_" + str(oid),outFC)

        return True

    except:

        # Some error occurred so return False

        return False

def multi(clipper,tobeclipped):

    try:

        arcpy.env.overwriteOutput = True

        # Create a list of object IDs for clipper polygons

        arcpy.AddMessage("Creating Polygon OID list...")

        descObj = arcpy.Describe(clipper)

        field = descObj.OIDFieldName

        idList = []

        with arcpy.da.SearchCursor(clipper,[field]) as cursor:

            for row in cursor:

                id = row[0]

                idList.append(id)

        arcpy.AddMessage("There are " + str(len(idList)) + " object IDs (polygons) to process.")

        # Call doWork function, this function is called as many OIDS in idList

        # This line creates a "pointer" to the real function but its a nifty way for declaring parameters.

        # Note the layer objects are passing their full path as layer objects cannot be pickled

        func = partial(doWork,clipper.dataSource,tobeclipped.dataSource)

        arcpy.AddMessage("Sending to pool")

        # declare number of cores to use, use 1 less than the max

        cpuNum = multiprocessing.cpu_count() - 1

        # Create the pool object

        pool = multiprocessing.Pool(processes=cpuNum)

        # Fire off list to worker function.

        # res is a list that is created with what ever the worker function is returning

        res = pool.map(func,idList)

        pool.close()

        pool.join()

        # If an error has occurred report it

        if False in res:

            arcpy.AddError("A worker failed!")

        arcpy.AddMessage("Finished multiprocessing!")

    except arcpy.ExecuteError:

        # Geoprocessor threw an error

        arcpy.AddError(arcpy.GetMessages(2))

    except Exception as e:

        # Capture all other errors

        arcpy.AddError(str(e))
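The partial/pool.map pattern at the heart of multi() can be sketched with nothing but the standard library. This is a minimal illustration, not the tool itself: do_work and the path strings below are made up, and no arcpy is involved.

```python
from functools import partial
from multiprocessing import Pool

def do_work(clipper_path, tobeclipped_path, oid):
    # Stand-in for doWork(): the two fixed arguments arrive first and the
    # varying OID arrives last, exactly the order partial() binds them in.
    return "%s|%s|%d" % (clipper_path, tobeclipped_path, oid)

def multi(clipper_path, tobeclipped_path, id_list):
    # Bind the fixed arguments once; pool.map() then supplies each oid.
    func = partial(do_work, clipper_path, tobeclipped_path)
    with Pool(processes=2) as pool:
        return pool.map(func, id_list)
```

Only picklable things (here, plain strings and ints) cross the process boundary, which is why the real code passes dataSource paths rather than layer objects.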

When I ran the tool from ArcToolbox it immediately bombed out with a 000714 error, far too quickly for it to be a silly syntax error on my part. Again GIS SE came to the rescue: it turned out to be an issue with which version of Python was being used. If you open the Python command line window in ArcMap and type the following:

import sys

sys.version

You will see '2.7.8 (default, Jun 30 2014, 16:03:49) [MSC v.1500 32 bit (Intel)]', which tells you that ArcMap is using the 32-bit version. Running my script from ArcToolbox turned out to run the 64-bit version, which I assume was installed when I installed 64-bit background geoprocessing. This was upsetting ArcToolbox.

So how did I solve this? In File Explorer I right-clicked my .py file, went to Open with > Choose default programs and browsed to C:\Python27\ArcGIS10.3\pythonw.exe. Once the default application for opening Python files was reset to the 32-bit version, the Script Tool ran without error and I could see all the cores on my machine max out.

Comments

This is the first thing I've seen about multithreaded processing with Python. Thanks for the post!

Great example, thanks!

Thank you for this great example. Without your post I would never have had success with my multiprocessing script.

I recommend having a read of my GIS Stack Exchange answer that Duncan Hornby links to, as it covers a few more gotchas/tips.

Hi

If you have access to an instance of ArcGIS for Server and ArcGIS Pro, then there is another paradigm available: leverage the cores on a server and achieve concurrency with the concurrent.futures module in Python 3.

Here is an example:

http://www.arcgis.com/home/item.html?id=e57a7f9b15de428ea33c62e00bfb2332

For the scenario in question, the GeometryServer endpoints could be used, like Difference.

Another way to leverage cores is with subprocesses; here is an example which calls 64-bit ArcPy:

http://arcgis.com/home/item.html?id=b3c7c6273ef54e91aa57a073aa873eca 

If you are going to use multiprocessing and pool.map, you have a blocking situation anyway, so you may as well spin up subprocesses.
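A minimal sketch of that subprocess idea: start one interpreter per job without blocking, then wait and collect the output. The inline one-liner worker here is a hypothetical stand-in for a real worker .py script launched with the 64-bit python.exe.

```python
import subprocess
import sys

# Stand-in for a worker script: reads an OID from argv and prints a result.
WORKER = "import sys; print(int(sys.argv[1]) * 2)"

def run_concurrent(ids):
    # Popen returns immediately, so all workers run at the same time;
    # communicate() then waits for each one and captures its stdout.
    procs = [
        subprocess.Popen([sys.executable, "-c", WORKER, str(i)],
                         stdout=subprocess.PIPE)
        for i in ids
    ]
    return [int(p.communicate()[0]) for p in procs]
```

Unlike pool.map, nothing needs to be pickled here; each worker gets its arguments on the command line.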

Multiprocessing on the client side is hard; with just a little practice handling JSON and web calls, all the pain goes away and you can get linear scaling with server capacity.

Regards

You will see '2.7.8 (default, Jun 30 2014, 16:03:49) [MSC v.1500 32 bit (Intel)]', which tells you that ArcMap is using the 32-bit version. Running my script from ArcToolbox turned out to run the 64-bit version, which I assume was installed when I installed 64-bit background geoprocessing. This was upsetting ArcToolbox.

Yes -- the final association for .py is set to the last Python installed, which is x64 Python 2.7 if you installed ArcGIS x64 geoprocessing; and if you installed Pro's Python after that, it's x64 Python 3.4!

Is there a way to solve this problem by associating .py with the launcher and putting a #! line at the top of our scripts?

I finally got this working today, with encouragement and guidance from that Stack Exchange script. Thanks so much Duncan Hornby and Luke Pinner!

According to the docs (3. Using Python on Windows — Python 3.5.1 documentation), yes... and virtual environment support came (is coming, in the Arc* case) with Python 3.5.

Curtis,

I had to go away and find out what #! meant; initially I thought it was a typo, but apparently it is a shebang line.

I was wondering if you could share a code snippet of what you succeeded with? From your question below I guess you are trying to make the script say "run me using this version of Python"? It would be really helpful to see how you did it. Maybe I should add it to my "template" above?

Duncan

I have not done this yet. The idea is that .py scripts get associated with the Python launcher app. Then, when the .py file is launched with a double click or with cmd /c <file.py> (which is what "Run Python script in process" unchecked does in ArcMap), the #! line is used to launch the script with the correct Python.

I don't know if this works either with 10.4 (Python 2.7.8) or Pro 1.2 (Python 3.4.3 x64). Esri included a link in the help (currently broken) for this launcher thing, which looks like it may be possible, but provided no direct guidance.
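For what it's worth, a script pinned via the launcher might look like the sketch below. The interpreter path is the one from the original post; whether the launcher honors it under 10.4 is exactly the open question here, so treat this as an assumption, not a confirmed recipe.

```python
#!C:\Python27\ArcGIS10.3\python.exe
# If .py files are associated with the Windows py launcher (py.exe), the
# shebang above asks it to run this script with ArcMap's 32-bit interpreter.
# The launcher also understands version-only forms such as  #!python2
import sys
print(sys.version)
```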

Read the link I posted; it includes a full discussion of Python on Windows, including the Python launcher, virtual environments, and shebang lines. It is for Python 3.5. More details are in my recent post; it is reverse chronological, starting with Python 3.6 and moving backwards.

Coming to Python.... preparation and anticipation

Yes. Following that link and moving up and down through the Python versions: the Python launcher was new in version 3.3 (per the 3.4.4 docs). In 3.5, two installers became available for each of 32-bit and 64-bit, and there were some changes in 3.5 regarding finding the Python executable (section 3.3.2); virtual environments are covered in 3.4.1.2. The alternate bundles also increased in 3.5 and now include Anaconda, which there was some recent talk about.

@DuncanHornby @curtvprice 

I'm trying to use multiprocessing to create service areas for many features.  Below is my code, but I keep receiving an error when it tries to create the service area analysis layer:

Creating OID list
There are 2 facilities to process
Sending to pool
Creating service area
Error creating service area: ERROR 160706: Cannot acquire a lock.
Failed to execute (MakeServiceAreaAnalysisLayer).

 

import arcpy, os, time, sys
import multiprocessing
from functools import partial
arcpy.env.overwriteOutput = 1

# Variables
gdb = r"C:\Projects\TCHP\TCHP.gdb"
networkDataset = r"C:\Projects\TCHP\TCHP.gdb\Transportation\Streets_ND"
facilities = r"C:\Projects\TCHP\TCHP.gdb\Facilities"
output_dir = r"C:\Projects\TCHP"
layer_name = "FacilityCoverage"


def createServiceAreas(facilities,oid):
    try:
        # Create a feature layer for each facility
        descObj = arcpy.Describe(facilities)
        field = descObj.OIDFieldName
        df = arcpy.AddFieldDelimiters(facilities,field)
        query = df + " = " + str(oid)
        arcpy.MakeFeatureLayer_management(facilities,"facilities_{0}".format(oid), query)

        # Create Service Area Layer
        try:
            print("Creating service area")
            # Create unique service area name
            serviceArea = "ServiceArea_{0}".format(oid)
            result_object = arcpy.na.MakeServiceAreaAnalysisLayer(networkDataset, serviceArea, "Driving Time", "FROM_FACILITIES", [4], None, "LOCAL_TIME_AT_LOCATIONS", "POLYGONS", "GENERALIZED", "OVERLAP", "RINGS", "100 Meters", None, None)
            layer_object = result_object.getOutput(0)
        except Exception as e:
               print("Error creating service area:  {0}".format(e))
               sys.exit()

        # Add Locations
        try:
            print("Adding locations")
            arcpy.na.AddLocations(serviceArea, "Facilities", "facilities_{0}".format(oid), "Name Name #;CurbApproach # 0;Attr_Minutes # 0;Attr_Meters # 0;Attr_WeekdayFallbackTravelTime # 0;Attr_WeekendFallbackTravelTime # 0;Attr_TravelTime # 0;Breaks_Minutes # #;Breaks_Meters # #;Breaks_WeekdayFallbackTravelTime # #;Breaks_WeekendFallbackTravelTime # #;Breaks_TravelTime # #", "5000 Meters", None, "Streets SHAPE;Streets_ND_Junctions NONE", "MATCH_TO_CLOSEST", "APPEND", "NO_SNAP", "5 Meters", "EXCLUDE", None)
        except Exception as e:
               print("Error adding locations: {0}".format(e))

        # Solve
        try:
            print("Solving")
            arcpy.na.Solve(serviceArea)
        except Exception as e:
            print("Error Solving: {0}".format(e))

        # Get Service Area polygon layer
        sublayer_names = arcpy.na.GetNAClassNames(layer_object)
        serviceAreaPolygon = sublayer_names["SAPolygons"]

        # Export Service Area polygon layer to feature class
        arcpy.FeatureClassToFeatureClass_conversion(serviceAreaPolygon, gdb, serviceArea)

        return True
    except:
        # Some error occurred so return False
        return False


def main():
    startTime = time.clock()

    # Create a list of object IDs for facilities
    print("Creating OID list")
    descObj = arcpy.Describe(facilities)
    field = descObj.OIDFieldName
    idList = []
    with arcpy.da.SearchCursor(facilities,[field]) as cursor:
        for row in cursor:
            id = row[0]
            idList.append(id)
    print("There are {0} facilities to process".format(str(len(idList))))

    # Use partial to declare the variables for the createServiceAreas function
    func = partial(createServiceAreas,facilities)

    print("Sending to pool")
    # declare number of cores to use, use 1 less than the max
    cpuNum = multiprocessing.cpu_count() - 1

    # Create the pool object
    pool = multiprocessing.Pool(processes=cpuNum)

    # Fire off list to worker function.
    # res is a list that is created with what ever the worker function is returning
    res = pool.map(func,idList)
    pool.close()
    pool.join()

    endTime = time.clock()
    elapsedTime = (endTime - startTime) / 60
    print("Elapsed Time:  " + str(elapsedTime))

if __name__ == '__main__':
    main()

 

Do either of you see where I may be going wrong?

 

@JakeSkinner try checking out/in the NA extension within your worker function createServiceAreas? I find I need to do that with Spatial Analyst, so it may be the same issue with the NA extension.
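That "acquire in the worker, release under finally" pattern can be sketched without arcpy. Here check_out/check_in are hypothetical stand-ins for CheckOutExtension/CheckInExtension; the point is that each worker process releases the license even when the work fails.

```python
CHECKED_OUT = []  # records license activity so the pattern is visible

def check_out(ext):
    # Stand-in for arcpy.CheckOutExtension(ext)
    CHECKED_OUT.append(ext)

def check_in(ext):
    # Stand-in for arcpy.CheckInExtension(ext)
    CHECKED_OUT.remove(ext)

def worker(oid):
    # Each worker acquires the extension itself, because pool workers are
    # separate processes that do not inherit the parent's checkout.
    check_out("Network")
    try:
        if oid < 0:
            raise ValueError("bad OID")  # simulated failure
        return oid * 10
    finally:
        # Runs whether the work succeeded or raised
        check_in("Network")
```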

@DuncanHornby I still receive an error when checking out the extension. Sometimes I receive the following error as well:

Error creating service area: ERROR 160722: The item was not found.

If I only have 1 point in the facilities feature class, the code executes successfully. I thought there might be a lock on the Network Dataset, so I tried creating a unique network dataset layer in the createServiceAreas function and creating the Service Area analysis layer from that:

arcpy.MakeNetworkDatasetLayer_na(networkDataset, "networkDataset_{0}".format(oid))

 

However, I still received an error.

@JakeSkinner As I read your code, your facility layer only ever has one point in it, based on this line

arcpy.MakeFeatureLayer_management(facilities,"facilities_{0}".format(oid), query)

When would it ever have more? Or are you saying that the feature class has just 1 point in it?

@DuncanHornby I was referring to the feature class only having 1 point. Here are the data and script if you want to take a look.

@JakeSkinner I did some micro-tweaking of your code, which is below. I had to run it in Spyder using ArcGIS Pro's arcpy, as I do not have access to Network Analyst for ArcMap at home. It worked, as shown below.

(Screenshot of the successful run attached.)

 

The tweaks: I moved your checkout/check-in into the worker function, as I had suggested; you had left it in the main function. Note I check the extension back in under finally.


def createServiceAreas(facilities,oid):
    try:
        # Checkout Extension
        arcpy.CheckOutExtension("Network")
    
        # Create a feature layer for each facility
        descObj = arcpy.Describe(facilities)
        field = descObj.OIDFieldName
        df = arcpy.AddFieldDelimiters(facilities,field)
        query = df + " = " + str(oid)
        arcpy.MakeFeatureLayer_management(facilities,"facilities_{0}".format(oid), query)

        # Create Service Area Layer
        try:
            print("Creating service area")
            # Create unique service area name
            serviceArea = "ServiceArea_{0}".format(oid)
            result_object = arcpy.na.MakeServiceAreaAnalysisLayer(networkDataset, serviceArea, "Driving Time", "FROM_FACILITIES", [4], None, "LOCAL_TIME_AT_LOCATIONS", "POLYGONS", "GENERALIZED", "OVERLAP", "RINGS", "100 Meters", None, None)
            layer_object = result_object.getOutput(0)
        except Exception as e:
               print("Error creating service area:  {0}".format(e))
               sys.exit()

        # Add Locations
        try:
            print("Adding locations")
            arcpy.na.AddLocations(serviceArea, "Facilities", "facilities_{0}".format(oid), "Name Name #;CurbApproach # 0;Attr_Minutes # 0;Attr_Meters # 0;Attr_WeekdayFallbackTravelTime # 0;Attr_WeekendFallbackTravelTime # 0;Attr_TravelTime # 0;Breaks_Minutes # #;Breaks_Meters # #;Breaks_WeekdayFallbackTravelTime # #;Breaks_WeekendFallbackTravelTime # #;Breaks_TravelTime # #", "5000 Meters", None, "Streets SHAPE;Streets_ND_Junctions NONE", "MATCH_TO_CLOSEST", "APPEND", "NO_SNAP", "5 Meters", "EXCLUDE", None)
        except Exception as e:
               print("Error adding locations: {0}".format(e))

        # Solve
        try:
            print("Solving")
            arcpy.na.Solve(serviceArea)
        except Exception as e:
            print("Error Solving: {0}".format(e))

        # Get Service Area polygon layer
        sublayer_names = arcpy.na.GetNAClassNames(layer_object)
        serviceAreaPolygon = sublayer_names["SAPolygons"]

        # Export Service Area polygon layer to feature class
        arcpy.FeatureClassToFeatureClass_conversion(serviceAreaPolygon, gdb, serviceArea)

        return True
    except:
        # Some error occurred so return False
        return False
    finally:
        arcpy.CheckInExtension("Network")


The ArcGIS Pro version of Python didn't like your time.clock() in main(), so I changed that to time.time(). That's all I did...

@DuncanHornby thanks for getting back to me. Even after making your changes, the script still produces errors. I found out from a colleague that you cannot have multiple processes write to the same file geodatabase. Are you writing the output to a file geodatabase?

I was able to get the multiprocessing to work by creating new layer files; however, it did not seem to be more performant. The multiprocessing took longer to perform the multiple solves compared to sending multiple facilities and performing one solve.

@JakeSkinner yes, I used your code as-is with my micro tweaks as described above, so I was writing to your file geodatabase... It worked for me.

I too have discovered that multiprocessing is not always faster. I think there is a big hit from instantiating the geoprocessor, tool validation, etc., so you only get a performance boost if the work you are splitting up inherently has a long run time (e.g. munching some large raster dataset).
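That overhead is easy to demonstrate with the standard library alone: for trivially cheap tasks, the pool's process start-up and pickling costs usually dwarf the work itself. This is a sketch; actual timings vary by machine, so it only asserts that the two runs agree.

```python
import time
from multiprocessing import Pool

def tiny_task(x):
    return x + 1  # far too cheap to repay process start-up and pickling costs

def compare(n):
    data = list(range(n))

    t0 = time.time()
    seq = [tiny_task(x) for x in data]
    seq_secs = time.time() - t0

    t0 = time.time()
    with Pool(processes=2) as pool:
        par = pool.map(tiny_task, data)
    par_secs = time.time() - t0

    # Same answers either way; only the elapsed times differ
    return seq == par, seq_secs, par_secs
```

On most machines the pooled run is slower here; the boost only appears when each task's run time dominates the fixed overhead.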

Weirdly only today I was looking at manifold.net ...
