Hi everyone,
Can anyone help me use multiprocessing.Pool to run my nested-loop script across multiple processes? I tried doing it myself but couldn't work out how without getting errors.
I am working with very large datasets that take hours to days to run at each stage of the loop, so I need a way to make this significantly faster.
See script below:
#Import system modules
import arcpy
from arcpy import env
# define list values
list1 = [ 'x', 'y', 'z']
list2 = [ 'a', 'b', 'c']
for x in range(len(list1)):
    for y in range(len(list2)):
        try:
            #Check out the Network Analyst extension license
            arcpy.CheckOutExtension("Network")
            #Set environment settings
            env.overwriteOutput = True
            #Set local variables
            inNetworkDataset = "D:/RoadNetwork.gdb/RoadNetwork_ND"
            #verify layer name
            outNALayerName = list1[x]
            chunk = list2[y]
            #set variables
            impedanceAttribute = "Distance"
            inFacilities = "D:/facilities/" + chunk + ".shp"
            polygonBarriers = "D:/barriers/" + outNALayerName + ".shp"
            outLayerFile = "D:/outputs/" + outNALayerName + "_" + chunk + ".lyr"
            #Make a barrier feature layer
            barriersLayer = arcpy.management.MakeFeatureLayer(polygonBarriers,"PolygonBarriers").getOutput(0)
            #Create a new service area layer.
            outNALayer = arcpy.na.MakeServiceAreaLayer(inNetworkDataset, outNALayerName,
                impedanceAttribute, "TRAVEL_FROM", "60", "DETAILED_POLYS", "NO_MERGE", "DISKS",
                hierarchy = "NO_HIERARCHY", poly_trim_value = "100", restriction_attribute_name = ["OneWay"])
            outNALayer = outNALayer.getOutput(0)
            subLayerNames = arcpy.na.GetNAClassNames(outNALayer)
            facilitiesLayerName = subLayerNames["Facilities"]
            polygonbarriersLayerName = subLayerNames["PolygonBarriers"]
            #Create field mappings for loading barriers
            fieldMappings = arcpy.na.NAClassFieldMappings(outNALayer,polygonbarriersLayerName)
            fieldMappings["BarrierType"].defaultValue = 0
            #Load the facilities and barriers
            arcpy.na.AddLocations(outNALayer, facilitiesLayerName, inFacilities, "", "")
            arcpy.na.AddLocations(outNALayer, polygonbarriersLayerName, polygonBarriers, fieldMappings)
            #Solve and save the service area layer
            arcpy.na.Solve(outNALayer)
            arcpy.management.SaveToLayerFile(outNALayer,outLayerFile,"ABSOLUTE")
        except Exception as e:
            # If an error occurred, print line number and error message
            import traceback, sys
            # datetime must be imported before datetime.now() can be called
            from datetime import datetime
            tb = sys.exc_info()[2]
            now = datetime.now()
            print now.strftime("%d%m%Y-%H%M%S")
            print "An error occurred on line %i" % tb.tb_lineno
            print str(e)
You should be able to get rid of your loops and let the Python multiprocessing Pool run the body of the loop. You have to prepare a list of "parameter sets" that the Pool will use to invoke the body whenever a processor becomes available. In your case the list would look something like this (no guarantee on the exact syntax):
parm_list = [('x','a'), ('x','b'),('x','c'),('y','a'),('y','b'),('y','c'),('z','a'),('z','b'),('z','c')]
Then your body would be encapsulated in a function, something like this:
def my_body (parm):
    outNALayerName = parm[0]
    chunk = parm[1]
    .....
and you would call it like this
p = multiprocessing.Pool(<number of processors>)
p.map(my_body, parm_list)
p.close()
You have to be careful about lock conflicts, for instance if you use duplicate names for your temporary files or try to have multiple processes update the same file.
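Putting the pieces above together, a minimal sketch might look like the following. It is only an outline under some assumptions: the arcpy calls from the question are left as a comment so the example runs without arcpy, the helper names build_parm_list and output_path are made up for illustration, and the process count of 2 is a placeholder. The `if __name__ == "__main__":` guard is there because on Windows each worker process re-imports the script, and without the guard every worker would try to start its own Pool.

```python
import itertools
import multiprocessing

list1 = ['x', 'y', 'z']
list2 = ['a', 'b', 'c']


def build_parm_list(names, chunks):
    """Return every (layer name, chunk) pair, in nested-loop order."""
    return list(itertools.product(names, chunks))


def output_path(parm):
    """Build the output layer file path for one pair (hypothetical helper)."""
    outNALayerName, chunk = parm
    return "D:/outputs/" + outNALayerName + "_" + chunk + ".lyr"


def my_body(parm):
    """Worker: runs once per pair, each call in one of the pool's processes."""
    outNALayerName, chunk = parm
    # Assumption: the arcpy calls from the loop body would go here.
    # To avoid lock conflicts, any in-memory layer names should be unique
    # per pair, e.g. outNALayerName + "_" + chunk.
    return output_path(parm)


if __name__ == "__main__":
    parm_list = build_parm_list(list1, list2)
    p = multiprocessing.Pool(2)  # placeholder process count
    results = p.map(my_body, parm_list)  # blocks until every pair is done
    p.close()
    p.join()
    for outLayerFile in results:
        print(outLayerFile)
```

Because every pair produces a different output file name, no two processes write to the same .lyr file. Shared inputs such as the network dataset are only read, but whether concurrent reads are safe for a given geodatabase is something to test on a small subset first.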
I've taken the liberty to format your code so it's readable. While I don't have any experience with the Multiprocessing.Pool() module, there are a couple things you should probably consider in your code.
There are several functions that are repeated every time the loop(s) are iterated, and I wonder if this is what is causing you some problems. Perhaps you could include the errors.
For example, you only need to check out Network Analyst once; you are checking it out every time you go through the loop. You should be getting a fatal error every time you reach the MakeFeatureLayer call, because that feature layer is created on the first pass through the loop(s) and already exists on subsequent passes. And what is going on in the two lines that set outNALayerName and chunk? I don't understand your use of the indexes [x] and [y] into the respective lists.
My suggestion is to clean up your code such that you are not repeating yourself every time you go through it and see what happens.
#Import system modules
import arcpy
from arcpy import env
# define list values
list1 = [ 'x', 'y', 'z']
list2 = [ 'a', 'b', 'c']
for x in range(len(list1)):
    for y in range(len(list2)):
        try:
            #Check out the Network Analyst extension license
            arcpy.CheckOutExtension("Network")
            #Set environment settings
            env.overwriteOutput = True
            #Set local variables
            inNetworkDataset = "D:/RoadNetwork.gdb/RoadNetwork_ND"
            #verify layer name
            outNALayerName = list1[x]
            chunk = list2[y]
            #set variables
            impedanceAttribute = "Distance"
            inFacilities = "D:/facilities/" + chunk + ".shp"
            polygonBarriers = "D:/barriers/" + outNALayerName + ".shp"
            outLayerFile = "D:/outputs/" + outNALayerName + "_" + chunk + ".lyr"
            #Make a barrier feature layer
            barriersLayer = arcpy.management.MakeFeatureLayer(polygonBarriers,"PolygonBarriers").getOutput(0)
            #Create a new service area layer.
            outNALayer = arcpy.na.MakeServiceAreaLayer(inNetworkDataset, outNALayerName,
                impedanceAttribute, "TRAVEL_FROM", "60", "DETAILED_POLYS", "NO_MERGE", "DISKS",
                hierarchy = "NO_HIERARCHY", poly_trim_value = "100", restriction_attribute_name = ["OneWay"])
            outNALayer = outNALayer.getOutput(0)
            subLayerNames = arcpy.na.GetNAClassNames(outNALayer)
            facilitiesLayerName = subLayerNames["Facilities"]
            polygonbarriersLayerName = subLayerNames["PolygonBarriers"]
            #Create field mappings for loading barriers
            fieldMappings = arcpy.na.NAClassFieldMappings(outNALayer,polygonbarriersLayerName)
            fieldMappings["BarrierType"].defaultValue = 0
            #Load the facilities and barriers
            arcpy.na.AddLocations(outNALayer, facilitiesLayerName, inFacilities, "", "")
            arcpy.na.AddLocations(outNALayer, polygonbarriersLayerName, polygonBarriers, fieldMappings)
            #Solve and save the service area layer
            arcpy.na.Solve(outNALayer)
            arcpy.management.SaveToLayerFile(outNALayer,outLayerFile,"ABSOLUTE")
        except Exception as e:
            # If an error occurred, print line number and error message
            import traceback, sys
            # datetime must be imported before datetime.now() can be called
            from datetime import datetime
            tb = sys.exc_info()[2]
            now = datetime.now()
            print now.strftime("%d%m%Y-%H%M%S")
            print "An error occurred on line %i" % tb.tb_lineno
            print str(e)
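The clean-up suggested above could be sketched roughly as follows. This is an outline rather than a tested arcpy script: build_paths, run_all and the solve_one callback are hypothetical names introduced here, and solve_one stands in for the per-combination Network Analyst steps so the example runs without arcpy. The idea is to do one-time setup once, and to loop over the list values directly instead of over indexes.

```python
list1 = ['x', 'y', 'z']
list2 = ['a', 'b', 'c']

# Values that never change are set once, before the loops.
inNetworkDataset = "D:/RoadNetwork.gdb/RoadNetwork_ND"
impedanceAttribute = "Distance"


def build_paths(outNALayerName, chunk):
    """Return (facilities, barriers, output layer file) for one combination."""
    inFacilities = "D:/facilities/" + chunk + ".shp"
    polygonBarriers = "D:/barriers/" + outNALayerName + ".shp"
    outLayerFile = "D:/outputs/" + outNALayerName + "_" + chunk + ".lyr"
    return inFacilities, polygonBarriers, outLayerFile


def run_all(solve_one):
    """Call solve_one once per combination and return the output paths."""
    # One-time setup such as arcpy.CheckOutExtension("Network") and
    # env.overwriteOutput = True would go here, outside the loops.
    outputs = []
    for outNALayerName in list1:  # iterate over the values, not the indexes
        for chunk in list2:
            inFacilities, polygonBarriers, outLayerFile = build_paths(outNALayerName, chunk)
            # solve_one is a placeholder for the make-layer, add-locations,
            # solve and save steps from the original loop body.
            solve_one(outNALayerName, inFacilities, polygonBarriers, outLayerFile)
            outputs.append(outLayerFile)
    return outputs
```

Keeping the path building in its own small function also makes it easy to check the file names for all nine combinations before starting a run that takes hours.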