I'm writing a python toolbox implementing several different areal interpolation methods, from the most basic areal weighting up to methods that use some statistical techniques.
I'm planning on running some monte carlo simulations to assess the accuracy of each method with a particular set of input data, so I need to make the execution as quick as possible.
My current implementation for areal weighting is below. I've used a searchcursor nested within an updatecursor, which I'm led to believe is not the most efficient way.
Basically the task is to interpolate values from source zones (census polygons) to arbitrary target zone polygons, in this case based simply on the ratio of the area of each source zone that intersects a target zone to the total area of that source zone.
To do this, I loop through the target zones, select just the source zones that intersect the target zone, then loop through those source zones. Where the source zone is entirely contained within the target zone, I just take the total value from that source zone (I'm not sure if this optimization is worth it as all I save is the ratio calculation...), otherwise, I calculate the ratio and multiply it by the value to be interpolated.
I've been trying to think of a way to avoid this inner searchcursor by creating a dictionary outside the updatecursor. The difficulty is that this dictionary does not need to include all the source zones, just those that intersect the target zone. Ideally I'd have a dictionary with the ID of each target zone as the key, and the value would have to be another dictionary for each intersecting source zone with the interpolation and geometry fields as keys and the values of those fields as values. But it seems like to build this dictionary would require running an intersect or spatial join first, or another nested loop, so I'm unsure whether I would really gain much speed this way.
I know I can use cProfile to see exactly how long each part is taking, but it's not clear how best to set it up within a python toolbox - can anyone provide some examples of how best to do this?
Anyway, here is the execute code:
def execute(self, parameters, messages):
"""The source code of the tool."""
# get parameters
inSourceFeatures = parameters[0].valueAsText
inInterpolationFields = parameters[1].valueAsText.split(";")
inTargetFeatures = parameters[2].valueAsText
outTargetFeatures = parameters[3].valueAsText
# use in_memory workspace for quicker results
arcpy.env.workspace = 'in_memory'
# make feature layers of the source zone feature class
arcpy.MakeFeatureLayer_management(inSourceFeatures, "source_zones")
# take a copy of the target zones
arcpy.CopyFeatures_management(inTargetFeatures, "target_zones")
# add fields for the interpolated variables
for field in inInterpolationFields:
arcpy.AddField_management("target_zones", field, "DOUBLE")
# get spatial reference of zones
sourceSR = arcpy.Describe("source_zones").spatialReference
targetSR = arcpy.Describe("target_zones").spatialReference
# for each target zone...
with arcpy.da.UpdateCursor("target_zones", inInterpolationFields + ['SHAPE@'], spatial_reference=targetSR) as targetCursor:
for targetRow in targetCursor:
# create a dictionary for the row for easier field access
targetRowDict = dict(zip(inInterpolationFields + ['SHAPE@'], targetRow))
# get target geometry
targetGeom = targetRowDict['SHAPE@']
# select source zones that intersect current target zone
arcpy.SelectLayerByLocation_management("source_zones", "INTERSECT", targetGeom)
# initialise output interpolated value fields
for field in inInterpolationFields:
targetRowDict[field] = 0.0
# for each intersecting source zone...
with arcpy.da.SearchCursor("source_zones", inInterpolationFields + ['SHAPE@'], spatial_reference=sourceSR) as sourceCursor:
for sourceRow in sourceCursor:
# create a dictionary for the row for easier field access
sourceRowDict = dict(zip(inInterpolationFields + ['SHAPE@'], sourceRow))
# get source geometry
sourceGeom = sourceRowDict['SHAPE@']
# if source zone is entirely within the target zone, add the full value
if sourceGeom.within(targetGeom):
for field in inInterpolationFields:
if sourceRowDict[field]:
targetRowDict[field] += sourceRowDict[field]
else:
# calculate the ratio of the intersection area to the source zone area
ratio = sourceGeom.intersect(targetGeom, 4).area / sourceGeom.area
for field in inInterpolationFields:
if sourceRowDict[field]:
# add the areally weighted values to the output fields
targetRowDict[field] += ratio * sourceRowDict[field]
targetCursor.updateRow([targetRowDict[field] for field in inInterpolationFields + ['SHAPE@']])
# copy output target zones to output
arcpy.CopyFeatures_management("target_zones", outTargetFeatures)
return