AnsweredAssumed Answered

How can I optimize this script to avoid nested cursors?

Question asked by DanEvans83 on Apr 24, 2018

I'm writing a python toolbox implementing several different areal interpolation methods, from the most basic areal weighting up to methods that use some statistical techniques.


I'm planning on running some monte carlo simulations to assess the accuracy of each method with a particular set of input data, so I need to make the execution as quick as possible.


My current implementation for areal weighting is below. I've used a searchcursor nested within an updatecursor, which I'm led to believe is not the most efficient way.


Basically the task is to interpolate values from source zones (census polygons) to arbitrary target zone polygons, in this case based simply on the ratio of the area of each source zone that intersects a target zone to the total area of that source zone.


To do this, I loop through the target zones, select just the source zones that intersect the target zone, then loop through those source zones. Where the source zone is entirely contained within the target zone, I just take the total value from that source zone (I'm not sure if this optimization is worth it as all I save is the ratio calculation...), otherwise, I calculate the ratio and multiply it by the value to be interpolated.


I've been trying to think of a way to avoid this inner searchcursor by creating a dictionary outside the updatecursor. The difficulty is that this dictionary does not need to include all the source zones, just those that intersect the target zone. Ideally I'd have a dictionary with the ID of each target zone as the key, and the value would have to be another dictionary for each intersecting source zone with the interpolation and geometry fields as keys and the values of those fields as values. But it seems like to build this dictionary would require running an intersect or spatial join first, or another nested loop, so I'm unsure whether I would really gain much speed this way.


I know I can use cProfile to see exactly how long each part is taking, but it's not clear how best to set it up within a python toolbox - can anyone provide some examples of how best to do this?


Anyway, here is the execute code:


    def execute(self, parameters, messages):
        """The source code of the tool."""

        # get parameters
        inSourceFeatures = parameters[0].valueAsText
        inInterpolationFields = parameters[1].valueAsText.split(";")
        inTargetFeatures = parameters[2].valueAsText
        outTargetFeatures = parameters[3].valueAsText

        # use in_memory workspace for quicker results
        arcpy.env.workspace = 'in_memory'

        # make feature layers of the source zone feature class
        arcpy.MakeFeatureLayer_management(inSourceFeatures, "source_zones")

        # take a copy of the target zones
        arcpy.CopyFeatures_management(inTargetFeatures, "target_zones")

        # add fields for the interpolated variables
        for field in inInterpolationFields:
            arcpy.AddField_management("target_zones", field, "DOUBLE")

        # get spatial reference of zones
        sourceSR = arcpy.Describe("source_zones").spatialReference
        targetSR = arcpy.Describe("target_zones").spatialReference

        # for each target zone...
        with arcpy.da.UpdateCursor("target_zones", inInterpolationFields + ['SHAPE@'], spatial_reference=targetSR) as targetCursor:
            for targetRow in targetCursor:

                # create a dictionary for the row for easier field access
                targetRowDict = dict(zip(inInterpolationFields + ['SHAPE@'], targetRow))

                # get target geometry
                targetGeom = targetRowDict['SHAPE@']

                # select source zones that intersect current target zone
                arcpy.SelectLayerByLocation_management("source_zones", "INTERSECT", targetGeom)

                # initialise output interpolated value fields
                for field in inInterpolationFields:
                    targetRowDict[field] = 0.0

                # for each intersecting source zone...
                with arcpy.da.SearchCursor("source_zones", inInterpolationFields + ['SHAPE@'], spatial_reference=sourceSR) as sourceCursor:
                    for sourceRow in sourceCursor:

                        # create a dictionary for the row for easier field access
                        sourceRowDict = dict(zip(inInterpolationFields + ['SHAPE@'], sourceRow))

                        # get source geometry
                        sourceGeom = sourceRowDict['SHAPE@']

                        # if source zone is entirely within the target zone, add the full value
                        if sourceGeom.within(targetGeom):
                            for field in inInterpolationFields:
                                if sourceRowDict[field]:
                                    targetRowDict[field] += sourceRowDict[field]
                            # calculate the ratio of the intersection area to the source zone area
                            ratio = sourceGeom.intersect(targetGeom, 4).area / sourceGeom.area
                            for field in inInterpolationFields:
                                if sourceRowDict[field]:
                                    # add the areally weighted values to the output fields
                                    targetRowDict[field] += ratio * sourceRowDict[field]

                targetCursor.updateRow([targetRowDict[field] for field in inInterpolationFields + ['SHAPE@']])

        # copy output target zones to output
        arcpy.CopyFeatures_management("target_zones", outTargetFeatures)