Python Nested select by attribute and location slow

ClaireElliott · ‎04-01-2022

Hello, I'm trying to write a simple script that loops through a list of properties (~700), selects one at a time, then selects the intersecting features (there are 21 feature classes) and prints them to the screen. When I execute this script, it slows down each time it moves to a new property. By property 15/700, the "HAVE_THEIR_CENTER_IN" select by location call is already taking a few seconds. I feel there's something obvious I'm missing, or something I'm not resetting/cleaning up between iterations of my loop.

Thank you for any help/advice.

import arcpy
import os

Property = r'C:\Boundaries.gdb\Property'
FeatureClasses = r'C:\FieldData.gdb'

properties = [row[0] for row in arcpy.da.SearchCursor(Property, 'NAME')]

arcpy.env.workspace = FeatureClasses

fcList = arcpy.ListFeatureClasses()

for prop in properties:
    whereClause = "NAME = '"+prop+"'"
    selectedProp = arcpy.management.SelectLayerByAttribute(Property, "NEW_SELECTION", whereClause)
    print('property: '+prop)
    for fc in fcList:
        selectedFeatures = arcpy.management.SelectLayerByLocation(os.path.join(FeatureClasses, fc), "HAVE_THEIR_CENTER_IN", selectedProp)
        print('fc: ' + str(fc) + ' and found "' + str(len(selectedFeatures)) + '"')

DanPatterson · ‎04-01-2022

Select Layer By Location (Data Management)—ArcGIS Pro | Documentation

If the input is a feature class or dataset path, this tool will automatically create and return a new layer with the result of the tool applied.

This could be a memory issue, since a new layer is created on every call and you assign it to selectedFeatures (perhaps delete it after the print statement).

Another thing is the assumption that all are in the same coordinate system. If that isn't true, then you have projections and temporary data being created.

Try a few of those to see if things speed up

... sort of retired...

DonMorrison1 · ‎04-01-2022

Other than running slow, does it work OK? If the print statement in the inner loop is trying to report the number of features selected by the SelectLayerByLocation call just above it, then your code looks wrong to me. I would expect 'selectedFeatures[2]' to give you the correct count. On the performance question, you could try to explicitly delete the layer object that gets created on every iteration with 'arcpy.Delete_management(selectedFeatures[0])'. I doubt it will help, but it is worth a try.

ClaireElliott · ‎04-02-2022

Thank you both for your suggestions! Some progress has been made, but still looking for some further ideas as to what might be causing the script to be getting slower as it runs. Some observations:

- The feature classes were in different coordinate systems, so I corrected this. No discernible difference in speed, however.
- Corrected the print to use selectedFeatures[2] instead of len(...).
- Added the management.Delete after the print, and the speed did improve.
  - Before adding Delete, it took about 1 hour to get to property 20.
  - After adding Delete, it took about 1 hour to get to property 100, a great 5x increase in speed!

However, the same progressive slowdown occurred as it iterated through the properties; it just took longer to appear. So I feel that was a good improvement, but I'm wondering if there are any more ideas like that which would clean up unused data/memory and keep the script running quickly.

Here's the updated code:

import arcpy
import os

Property = r'C:\Boundaries.gdb\Property'
FeatureClasses = r'C:\FieldData.gdb'

properties = [row[0] for row in arcpy.da.SearchCursor(Property, 'NAME')]

arcpy.env.workspace = FeatureClasses

fcList = arcpy.ListFeatureClasses()

for prop in properties:
    whereClause = "NAME = '"+prop+"'"
    selectedProp = arcpy.management.SelectLayerByAttribute(Property, "NEW_SELECTION", whereClause)
    print('property: '+prop)
    for fc in fcList:
        selectedFeatures = arcpy.management.SelectLayerByLocation(os.path.join(FeatureClasses, fc), "HAVE_THEIR_CENTER_IN", selectedProp)
        print('fc: ' + str(fc) + ' and found "' + str(selectedFeatures[2]) + '"')
        arcpy.management.Delete(selectedFeatures, "")
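
To put numbers on the progressive slowdown described above, it may help to time each property rather than judge by eye. The following is only a minimal sketch in plain Python: `time_each` and `slowdown_ratio` are hypothetical helper names, and `work` stands in for whatever is done per property (for example, the inner loop over fcList).

```python
import time


def time_each(items, work):
    """Call work(item) for every item and return a list of (item, seconds)."""
    timings = []
    for item in items:
        start = time.perf_counter()
        work(item)
        timings.append((item, time.perf_counter() - start))
    return timings


def slowdown_ratio(timings, window=5):
    """Mean of the last `window` timings divided by the mean of the first `window`.

    A value near 1 suggests steady speed; a value well above 1 suggests
    that later iterations are slower than earlier ones.
    """
    secs = [seconds for _, seconds in timings]
    if not secs:
        raise ValueError("no timings recorded")
    # Shrink the window when there are only a few measurements.
    window = max(1, min(window, len(secs) // 2))
    first = sum(secs[:window]) / window
    last = sum(secs[-window:]) / window
    return last / first if first > 0 else float("inf")
```

Running this over the first 30 or so properties, before and after each change, would show whether a change removed the slowdown or only delayed it.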

DanPatterson · ‎04-02-2022

Short of also deleting selectedProp once the inner "for fc in fcList" loop has finished (indented at the same level as that for statement), it looks good. It shouldn't be maintaining the previous selection when a new selection is made, but who knows.


Anonymous User · ‎04-03-2022

Since it looks like you are not creating a unique name list, I would skip creating that list and remove the select by attribute sql process. You can pass the geometry from the search cursor to the select by location method.

Edited to add printing the name from the feature.

import arcpy
import os

Property = r'C:\Boundaries.gdb\Property'
FeatureClasses = r'C:\FieldData.gdb'

arcpy.env.workspace = FeatureClasses

fcList = arcpy.ListFeatureClasses()

with arcpy.da.SearchCursor(Property, ['Name', 'SHAPE@']) as sCur:
    for row in sCur:
        for fc in fcList:
            selectedFeatures = arcpy.management.SelectLayerByLocation(os.path.join(FeatureClasses, fc), "HAVE_THEIR_CENTER_IN", row[1])
            print(f'feature name: {row[0]} found {str(selectedFeatures[2])} in fc: {fc}')
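
Because the documentation quoted earlier in the thread says a new layer is created whenever a feature class path is passed in, another option is to create one layer per feature class up front and apply every selection to those existing layers. This is a hedged sketch, not a tested fix for this dataset: the function name, the `_lyr` layer-name suffix, and the `gp` parameter are assumptions made for illustration (`gp` only exists so the function can be exercised with a stand-in object; leave it out to use arcpy itself).

```python
import os


def count_centres_in_properties(property_fc, workspace, gp=None):
    """Yield (property_name, feature_class, count), reusing one layer per feature class."""
    if gp is None:
        import arcpy as gp  # imported lazily; requires ArcGIS Pro

    gp.env.workspace = workspace

    # Build each layer once. Later selections are applied to these layers
    # by name instead of a fresh layer being created on every call.
    layers = {}
    for fc in gp.ListFeatureClasses() or []:
        layer_name = f"{fc}_lyr"  # hypothetical naming scheme
        gp.management.MakeFeatureLayer(os.path.join(workspace, fc), layer_name)
        layers[fc] = layer_name

    try:
        with gp.da.SearchCursor(property_fc, ["NAME", "SHAPE@"]) as cursor:
            for name, shape in cursor:
                for fc, layer_name in layers.items():
                    result = gp.management.SelectLayerByLocation(
                        layer_name, "HAVE_THEIR_CENTER_IN", shape,
                        selection_type="NEW_SELECTION")
                    # Index 2 of the result holds the selected-feature count.
                    yield name, fc, int(result[2])
    finally:
        # Remove the layers when the loop ends or is abandoned.
        for layer_name in layers.values():
            gp.management.Delete(layer_name)
```

With ArcGIS available, `for name, fc, n in count_centres_in_properties(Property, FeatureClasses): print(name, fc, n)` would print the same information as the loop above while keeping the number of layers fixed at one per feature class.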

Anonymous User · ‎04-05-2022

For giggles, you can also break this up into multiprocessing. You'll have to modify the SQL query/fc stuff for it to work with your structure, but it's about the same process.

This file is named IntersectWorker:

# Import modules
import os.path
from arcpy import management

# -------------------------------------------------------------------------------
def get_intersect_count(feature, selFc, testFc):
    """
    Function to return intersecting features
    """
    try:
        whereClause = f"OBJECTID = {feature}"
        selectedProp = management.SelectLayerByAttribute(selFc, "NEW_SELECTION", whereClause)

        selectedFeatures = management.SelectLayerByLocation(testFc, "INTERSECT", selectedProp)

        return {'feature': feature, 'testFc': os.path.basename(testFc), 'count': selectedFeatures[2], 'msg': 'success'}

    except Exception as ex:
        # Return the message as text so the result can always be pickled back to the parent process
        return {'feature': feature, 'testFc': os.path.basename(testFc), 'count': 0, 'msg': str(ex)}

This is the main file named IntersectMain:

# Import modules
import os
from arcpy import da
from arcpy import env
from arcpy import ListFeatureClasses
from multiprocessing import Pool
import multiprocessing as mp
from IntersectWorker import get_intersect_count

if __name__ == "__main__":

    selFc = r'path to the selecting fc'

    basePath = r'path to the .gdb'

    env.workspace = basePath
    fcList = ListFeatureClasses(feature_type='Polygon')

    argsList = []

    for prop in [row[0] for row in da.SearchCursor(selFc, ['OBJECTID'])]:
        for fc in fcList:
            args = (prop, selFc, os.path.join(basePath, fc))
            argsList.append(args)

    # Leave one core free, but never request zero workers on a single-core machine
    with Pool(max(1, mp.cpu_count() - 1)) as pool:
        result = pool.starmap(get_intersect_count, argsList)

    for res in result:
        if res['msg'] == 'success':
            print(f'feature: {res["feature"]} found: {res["count"]}  features in {res["testFc"]}')
        else:
            print(f'intersecting features in {res["testFc"]} failed with {res["msg"]}')
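
For anyone adapting the main file, the part that usually needs changing is how the argument tuples for starmap are built. The sketch below isolates that step in plain Python so it can be checked without arcpy; `build_tasks` and `pool_size` are hypothetical helper names, not part of the files above.

```python
import os
from itertools import product


def build_tasks(object_ids, sel_fc, workspace, fc_names):
    """Return one (object_id, sel_fc, test_fc_path) tuple per feature / feature class pair.

    The order matches the nested loops in the main file: every feature
    class for the first object ID, then every feature class for the next.
    """
    return [(oid, sel_fc, os.path.join(workspace, fc))
            for oid, fc in product(object_ids, fc_names)]


def pool_size(cpu_count):
    """Leave one core free, but never ask for fewer than one worker."""
    return max(1, cpu_count - 1)
```

Each tuple lines up with the parameters of get_intersect_count(feature, selFc, testFc), which is what lets pool.starmap unpack it directly.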