
Code that calls the SelectLayerByLocation tool gets progressively slower, why?

10-30-2024 02:02 PM
DuncanHornby
MVP Notable Contributor

Dear developers,

I have a script that loops over a dozen to maybe several thousand overlapping polygons, selecting data and counting the selections. This all runs OK, except that over time it just gets slower and slower. I've narrowed it down to the Select Layer By Location tool.

Attached to this question is a zip file containing a folder with an ArcGIS Pro project, two very simple dummy layers and a script that reproduces the problem with the supplied dummy data.

If you run the script (testscript.py) in IDLE or VS Code, it is initially rocket fast, but after about 1000 iterations it begins to slow down.

As you will see in the script, it is simply repeating itself, so I would have expected the select tool to run at a constant speed. So my question is: why is it not, and can it be fixed?

I provide this example so that the user community can test what I'm saying and the developers behind ArcGIS Pro have a real example of data/code that reproduces this issue. This is all happening on ArcGIS Pro 3.3.2 on a machine with 16 GB of RAM and an SSD for the C:\ drive.

Is it some sort of bug in the tool?

Duncan

1 Solution

Accepted Solutions
DuncanHornby
MVP Notable Contributor

Dan/Anyone else interested,

So Dan showed a numpy approach which was blisteringly fast. Ouch! But, resisting the pull of numpy (because my brain does not think like that), it got me thinking about a different way of approaching this simple problem of overcoming the ever-increasing slowness of the SelectLayerByLocation tool when called within a loop.

My updated script, which I show below, now burns through the same task in 7 seconds, nice! It requires ArcGIS Pro 3.2 or above and makes use of the improved SearchCursor, which can take a geometry as a spatial filter.

import arcpy, os
import time

arcpy.SetLogHistory(False)
arcpy.SetLogMetadata(False)
script_dir = os.path.dirname(os.path.abspath(__file__))
lyrPoints = os.path.join(script_dir, "Test.gdb", "testpoints")
lyrPolygons = os.path.join(script_dir, "Test.gdb", "testpoly")

print(lyrPoints)  # Prove to myself I actually have the full path to the feature class

# Grab the polygon from the layer; there is only one polygon in it
with arcpy.da.SearchCursor(lyrPolygons, "SHAPE@") as cursor:
    for row in cursor:
        geom = row[0]
print("Polygon obtained! Area is: " + str(geom.area))

d = dict()
start_time = time.time()
for i in range(1, 3000):
    print("Step " + str(i))
    # Use the polygon as a spatial filter on the search cursor
    with arcpy.da.SearchCursor(lyrPoints, ["id"], None, None, False, None, None,
                               spatial_filter=geom, spatial_relationship="INTERSECTS") as cursor:
        x = 0
        for row in cursor:
            x += 1  # Count rows in the cursor
        d[i] = x
end_time = time.time()
elapsed_time = end_time - start_time
print(f"Elapsed time: {elapsed_time:.2f} seconds")

View solution in original post

6 Replies
DanPatterson
MVP Esteemed Contributor

I can't get selectByAnything to work faster, so I use numpy and convert my feature classes to numpy arrays using arcpy.da.FeatureClassToNumPyArray, plus some of my sprucing-up code, to do most things.

This is a classic point-in-polygon test, and even with this simplistic example (one set of points, one polygon) I couldn't get it to blister.

I modified your code to only print the elapsed time for N runs. I got tired of seeing it get incrementally slower (and memory usage slowly creeping up as well). I tried INTERSECT and WITHIN only, and I gave up.

Do note that using selectByAttributes with a clear selection didn't improve things, nor did a del in Python.

INTERSECT, n = 100
c:\temp\test\test\Test.gdb\testpoly
Step 99 Elapsed time: 38.04 seconds

WITHIN,  n = 100
c:\temp\test\test\Test.gdb\testpoly
Step 99  Elapsed time: 54.535914182662964 secs
Done

numpy,  n = 100
c:\temp\test\test\Test.gdb\testpoly
Step 99  Elapsed time: 0.015819072723388672
Done
numpy,  n = 1000
c:\temp\test\test\Test.gdb\testpoly
Step 999  Elapsed time: 0.03198409080505371 secs

c:\temp\test\test\Test.gdb\testpoly
Step 9999  Elapsed time: 0.34787607192993164 secs
Done
c:\temp\test\test\Test.gdb\testpoly
Step 99999  Elapsed time: 2.994032382965088 secs
Done
c:\temp\test\test\Test.gdb\testpoly
Step 999999  Elapsed time: 29.093031883239746 secs
Done

So it took a million runs of the point-in-polygon using numpy to approach the time it took for 100-ish runs using arcpy functions.

If this is something you have to do often, you may want to consider other avenues.
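(Dan's actual conversion and sprucing-up code isn't posted in this thread, but to make the vectorized alternative concrete, here is a minimal crossing-number point-in-polygon test in pure numpy. The function name, helper names and test points are made up for illustration; the polygon is the one Dan uses in his broadcasting example later in the thread.)

```python
import numpy as np

def pip_crossing(pnts, poly):
    """Crossing-number point-in-polygon test, vectorized over all points.

    pnts : (N, 2) array of x, y point coordinates
    poly : (M, 2) array of polygon vertices, first vertex repeated as the last
    Returns a boolean array, True where a point falls inside the ring.
    """
    x, y = pnts[:, 0][:, None], pnts[:, 1][:, None]  # (N, 1), broadcast vs edges
    x0, y0 = poly[:-1, 0], poly[:-1, 1]              # edge start vertices (M-1,)
    x1, y1 = poly[1:, 0], poly[1:, 1]                # edge end vertices (M-1,)
    # An edge straddles the horizontal ray from a point if it crosses its y level...
    straddle = (y0 <= y) != (y1 <= y)                # (N, M-1) boolean
    # ...and the ray actually hits it if the intersection lies right of the point.
    with np.errstate(divide="ignore", invalid="ignore"):
        x_cross = x0 + (y - y0) * (x1 - x0) / (y1 - y0)
    crossings = np.sum(straddle & (x_cross > x), axis=1)
    return (crossings % 2) == 1                      # odd crossing count -> inside

poly = np.array([[1., 0.], [4., 10.], [6., 10.], [9., 0.], [7., 0.],
                 [6., 4.], [4., 4.], [3., 0.], [1., 0.]])
pts = np.array([[5., 5.], [5., 0.5], [0., 0.], [4.5, 9.]])
print(pip_crossing(pts, poly))  # inside, in the bottom notch (out), outside, inside
```

The horizontal edges produce a divide-by-zero inside the intersection formula, but those edges never satisfy the straddle test, so the inf/nan values are masked out before they matter.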


... sort of retired...
DuncanHornby
MVP Notable Contributor

Dan,

Can you share your modification of my original code so I can study how you script a spatial query with a numpy array? I'm less familiar with coding with numpy and your test runs are obviously superior to using the arcpy tool.

Duncan

DanPatterson
MVP Esteemed Contributor

Duncan

Point in Polygon ... Geometry Mysteries - Esri Community, circa 2020; np_wn is the basic winding-number shell I use.

There are numerous ways to get arc* shapes into numpy arrays, some quicker than others.

I have been working on my own geometry tools for some time, if you have a general interest in this topic:

Dan-Patterson github : numpy_geometry


... sort of retired...
HaydenWelch
Frequent Contributor

Love your numpy library; one quick question, though. I know that spatial databases in general tend to have a spatial index on the data that works in linear time as long as the index is maintained or updated frequently.

The PIP (point-in-polygon) approach is pretty similar to a binary-tree spatial index, so is there any good reason to use numpy arrays over passing a spatial filter to the database and leveraging the existing indexes?

I have not tested this yet, but I might give it a shot tomorrow to see what the performance difference is.

For data that isn't in an indexed spatial database, though, the numpy approach is a no-brainer, as you don't need to add the overhead of a SQL database to get the benefits of the index.
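(To make the index intuition concrete: this is a toy sketch, not how the geodatabase's actual spatial index is implemented, and all names here are invented. A uniform-grid index only has to inspect the cells overlapping the query box instead of scanning every point.)

```python
import random
from collections import defaultdict

class GridIndex:
    """Toy uniform-grid spatial index: bucket points by cell, query by cell range."""

    def __init__(self, cell=1.0):
        self.cell = cell
        self.cells = defaultdict(list)

    def _key(self, x, y):
        return (int(x // self.cell), int(y // self.cell))

    def insert(self, pt):
        self.cells[self._key(*pt)].append(pt)

    def query_box(self, xmin, ymin, xmax, ymax):
        """Visit only the grid cells overlapping the box, not every point."""
        kx0, ky0 = self._key(xmin, ymin)
        kx1, ky1 = self._key(xmax, ymax)
        hits = []
        for kx in range(kx0, kx1 + 1):
            for ky in range(ky0, ky1 + 1):
                for (x, y) in self.cells.get((kx, ky), ()):
                    if xmin <= x <= xmax and ymin <= y <= ymax:
                        hits.append((x, y))
        return hits

random.seed(1)
pts = [(random.uniform(0, 100), random.uniform(0, 100)) for _ in range(10_000)]
idx = GridIndex(cell=10.0)
for p in pts:
    idx.insert(p)

fast = idx.query_box(20, 20, 30, 30)                 # inspects 4 of ~100 cells
slow = [p for p in pts if 20 <= p[0] <= 30 and 20 <= p[1] <= 30]  # scans all 10,000
print(len(fast), len(slow))  # both counts are the same
```

Both queries return the same points; the difference is only in how much data each one has to touch, which is the benefit Hayden is describing.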

DanPatterson
MVP Esteemed Contributor

In the example of point in polygon, assessing which points fall within the spatial extent of the polygon is done in one pass; then the calculation needed to determine whether those points (in the polygon extent) are inside, on the boundary of, or outside the polygon's boundary is also done in one pass.

The term is "broadcasting". The assessment of a condition or application of a calculation is not done in an iterative fashion, as would be the case in most languages that use "for" or "while" loops to test for a condition.

An example of assessing points within a polygon extent.

LB, RT = np.array([[2., 2.], [8., 8.]])  # -- polygon extent, left bottom right top
A = np.array([[1., 0.], [4, 10.], [6., 10.], [9., 0.], [7., 0.], [6., 4.],
              [4., 4.], [3., 0.], [1., 0.]])  # -- a polygon
comp = np.logical_and(LB <= A, A <= RT) # -- compare the array to the bounds
idx = np.logical_and(comp[..., 0], comp[..., 1])  # X and Y check
A[idx] # -- return the points completely within the extent
array([[  6.00,   4.00],
       [  4.00,   4.00]])

Now I could do some quick modifications and provide a fishnet-like polygon structure representing a tiling of an area, and determine which points fell within each grid cell, effectively doing multiple points within multiple polygons all at once.

I have some basic numpy stuff on my GitHub site exploring the application of numpy to spatial questions. It is far from complete, but the code in npGeo and related scripts will provide some guidance.
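(Dan's fishnet modification isn't shown, but the idea of assigning many points to the cells of a regular grid in one broadcast pass can be sketched like this. The grid origin, cell size and point count are made up for illustration.)

```python
import numpy as np

rng = np.random.default_rng(0)
pnts = rng.uniform(0., 100., size=(20, 2))   # 20 random points in a 100 x 100 area

origin = np.array([0., 0.])                  # fishnet lower-left corner
cell = 25.0                                  # cell size -> a 4 x 4 grid of cells

# One broadcast expression assigns every point a (column, row) cell index at once
ij = np.floor((pnts - origin) / cell).astype(int)   # (20, 2) array of cell indices

# Count points per cell, again without a Python loop
flat = ij[:, 1] * 4 + ij[:, 0]               # row-major flat cell id, 4 columns
counts = np.bincount(flat, minlength=16).reshape(4, 4)
print(counts)                                # 4 x 4 per-cell point counts, sums to 20
```

Every point lands in exactly one cell, so the counts sum back to the number of points; this is the "multiple points within multiple polygons all at once" effect for the special case where the polygons are grid cells.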


... sort of retired...
DuncanHornby
MVP Notable Contributor

Dan/Anyone else interested,

So Dan showed a numpy approach which was blisteringly fast. Ouch! But, resisting the pull of numpy (because my brain does not think like that), it got me thinking about a different way of approaching this simple problem of overcoming the ever-increasing slowness of the SelectLayerByLocation tool when called within a loop.

My updated script, which I show below, now burns through the same task in 7 seconds, nice! It requires ArcGIS Pro 3.2 or above and makes use of the improved SearchCursor, which can take a geometry as a spatial filter.

import arcpy, os
import time

arcpy.SetLogHistory(False)
arcpy.SetLogMetadata(False)
script_dir = os.path.dirname(os.path.abspath(__file__))
lyrPoints = os.path.join(script_dir, "Test.gdb", "testpoints")
lyrPolygons = os.path.join(script_dir, "Test.gdb", "testpoly")

print(lyrPoints)  # Prove to myself I actually have the full path to the feature class

# Grab the polygon from the layer; there is only one polygon in it
with arcpy.da.SearchCursor(lyrPolygons, "SHAPE@") as cursor:
    for row in cursor:
        geom = row[0]
print("Polygon obtained! Area is: " + str(geom.area))

d = dict()
start_time = time.time()
for i in range(1, 3000):
    print("Step " + str(i))
    # Use the polygon as a spatial filter on the search cursor
    with arcpy.da.SearchCursor(lyrPoints, ["id"], None, None, False, None, None,
                               spatial_filter=geom, spatial_relationship="INTERSECTS") as cursor:
        x = 0
        for row in cursor:
            x += 1  # Count rows in the cursor
        d[i] = x
end_time = time.time()
elapsed_time = end_time - start_time
print(f"Elapsed time: {elapsed_time:.2f} seconds")