Arcpy select layer by location and by attribute much slower in ArcGIS Pro 3.x

j_rsg · 10-10-2023

I have an application that cursors through 150,000 to 200,000 point records and compares each one to nearby polylines using select layer by location or select layer by attribute. These calls seem to have become slower in ArcGIS Pro 3.x, which makes a large program take much longer to finish on every run. The difference is clear when comparing arcpy in ArcGIS Desktop 10.3 with arcpy in ArcGIS Pro 3.x. It did not seem to be an issue in ArcGIS Pro 2.x, but I haven't reinstalled it to check again.

I wrote sample code to show the issue. I tested both select by location and select by attribute (toggle the comments on lines 54/55) in the old arcpy version and in the new Pro 3.x version. The code only uses syntax and packages available in Python 2.7, so it runs in both. The results show that for 1,000 points both selection methods are much slower in Pro 3.x: select by attribute is typically about 3 times slower and select by location about 1.5 times slower.

Has anyone experienced similar issues? The performance degradation looks like a bug that could be fixed in Pro 3.x.

Some example results:

Select By Attribute:
individual point # 1000 processing time: 0.019999980926513672
total cursor time for 1000 points: 29.113652229309082
Python Version: 3.9.16 [MSC v.1931 64 bit (AMD64)]

individual point # 1000 processing time: 0.00999999046326
total cursor time for 1000 points: 10.3790001869
('Python Version:', '2.7.8 (default, Jun 30 2014, 16:03:49) [MSC v.1500 32 bit (Intel)]')

Select by Location:

individual point # 1000 processing time: 0.022001028060913086
total cursor time for 1000 points: 25.866050958633423
Python Version: 3.9.16 [MSC v.1931 64 bit (AMD64)]

individual point # 1000 processing time: 0.0169999599457
total cursor time for 1000 points: 17.7720000744
('Python Version:', '2.7.8 (default, Jun 30 2014, 16:03:49) [MSC v.1500 32 bit (Intel)]')

import os
import time
import arcpy
import sys
import random

print ("starting")
arcpy.env.overwriteOutput = True

# Set the default geodatabase workspace (change to your desired directory)
default_gdb_dir = r"C:\test.gdb"  # raw string, so a single backslash is enough

# Check if the geodatabase exists, and create it if it doesn't
if not arcpy.Exists(default_gdb_dir):
    arcpy.management.CreateFileGDB(os.path.dirname(default_gdb_dir), os.path.basename(default_gdb_dir))

# Create an empty list to store random points
random_points = []

# Specify the number of random points you want to create
num_points = 1000  # Change this to your desired number

# Generate random points and add them to the list
for _ in range(num_points):
    lon = random.uniform(-180, 180)
    lat = random.uniform(-90, 90)
    point = arcpy.Point(lon, lat)
    point_geom = arcpy.PointGeometry(point, arcpy.SpatialReference(4326))
    random_points.append(point_geom)

# Define the output feature class name
feature_class_name = "RandomPoints"

# Create a feature class in the default geodatabase
output_feature_class = os.path.join(default_gdb_dir, feature_class_name)
arcpy.CreateFeatureclass_management(os.path.dirname(output_feature_class), os.path.basename(output_feature_class), "POINT", spatial_reference=arcpy.SpatialReference(4326))

# Open an insert cursor to insert the random points into the feature class
with arcpy.da.InsertCursor(output_feature_class, ["SHAPE@"]) as cursor:
    for point_geom in random_points:
        cursor.insertRow([point_geom])

arcpy.MakeFeatureLayer_management(output_feature_class, "RandomPointsLayer")

time_sum = 0
beginTime = time.time()

# Loop through the random points in the feature layer
with arcpy.da.SearchCursor("RandomPointsLayer", ["OBJECTID","SHAPE@"]) as cursor:
    for row in cursor:
        start_time = time.time()

        ## Both Select by location and attribute are slower
        # arcpy.SelectLayerByLocation_management("RandomPointsLayer", "WITHIN_A_DISTANCE", row[1], "50 Meters", "NEW_SELECTION")
        arcpy.SelectLayerByAttribute_management("RandomPointsLayer", "NEW_SELECTION", "OBJECTID = 0")
       
        end_time = time.time()
        time_sum += end_time - start_time
        print ('individual point # '+str(row[0])+' processing time: ' + str(end_time - start_time))

print ("total cursor time for "+str(num_points)+" points: "+ str(time.time()-beginTime))

# Print the Python version
print ("Python Version:", sys.version)

# Clean up: Delete the in-memory feature layer
arcpy.Delete_management("RandomPointsLayer")
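
A note on the timing itself: the script accumulates time_sum but only prints the wall-clock total, which also includes cursor iteration and the print calls. The sketch below separates the two. It is a minimal example, not part of the original script: time_calls is a hypothetical helper, and pow stands in for the selection call. It uses timeit.default_timer, which resolves to time.clock on Windows under Python 2.7 and to time.perf_counter under Python 3.3+, so it may give finer per-call values than time.time().

```python
from __future__ import print_function

import timeit


def time_calls(func, args_list):
    """Call func(*args) for each args tuple; return (total, per_call) in seconds."""
    clock = timeit.default_timer  # high-resolution timer on Python 2.7 and 3.x
    per_call = []
    for args in args_list:
        start = clock()
        func(*args)
        per_call.append(clock() - start)
    # Only time spent inside func is counted; loop and print overhead is excluded.
    return sum(per_call), per_call


# Stand-in workload (assumption): in the benchmark this would be the
# SelectLayerByLocation or SelectLayerByAttribute call for each point.
total, per_call = time_calls(pow, [(2, n) for n in range(1000)])
print("calls:", len(per_call))
print("total seconds in calls:", total)
print("slowest single call:", max(per_call))
```

Comparing this total with the wall-clock total would show how much of the run is spent outside the selection calls.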

VinceAngelo · 10-16-2023

For statistical purposes, you should use the same set of points in each environment. This can be done by initializing the random number generator with the same seed.
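
A minimal sketch of that suggestion, assuming a hypothetical helper make_lon_lat_pairs and an arbitrary seed value of 5. It shows that the same seed reproduces the same coordinates within one interpreter. Whether a given seed yields identical values across Python 2.7 and 3.x is not assumed here; creating the feature class once and reading it from both environments avoids that question.

```python
import random


def make_lon_lat_pairs(count, seed):
    """Return `count` (lon, lat) tuples drawn from a generator seeded with `seed`."""
    rng = random.Random(seed)  # private generator; the global random state is untouched
    return [(rng.uniform(-180, 180), rng.uniform(-90, 90)) for _ in range(count)]


first = make_lon_lat_pairs(1000, seed=5)
second = make_lon_lat_pairs(1000, seed=5)
print(first == second)  # True: same seed, same interpreter, same coordinates
```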

- V

j_rsg · 10-17-2023

Thanks for the suggestion, Vince. I was not aware of the seed function.

I added a random.seed() call before generating the random points and saw similar results.

I updated the code to also generate a random set of polylines to select from inside the point search cursor, which more closely mimics our code. I also stored both the points and the polylines in memory. Before I moved the data into memory, the old arcpy started to slow down with more than a few thousand points. For a set of 5,000 points and 20,000 polylines I saw speed differences between Pro and Desktop arcpy similar to the original code: select by location (370 s vs 317 s) and select by attribute (168 s vs 82 s).

Given the current speeds and the need to use ArcGIS Pro 3.x, we're looking at workarounds that avoid select layer by attribute and select layer by location in our code.
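
One possible direction for such a workaround, sketched under stated assumptions: read the polyline vertices into memory once (for example with a single arcpy.da.SearchCursor pass) and answer the "which lines are within X of this point" question in plain Python with a uniform grid. SegmentGrid and point_segment_distance are hypothetical helpers, not arcpy APIs, and the sketch assumes planar coordinates in the same linear unit as the search distance (projected data, not raw WGS84 degrees). It has not been benchmarked against the selection tools.

```python
import math
from collections import defaultdict


def point_segment_distance(px, py, ax, ay, bx, by):
    """Planar distance from point P to segment AB."""
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    # Projection parameter of P onto AB, clamped to the segment.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


class SegmentGrid(object):
    """Uniform grid over polyline segments (hypothetical helper)."""

    def __init__(self, cell_size):
        self.cell_size = float(cell_size)
        self.cells = defaultdict(list)

    def _cell(self, x, y):
        return (int(math.floor(x / self.cell_size)),
                int(math.floor(y / self.cell_size)))

    def add_polyline(self, line_id, vertices):
        """Register each segment in every cell its bounding box touches."""
        for (ax, ay), (bx, by) in zip(vertices[:-1], vertices[1:]):
            cx0, cy0 = self._cell(min(ax, bx), min(ay, by))
            cx1, cy1 = self._cell(max(ax, bx), max(ay, by))
            for cx in range(cx0, cx1 + 1):
                for cy in range(cy0, cy1 + 1):
                    self.cells[(cx, cy)].append((line_id, ax, ay, bx, by))

    def lines_within(self, x, y, distance):
        """Return the set of line ids with a segment within `distance` of (x, y)."""
        cx0, cy0 = self._cell(x - distance, y - distance)
        cx1, cy1 = self._cell(x + distance, y + distance)
        hits = set()
        for cx in range(cx0, cx1 + 1):
            for cy in range(cy0, cy1 + 1):
                for line_id, ax, ay, bx, by in self.cells.get((cx, cy), ()):
                    if line_id in hits:
                        continue
                    if point_segment_distance(x, y, ax, ay, bx, by) <= distance:
                        hits.add(line_id)
        return hits


# Illustrative data (made up): two horizontal lines 500 units apart.
grid = SegmentGrid(cell_size=100.0)
grid.add_polyline(1, [(0.0, 0.0), (100.0, 0.0)])
grid.add_polyline(2, [(0.0, 500.0), (100.0, 500.0)])
print(sorted(grid.lines_within(50.0, 30.0, 50.0)))  # [1]
```

The cell size is a tuning choice: cells much smaller than typical segment lengths make each segment land in many cells, so very long segments (like the world-spanning random polylines in the test script) would need a coarser grid than real street-scale data.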

Select by location
individual point # 5000 processing time: 0.07061052322387695
total cursor time for 5000 points: 370.48129749298096
Python Version: 3.9.16 [MSC v.1931 64 bit (AMD64)]

individual point # 5000 processing time: 0.077999830246
total cursor time for 5000 points: 317.149999857
('Python Version:', '2.7.8 (default, Jun 30 2014, 16:03:49) [MSC v.1500 32 bit (Intel)]')

Select by attribute

individual point # 5000 processing time: 0.024550199508666992
total cursor time for 5000 points: 168.4435179233551
Python Version: 3.9.16 [MSC v.1931 64 bit (AMD64)]

individual point # 5000 processing time: 0.0320000648499
total cursor time for 5000 points: 82.2960000038
('Python Version:', '2.7.8 (default, Jun 30 2014, 16:03:49) [MSC v.1500 32 bit (Intel)]')

import os
import time
import arcpy
import sys
import random

print ("starting")
arcpy.env.overwriteOutput = True

# Set the default geodatabase workspace (change to your desired directory)
default_gdb_dir = r"C:\test.gdb"  # raw string, so a single backslash is enough

# Check if the geodatabase exists, and create it if it doesn't
if not arcpy.Exists(default_gdb_dir):
    arcpy.management.CreateFileGDB(os.path.dirname(default_gdb_dir), os.path.basename(default_gdb_dir))

# Set the random seed for reproducibility
random.seed(5)  

# Create an empty list to store random points
random_points = []

# Specify the number of random points you want to create
num_points = 5000  # Change this to your desired number

# Generate random points and add them to the list
for _ in range(num_points):
    lon = random.uniform(-180, 180)
    lat = random.uniform(-90, 90)
    point = arcpy.Point(lon, lat)
    point_geom = arcpy.PointGeometry(point, arcpy.SpatialReference(4326))
    random_points.append(point_geom)

# Define the output feature class name
feature_class_name = "RandomPoints"

# Create a feature class in the default geodatabase
output_feature_class = os.path.join(default_gdb_dir, feature_class_name)
arcpy.CreateFeatureclass_management(os.path.dirname(output_feature_class), os.path.basename(output_feature_class), "POINT", spatial_reference=arcpy.SpatialReference(4326))

# Open an insert cursor to insert the random points into the feature class
with arcpy.da.InsertCursor(output_feature_class, ["SHAPE@"]) as cursor:
    for point_geom in random_points:
        cursor.insertRow([point_geom])

# Copy point features to memory and make the feature layer
pointsMemory = "in_memory\\points"
arcpy.CopyFeatures_management(output_feature_class, pointsMemory)
arcpy.MakeFeatureLayer_management(pointsMemory, "RandomPointsLayer")

# Create an empty list to store random polylines
random_polylines = []

# Specify the number of random polylines you want to create
num_polylines = 20000  # Change this to your desired number

# Generate random polylines and add them to the list
for _ in range(num_polylines):
    polyline = arcpy.Polyline(arcpy.Array([arcpy.Point(random.uniform(-180, 180), random.uniform(-90, 90)),
                                           arcpy.Point(random.uniform(-180, 180), random.uniform(-90, 90)),
                                           arcpy.Point(random.uniform(-180, 180), random.uniform(-90, 90))]),
                              arcpy.SpatialReference(4326))
    random_polylines.append(polyline)

# Define the output polyline feature class name
polyline_feature_class_name = "RandomPolylines"

# Create a polyline feature class in the default geodatabase
output_polyline_feature_class = os.path.join(default_gdb_dir, polyline_feature_class_name)
arcpy.CreateFeatureclass_management(os.path.dirname(output_polyline_feature_class),
                                    os.path.basename(output_polyline_feature_class),
                                    "POLYLINE",
                                    spatial_reference=arcpy.SpatialReference(4326))

# Open an insert cursor to insert the random polylines into the feature class
with arcpy.da.InsertCursor(output_polyline_feature_class, ["SHAPE@"]) as cursor:
    for polyline in random_polylines:
        cursor.insertRow([polyline])

# Copy polylines to memory and create the feature layer
polylineMemory = "in_memory\\polylines"
arcpy.CopyFeatures_management(output_polyline_feature_class, polylineMemory)
arcpy.MakeFeatureLayer_management(polylineMemory, "RandomPolylinesLayer")

time_sum = 0
beginTime = time.time()

# Loop through the random points in the feature layer
with arcpy.da.SearchCursor("RandomPointsLayer", ["OBJECTID","SHAPE@"]) as cursor:
    for row in cursor:
        start_time = time.time()

        ## Both Select by location and attribute are slower
        # arcpy.SelectLayerByLocation_management("RandomPointsLayer", "WITHIN_A_DISTANCE", row[1], "50 Meters", "NEW_SELECTION")
        arcpy.SelectLayerByLocation_management("RandomPolylinesLayer", "WITHIN_A_DISTANCE", row[1], "50 Meters", "NEW_SELECTION")        
        # arcpy.SelectLayerByAttribute_management("RandomPolylinesLayer", "NEW_SELECTION", "OBJECTID = 0")
       
        end_time = time.time()
        time_sum += end_time - start_time
        print ('individual point # '+str(row[0])+' processing time: ' + str(end_time - start_time))

print ("total cursor time for "+str(num_points)+" points: "+ str(time.time()-beginTime))

# Print the Python version
print ("Python Version:", sys.version)

# Clean up: Delete the in-memory feature layers
arcpy.Delete_management("RandomPointsLayer")
arcpy.Delete_management("RandomPolylinesLayer")