Hi community! Wondering if y'all have any ideas to help speed this up.
(ArcGIS Pro toolbox with my arcpy script embedded, accessing data on a faraway server)
I have a data model (GDB) with 15+ feature datasets and 240+ feature classes. What I'm doing is iterating through each feature class in each feature dataset, checking whether it contains data by counting the rows, and adding the feature class to the map if that count is greater than 0.
It adds non-empty feature classes to the map, but it takes 20+ minutes for large datasets (where feature classes contain 1000+ rows) and 15 minutes for small datasets (<50 rows) when the data is read from the server. It drops to under 5 minutes when the GDB is copied to my C: drive, but that's still quite a while.
I've tried using a cursor instead, and I've been unsuccessful in finding methods other than GetCount_management(feature_class).getOutput(0) to figure out whether a feature class is empty. I'm guessing this is what slows me down, since the iteration itself isn't the slowest part; most of the time goes to counting the rows and adding the layers to the map. I've also been looking into multiprocessing but can't seem to wrap my head around that yet (see the sketch after the script below).
If this turns out to be a 'your data is too far away / connectivity' issue, which I'm afraid it is, at least I'll have a second opinion on that.
import arcpy
gdb_path = arcpy.GetParameterAsText(0)
arcpy.AddMessage("Connection to GDB successful.")
# Set the path to your ArcGIS Pro project
aprx = arcpy.mp.ArcGISProject("CURRENT")
arcpy.AddMessage("Current project selected.")
# Get the first map in project
map = aprx.listMaps()[0]
arcpy.AddMessage("Getting first map in project.")
arcpy.env.workspace = gdb_path
arcpy.AddMessage("Made GDB your current workspace.")
feature_datasets = arcpy.ListDatasets(feature_type='Feature')
# Iterate through each feature dataset in the geodatabase
for dataset in feature_datasets:
    arcpy.env.workspace = f"{gdb_path}/{dataset}"
    feature_classes = arcpy.ListFeatureClasses()
    arcpy.AddMessage(f"Checked {dataset}")
    # Iterate through each feature class in the feature dataset
    for feature_class in feature_classes:
        # Check if the feature class has any rows
        if int(arcpy.GetCount_management(feature_class).getOutput(0)) > 0:
            # Add the feature class to the map
            data_file = f"{gdb_path}/{dataset}/{feature_class}"
            map.addDataFromPath(data_file)
            arcpy.AddMessage(f"Added {feature_class}.")
Thank you everyone for your input! I've been mixing and matching solutions, and I settled on one that is a mix of a few. I have gotten down to a lightning-fast 27.17 seconds when connecting to a GDB on my C: drive (under 10 seconds for small projects!), and a considerably faster 5 minutes 23 seconds for data on the faraway server (which is amazing considering it used to take 20+ minutes)!
I used da.Walk as recommended by @Clubdebambos since, as it turns out, you were right: setting my workspace environment for each dataset iteration was taking up a considerable amount of time. I stuck with the cursor idea from @RPGIS and combined it with the suggestion to use cursor._as_narray() from @DanPatterson, which worked really well. Funnily enough, the only docs I could find on using that were your 4/2017 blog on Cursors... a cursory overview.
Thanks everyone for all your help! 😁
for dirpath, workspaces, datatypes in arcpy.da.Walk(gdb_path, datatype="FeatureClass"):
    for datatype in datatypes:
        data_file = f"{dirpath}/{datatype}"
        with arcpy.da.SearchCursor(data_file, "OID@") as cursor:
            a = cursor._as_narray()
        if len(a) > 0:
            # Add the feature class to the map
            map.addDataFromPath(data_file)
            arcpy.AddMessage(f"Added {datatype}.")
        else:
            arcpy.AddMessage(f"Did not add {datatype}, nothing there.")
One thing that I may suggest is using a counter to identify whether there are any records in a feature class. If you are looking to see if a feature class is empty, rather than looping through to get a count of all records, you can simply set a count limit and stop reading once it is reached. If the count limit is exceeded, you can add that feature class to the map. This should significantly speed up the process.
import arcpy
from arcpy import ListFields
from arcpy.da import SearchCursor as Search  # da cursors support the with statement

Feature = r'<feature class file path>'
fields = [field.name for field in ListFields(Feature)]

counter = 0
limit = 25
with Search(Feature, fields) as cursor:
    for row in cursor:
        poprecords = len([x for x in row if x is not None])
        if poprecords > 0: counter += 1
        if counter > limit: break
Wow, thank you! Let me give that a shot and I'll let you know how it goes. What a great idea!
I was curious; perhaps you could compare with a larger dataset. This one only has 500-ish records but several million points.
%%timeit
fc00 = r"C:\arcpro_npg\Project_npg\npgeom.gdb\Ontario_LCConic"
result = arcpy.management.GetCount(fc00)
n = int(result[0])
60.9 ms ± 1.75 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
%%timeit
fc00 = r"C:\arcpro_npg\Project_npg\npgeom.gdb\Ontario_LCConic"
ont = arcpy.da.FeatureClassToNumPyArray(fc00, ['OID@'])
N = ont.shape[0]
814 µs ± 12 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
The difference in speed may make it worthwhile doing a quick check using numpy before attempting to load it. Worth a thought.
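A rough sketch of what that numpy pre-check might look like inside the original loop (illustrative only; data_file and map are as defined in the original script):

from arcpy.da import FeatureClassToNumPyArray

# Pull only the OID field so the array stays tiny and the call stays fast
oids = FeatureClassToNumPyArray(data_file, ['OID@'])
if oids.shape[0] > 0:
    map.addDataFromPath(data_file)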
Almost forgot: this is what is buried inside FeatureClassToNumPyArray (I suspect).
%%timeit
fc00 = r"C:\arcpro_npg\Project_npg\npgeom.gdb\Ontario_LCConic"
with arcpy.da.SearchCursor(fc00, "OID@") as cursor:
    a = cursor._as_narray()
    n = len(a)
828 µs ± 19 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
if you like the "with" thing
I often find that switching workspaces with the arcpy.env.workspace setting adds duration to scripts. Alternatively, you could use da.Walk (docs).
GetCount is another time sink; you could try replacing it with da.Describe (docs) and accessing the extent property, which is an Extent object. If XMin (or any of the other values) is greater than 0, there are features in the dataset; if not, there is no extent and no features present. Each component of the Extent is a float (docs).
desc = arcpy.da.Describe(fc)
if desc["extent"].XMin > 0:
    print("Extent Found")
    map.addDataFromPath(desc["catalogPath"])
else:
    print("Empty")
In addition, something that is also quickly done: the search cursor accepts a SQL query that lets you filter data on certain criteria before any rows are read. This might be faster than using the cursor without a SQL statement.
Note: Script updated on 2/4/2025
import arcpy
from arcpy.da import SearchCursor as Search

feature = r'<filepath>'
fieldnames = [<list of fieldnames>]
# Note: the where_clause takes just the expression, without the WHERE keyword
query = '<fieldname> IS NOT NULL'
# You can also build the query dynamically as an f-string from your field names

limit = 25
hasvalues = False
with Search(feature, fieldnames, query) as cursor:
    for row in cursor:
        if len(set(row).difference({None})) > 0: hasvalues = True
        if limit == 0 or hasvalues is True: break
        limit = limit - 1

if hasvalues is True: x = 'Do something'
Haven't benchmarked this, but throwing my potential solution into the ring. There's a chance this will be faster since it doesn't look beyond the first row of the feature class (as opposed to loading the entire thing into an array).
I updated my example to fit into your da.Walk solution (2/11/25)
def has_data(fc):
    with arcpy.da.SearchCursor(fc, ["OID@"]) as cursor:
        return next(cursor, None) is not None

for dirpath, workspaces, datatypes in arcpy.da.Walk(gdb_path, datatype="FeatureClass"):
    for datatype in datatypes:
        data_file = f"{dirpath}/{datatype}"
        if has_data(data_file):
            map.addDataFromPath(data_file)
            arcpy.AddMessage(f"Added {datatype}.")
        else:
            arcpy.AddMessage(f"Did not add {datatype}, nothing there.")
One can learn from history, although 4/2017 isn't that long ago.
😆