Hi,
I have a list of geodatabase feature class names and I would like to get the feature dataset name for each one, if any.
Ideally I would prefer not to loop over every element of the geodatabase to find it. I was hoping arcpy.Describe would contain that information, but it does not.
import os, arcpy

workspace = "path/to/geodatabase.gdb"
fc_list = ["fc_1_name", "fc_2_name", "fc_3_name"]

for fc in fc_list:
    desc = arcpy.Describe(os.path.join(workspace, fc))
    print(desc.catalogPath)  # does not include the dataset, if any
Any suggestion would be appreciated!
Thanks
Hi @MaximeDemers,
Have you looked into ListDatasets or Walk, for that matter?
Those should help you identify the list of datasets, but if you are looking to get the list of datasets from a workspace programmatically, here is a sample below.
import arcpy

workspace = '<some sde or gdb>'

walk = arcpy.da.Walk(workspace, datatype="FeatureDataset")
for root, dirnames, filenames in walk:
    print(dirnames)
    for filename in filenames:
        print('<something>')
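If the end goal is the original mapping of feature class name to feature dataset, here is a rough sketch along the same lines (the placeholder path and names are made up), walking the feature classes instead and checking whether each one's root is the gdb itself or a dataset inside it:
import os
import arcpy

workspace = "path/to/geodatabase.gdb"  # placeholder
fc_list = ["fc_1_name", "fc_2_name", "fc_3_name"]

fc_to_fd = {}
for root, dirnames, filenames in arcpy.da.Walk(workspace, datatype="FeatureClass"):
    # when root is the gdb itself the feature class is not inside a dataset;
    # otherwise the last component of root is the feature dataset name
    fd = None if os.path.normpath(root) == os.path.normpath(workspace) else os.path.basename(root)
    for filename in filenames:
        if filename in fc_list:
            fc_to_fd[filename] = fd

print(fc_to_fd)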
da.Walk is such a powerful tool, but it's irritating how slow it can be. Because it's recursively extracting the structure from the root, it can sometimes take seconds to run when passed a moderately complex workspace.
This isn't a big deal when you are only running it once, but I frequently have tools that need to get their context on initialization, and if you have, say, 10 tools in a toolbox that all call da.Walk, loading the toolbox can take 10-15 seconds.
Again, not the worst if you absolutely need all that info, but it's a recipe for people thinking things are broken when loading the toolbox locks the main thread for a long time with no clear indication of what's happening.
This also happens with the Python window when it processes auto-complete options for your current cursor position. Because it's trying to find the short names of the layers you can put into a function call, it will just hang the main thread until it finds them, then hang it again if you move your cursor.
Hi @MaximeDemers,
There are multiple ways to achieve this, and you can still go down the Describe route.
import arcpy

workspace = "path/to/geodatabase.gdb"
fc_list = ["fc_1_name", "fc_2_name", "fc_3_name"]

## describe the gdb
desc = arcpy.da.Describe(workspace)

## use a dictionary comprehension
## if an fc name is found in a Feature Dataset, an entry is made in the dictionary
## FC_NAME (key) : FD_NAME (value)
fc_dict = {
    fc["name"]: fd["name"]
    for fd in desc["children"]
    if fd["dataType"] == "FeatureDataset"
    for fc in fd["children"]
    if fc["name"] in fc_list
}

print(fc_dict)
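If you also want the feature classes that sit at the gdb root (i.e. not in any feature dataset) to show up in the result, one possible variation (just a sketch, assuming names are unique across the gdb) is to default everything to None first and then overwrite the ones found inside a Feature Dataset:
fc_dict = {fc_name: None for fc_name in fc_list}
fc_dict.update({
    fc["name"]: fd["name"]
    for fd in desc["children"]
    if fd["dataType"] == "FeatureDataset"
    for fc in fd["children"]
    if fc["name"] in fc_list
})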
Does this help?
Cheers,
Glen
There is not a good way.
Please vote on this related Idea here: Feature Classes: Add Property denoting whether it'... - Esri Community
Couple things here, though.
Feature Datasets don't exist. You think they do, we pretend they do, but for most operations you use, you can plug in "C:\test.gdb\ExFD\exFC" OR "C:\test.gdb\exFC" and get the same result. Try it on Buffer() or Exists() if you don't believe me. Where we run into problems is when we're actually trying to search for stuff, and then Pro thinks that they matter for some reason.
Because of this, the way you're going about it is kind of self-defeating: Describe() will take the path with or without the dataset in there.
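For example (a quick sketch with made-up paths), both of these report True for a feature class that lives inside a feature dataset:
import arcpy

# the feature dataset level in the path is optional; both point at the same feature class
print(arcpy.Exists(r"C:\test.gdb\ExFD\exFC"))  # True
print(arcpy.Exists(r"C:\test.gdb\exFC"))       # True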
Best way to do this is, unfortunately, to set the workspace environment and search through there. Something like this (I wrote this from memory so idk if the capitalization is right)
import os, arcpy

fdDict = {}  # {fd1: [fcA, fcB], fd2: [fcC, fcD]}
gdb = r"...\ex.gdb"
arcpy.env.workspace = gdb

for fd in arcpy.ListDatasets(feature_type="Feature"):
    fdDict[fd] = []
    arcpy.env.workspace = os.path.join(gdb, fd)
    for fc in arcpy.ListFeatureClasses():
        fdDict[fd].append(fc)
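Since the original question wants the dataset for a given feature class name, you could then invert that dict into a lookup (a sketch, assuming feature class names are unique across datasets):
# name -> dataset lookup built from fdDict above; None means not in any feature dataset
fcToFd = {fc: fd for fd, fcs in fdDict.items() for fc in fcs}

for fc in ["fc_1_name", "fc_2_name", "fc_3_name"]:
    print(fc, fcToFd.get(fc))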
Please also vote on this Idea to make iterating through these workspaces less awful: arcpy.List[Type] functions: Let us feed it a works... - Esri Community
Thank you for your answer!
If I have no choice but to loop over datasets, this is how I do it:
import os, arcpy

workspace = "path/to/geodatabase.gdb"
arcpy.env.workspace = workspace

datasets = arcpy.ListDatasets()
fc_list = ["fc_1_name", "fc_2_name", "fc_3_name"]

for fc in fc_list:
    dataset = next((dataset for dataset in datasets if arcpy.Exists(os.path.join(workspace, dataset, fc))), None)
    print(dataset, fc)
Walk - ArcGIS Pro | Documentation is more functional, idiomatic, and performant than relying on the older ArcPy List functions.
When you need to walk a whole workspace, it can be faster. However, it seems that if you already know what you want, the List functions can be faster:
Here's a test where I'm pulling all the feature class names from the root of the workspace:
from arcpy.da import Walk
from arcpy import EnvManager, ListFeatureClasses

def test_list(workspace: str):
    fcs = []
    with EnvManager(workspace=workspace):
        fcs.extend(ListFeatureClasses())
    return fcs

def test_walk(workspace: str):
    fcs = []
    for root, dirs, files in Walk(workspace, datatype='FeatureClass'):
        fcs.extend([f for f in files])
    return fcs
>>> %timeit test_list(wsp)
18.7 ms ± 80.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
>>> %timeit test_walk(wsp)
112 ms ± 1.55 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
When you have to do a recursive dataset traversal, Walk edges out the List workflow:
from arcpy import ListDatasets  # also needed for the recursive List version

def test_list(workspace: str):
    fcs = []
    with EnvManager(workspace=workspace):
        fcs.extend(ListFeatureClasses())
        for ds in ListDatasets():
            fcs.extend(ListFeatureClasses(feature_dataset=ds))
    return fcs

def test_walk(workspace: str):
    fcs = []
    for root, dirs, files in Walk(workspace, datatype='FeatureClass'):
        fcs.extend([f for f in files])
    return fcs
>>> %timeit test_list(wsp)
184 ms ± 1.88 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
>>> %timeit test_walk(wsp)
111 ms ± 814 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
Basically, the List functions are going to change in execution time depending on how you end up using them, while Walk will be pretty consistent: about 1/10th of a second for a GDB with ~75 Feature Classes and 4 Feature Datasets.
I do agree that it is more idiomatic, though it can be a bit confusing. For example, if you want to pull datasets from a project workspace with multiple GDBs, you need to do something like this:
for root, dirs, files in Walk(self.path, datatype=datatype):
    if 'Dataset' in datatype:
        # add some filtering so only dirs under a .gdb are added;
        # without this, all directories are considered datasets
        paths.extend([Path(root) / d for d in dirs if root.endswith('.gdb')])
    else:
        paths.extend([Path(root) / f for f in files])
This is because the 'FeatureDataset' datatype will return all empty directories under the root recursively, so you need to filter on the root and make sure it's in a gdb, or you'll get all the folders...
I tested your code on several file and mobile geodatabases on my machine, and the Walk code was almost an order of magnitude faster in all cases. I am not sure why it is slower in your tests on your machine. Can you run the tests without using a Notebook?
You can find the dataset using some Path magic too:
from __future__ import annotations

from pathlib import Path

from arcpy import Describe

# For Describe type hinting
try:
    from arcpy.typing.describe import FeatureClass
except ImportError:  # will fail at runtime due to a malformed package
    pass

def get_fc_dataset(fc: Path) -> str:
    fc = Path(fc)
    fc_desc: FeatureClass = Describe(str(fc))
    fc_dataset = fc.parent
    wsp_path = Path(fc_desc.workspace.catalogPath)
    if fc_dataset != wsp_path:
        return str(fc_dataset.relative_to(wsp_path))
Because the workspace of a feature class doesn't include the dataset it belongs to, you can take the part of the feature class's parent path that is relative to the workspace. Basically, all this does is remove the FC name from the path and then check for the part of that parent path that isn't in the workspace.
This function currently returns the name of the dataset, but you could also have it return the full path by returning fc_dataset instead.
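For example, with some made-up paths (assuming exFC sits inside the ExFD feature dataset and rootFC sits at the gdb root), the calls would look like:
print(get_fc_dataset(r"C:\test.gdb\ExFD\exFC"))  # "ExFD"
print(get_fc_dataset(r"C:\test.gdb\rootFC"))     # None - not inside a feature dataset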
Here's Alfred's code written as a function with the same return result as this one:
from arcpy import Describe, ListDatasets, EnvManager

def get_fc_dataset_list(fc: str) -> str:
    fc_desc: FeatureClass = Describe(fc)
    with EnvManager(workspace=fc_desc.workspace.catalogPath):
        for ds in ListDatasets():
            if ds in fc:
                return ds
I slightly modified it to use an EnvManager so you don't leave your environment in a dirty state.
When timing these, the Path solution is a little bit faster:
>>> %timeit get_fc_dataset(path)
35 ms ± 836 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
>>> %timeit get_fc_dataset_list(path)
49.4 ms ± 1.1 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
But if you wrap the functions in a functools.lru_cache decorator, they both perform incredibly fast:
from __future__ import annotations

from functools import lru_cache
from pathlib import Path

from arcpy import Describe, ListDatasets, EnvManager

# For Describe type hinting
try:
    from arcpy.typing.describe import FeatureClass
except ImportError:  # will fail at runtime due to a malformed package
    pass

@lru_cache
def get_fc_dataset(fc: Path) -> str:
    fc = Path(fc)
    fc_desc: FeatureClass = Describe(str(fc))
    fc_dataset = fc.parent
    wsp_path = Path(fc_desc.workspace.catalogPath)
    if fc_dataset != wsp_path:
        return str(fc_dataset.relative_to(wsp_path))

@lru_cache
def get_fc_dataset_list(fc: str) -> str:
    fc_desc: FeatureClass = Describe(fc)
    with EnvManager(workspace=fc_desc.workspace.catalogPath):
        for ds in ListDatasets():
            if ds in fc:
                return ds
>>> %timeit get_fc_dataset(path)
79.9 ns ± 0.512 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
>>> %timeit get_fc_dataset_list(path)
79.6 ns ± 0.31 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
If you need to look up datasets a lot (say, repeatedly calling this in a loop somewhere with the same arguments each time), using the lru_cache is probably a good idea. Just make sure to invalidate it with func.cache_clear() if you add a new dataset or move a feature class from one dataset to another.
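For example (a tiny sketch, reusing the decorated functions above):
# after changing the gdb structure, drop the cached answers so the next call re-describes
get_fc_dataset.cache_clear()
get_fc_dataset_list.cache_clear()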