
How to get the dataset name from feature class name using arcpy

02-25-2025 07:35 AM
MaximeDemers
Frequent Contributor

Hi,

I have a list of geodatabase feature class names and I would like to have the dataset names for each if any.

Ideally I would prefer not to loop over each element of the geodatabase to find it. I was hoping arcpy.Describe would contain that information, but it does not.

 

import os, arcpy

workspace = "path/to/geodatabase.gdb"
fc_list = ["fc_1_name", "fc_2_name", "fc_3_name"]

for fc in fc_list:
  desc = arcpy.Describe(os.path.join(workspace, fc))

  print(desc.catalogPath)  # does not include the dataset, if any

 

 

Any suggestion would be appreciated!
Thanks


Accepted Solutions
AlfredBaldenweck
MVP Regular Contributor

There is not a good way.

Please vote on this related Idea here: Feature Classes: Add Property denoting whether it'... - Esri Community

Couple things here, though.

Feature Datasets don't exist. You think they do, we pretend they do, but for most operations you can plug in "C:\test.gdb\ExFD\exFC" OR "C:\test.gdb\exFC" and get the same result. Try it on Buffer() or Exists() if you don't believe me. Where we run into problems is when we're actually trying to search for stuff, and then Pro thinks that they matter for some reason.

Because of this, the way you're going about this is kind of self-defeating because Describe() will take the path with or without the dataset in there.

 

Best way to do this is, unfortunately, to set the workspace environment and search through there. Something like this (I wrote this from memory so idk if the capitalization is right)

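import os
import arcpy

fdDict = {} # {fd1: [fcA, fcB], fd2:[fcC, fcD]}

gdb = r"...\ex.gdb"

arcpy.env.workspace = gdb

# "Feature" has to go in as feature_type; the first positional argument is the wildcard
for fd in arcpy.ListDatasets(feature_type="Feature"):
    fdDict[fd] = []
    # point the workspace at the feature dataset so ListFeatureClasses looks inside it
    arcpy.env.workspace = os.path.join(gdb, fd)
    for fc in arcpy.ListFeatureClasses():
        fdDict[fd].append(fc)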
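
Once a dictionary like fdDict exists, answering the original question (which dataset, if any, holds each name in fc_list) is a plain-Python inversion. Here is a minimal sketch of that step. The dataset and feature class names are made up and stand in for whatever the List functions return, and it relies on feature class names being unique within a geodatabase.

```python
# Hypothetical contents of fdDict after the loop above has run.
fd_dict = {"Transport": ["roads", "rails"], "Hydro": ["rivers"]}

# Invert {dataset: [feature classes]} into {feature class: dataset}.
fc_to_fd = {fc: fd for fd, fcs in fd_dict.items() for fc in fcs}

# Names that are not inside any dataset (root level or missing) map to None.
fc_list = ["roads", "rivers", "parcels"]
lookup = {fc: fc_to_fd.get(fc) for fc in fc_list}
print(lookup)  # {'roads': 'Transport', 'rivers': 'Hydro', 'parcels': None}
```

The geodatabase is only scanned once this way, no matter how many names are in fc_list.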

 

Please also vote on this Idea to make iterating through these workspaces less awful: arcpy.List[Type] functions: Let us feed it a works... - Esri Community


12 Replies
RPGIS
MVP Regular Contributor

Hi @MaximeDemers,

 

Have you looked into ListDatasets, or Walk for that matter?

Either should help you identify the list of datasets, but if you want to get the datasets from the feature class file path, here is a sample.

 

import arcpy

workspace = '<some sde or gdb>'
# Walk yields (dirpath, dirnames, filenames) tuples, much like os.walk
walk = arcpy.da.Walk(workspace, datatype="FeatureDataset")
for root, dirnames, filenames in walk:
    print(dirnames)
    for filename in filenames:
        print('<something>')
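
To turn the Walk output into a lookup for a list of feature class names, one option is to compare each root with the workspace path. This is only a sketch: it assumes Walk(workspace, datatype="FeatureClass") yields a root that is either the geodatabase itself or the geodatabase plus the feature dataset, and map_fc_to_dataset is a hypothetical helper, not part of arcpy. The rows are hand-written so the logic can be checked without arcpy.

```python
from pathlib import PureWindowsPath

def map_fc_to_dataset(workspace, walk_rows):
    """Build {feature class name: dataset name or None} from Walk-style rows."""
    wsp = PureWindowsPath(workspace)
    result = {}
    for root, _dirs, files in walk_rows:
        root_path = PureWindowsPath(root)
        # A root equal to the workspace means the items sit at the gdb root.
        dataset = None if root_path == wsp else root_path.name
        for name in files:
            result[name] = dataset
    return result

# Hand-written rows shaped like what Walk is assumed to yield.
rows = [
    (r"C:\data\ex.gdb", ["Transport"], ["parcels"]),
    (r"C:\data\ex.gdb\Transport", [], ["roads", "rails"]),
]
mapping = map_fc_to_dataset(r"C:\data\ex.gdb", rows)
print(mapping)  # {'parcels': None, 'roads': 'Transport', 'rails': 'Transport'}
```

With arcpy available, the second argument would be arcpy.da.Walk(workspace, datatype="FeatureClass") instead of the hand-written rows.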

 

 

HaydenWelch
MVP Regular Contributor

da.Walk is such a powerful tool, but it's irritating how slow it can be. Because it's recursively extracting the structure from the root, it can sometimes take seconds to run when passed a moderately complex workspace.

This isn't a big deal when you are only running it once, but I frequently have tools that need to get their context on initialization, and if you have say 10 tools in a toolbox that all call da.Walk, that can sometimes mean loading the toolbox in will take 10-15 seconds.

Again, not the worst if you absolutely need all that info, but it's a recipe for people thinking things are broken when the toolbox loading in locks the main thread for a long time with no clear reason as to what's happening.

This also happens with the Python Window when it processes auto-complete options for your current cursor position. Because it's trying to find the short names of the layers you can pass to a function call, it will hang the main thread until it finds them, then hang it again if you move your cursor.

Clubdebambos
MVP Regular Contributor

Hi @MaximeDemers,

There are multiple ways to achieve this, and you can still go down the Describe route.

 

import arcpy

workspace = "path/to/geodatabase.gdb"
fc_list = ["fc_1_name", "fc_2_name", "fc_3_name"]

## describe the gdb
desc = arcpy.da.Describe(workspace)

## use a dictionary comprehension
## if an fc name is found in a Feature Dataset, an entry is made in the dictionary
## FC_NAME (key) : FD_NAME (value)
fc_dict = {
    fc["name"]: fd["name"]
    for fd in desc["children"] if fd["dataType"] == "FeatureDataset"
    for fc in fd["children"] if fc["name"] in fc_list
}

print(fc_dict)
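
The comprehension leaves out any name that is not inside a feature dataset. If root-level feature classes should show up with None instead, the same traversal can be unrolled. This is a rough sketch: datasets_for is a hypothetical helper, and fake_desc is a hand-built stand-in that only assumes the keys used above ("children", "dataType", "name").

```python
def datasets_for(desc, fc_names):
    """Return {name: dataset or None} for each requested feature class name."""
    found = {name: None for name in fc_names}
    for child in desc.get("children", []):
        if child.get("dataType") != "FeatureDataset":
            continue  # skip root-level items, they keep the default None
        for fc in child.get("children", []):
            if fc["name"] in found:
                found[fc["name"]] = child["name"]
    return found

# Hand-built stand-in for arcpy.da.Describe(workspace).
fake_desc = {
    "name": "ex.gdb",
    "dataType": "Workspace",
    "children": [
        {"name": "parcels", "dataType": "FeatureClass", "children": []},
        {"name": "Transport", "dataType": "FeatureDataset",
         "children": [{"name": "roads", "dataType": "FeatureClass"}]},
    ],
}
result = datasets_for(fake_desc, ["roads", "parcels"])
print(result)  # {'roads': 'Transport', 'parcels': None}
```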

 

 

Does this help?

Cheers,

Glen

 

~ learn.finaldraftmapping.com
MaximeDemers
Frequent Contributor

Thank you for your answer!

If I have no choice but to loop over datasets, this is how I do it:

import os, arcpy

workspace = "path/to/geodatabase.gdb"
arcpy.env.workspace = workspace 
datasets = arcpy.ListDatasets()
fc_list = ["fc_1_name", "fc_2_name", "fc_3_name"]
for fc in fc_list:
  dataset = next((dataset for dataset in datasets if arcpy.Exists(os.path.join(workspace, dataset, fc)), None)
  print(dataset, fc)
JoshuaBixby
MVP Esteemed Contributor

Walk - ArcGIS Pro | Documentation is more functional, idiomatic, and performant than relying on the older ArcPy List functions.

HaydenWelch
MVP Regular Contributor

When you need to walk a whole directory, it can be faster. However, it seems that if you already know what you want, the List functions can be faster:

 

Here's a test where I'm pulling all the feature class names from the root of the workspace:

from arcpy.da import Walk
from arcpy import EnvManager, ListFeatureClasses

def test_list(workspace: str):
    fcs = []
    with EnvManager(workspace=workspace):
        fcs.extend(ListFeatureClasses())
    return fcs

def test_walk(workspace: str):
    fcs = []
    for root, dirs, files in Walk(workspace, datatype='FeatureClass'):
        fcs.extend([f for f in files])
    return fcs
>>> %timeit test_list(wsp)
18.7 ms ± 80.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

>>> %timeit test_walk(wsp)
112 ms ± 1.55 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

 

When you have to do a recursive dataset traversal, Walk edges out the List workflow:

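from arcpy.da import Walk
from arcpy import EnvManager, ListDatasets, ListFeatureClasses  # ListDatasets is needed for this version

def test_list(workspace: str):
    fcs = []
    with EnvManager(workspace=workspace):
        # root-level feature classes first, then the ones inside each dataset
        fcs.extend(ListFeatureClasses())
        for ds in ListDatasets():
            fcs.extend(ListFeatureClasses(feature_dataset=ds))
    return fcs

def test_walk(workspace: str):
    fcs = []
    for root, dirs, files in Walk(workspace, datatype='FeatureClass'):
        fcs.extend([f for f in files])
    return fcs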
>>> %timeit test_list(wsp)
184 ms ± 1.88 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

>>> %timeit test_walk(wsp)
111 ms ± 814 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

 

Basically, the List functions are going to change in execution time depending on how you end up using them, while Walk will be pretty consistent: about 1/10th of a second for a GDB with ~75 Feature Classes and 4 FeatureDatasets.

 

I do agree that it's more idiomatic, though it can be a bit confusing. For example, if you want to pull datasets from a project workspace with multiple GDBs, you need to do this:

# excerpt from a method: self.path, datatype, and paths are defined elsewhere
for root, dirs, files in Walk(self.path, datatype=datatype):
    if 'Dataset' in datatype:
        # add some filtering so only dirs after a .gdb are added
        # without this all directories are considered datasets
        paths.extend([Path(root) / d for d in dirs if root.endswith('.gdb')])
    else:
        paths.extend([Path(root) / f for f in files])

 

That's because the 'FeatureDataset' datatype will return all empty directories in the root recursively, so you need to filter on the root and make sure it's in a gdb or you'll get all the folders...

JoshuaBixby
MVP Esteemed Contributor

I tested your code on several file and mobile geodatabases on my machine, and the Walk code was almost an order of magnitude faster in all cases. I am not sure why it is slower in your tests on your machine. Can you run the tests outside a Notebook?
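
For anyone who wants to repeat the comparison in a plain script instead of with the %timeit magic, the standard library timeit module gives comparable numbers. A minimal sketch follows. The workload function is a placeholder with no arcpy in it, and would be swapped for test_list(wsp) or test_walk(wsp).

```python
import timeit

def workload():
    # Stand-in for test_list(wsp) or test_walk(wsp); replace with the real call.
    return sorted(range(1000))

# repeat=7 mirrors the "7 runs" that %timeit reports; number is loops per run.
times = timeit.repeat(workload, repeat=7, number=10)
best_ms = min(times) / 10 * 1000
print(f"best of 7: {best_ms:.3f} ms per loop")
```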

HaydenWelch
MVP Regular Contributor

You can find the dataset using some Path magic too:

from __future__ import annotations
from pathlib import Path
from arcpy import Describe

# For Describe type hinting
try:
    from arcpy.typing.describe import FeatureClass
except ImportError: # will fail at runtime due to malformed package
    pass

def get_fc_dataset(fc: Path) -> str | None:
    fc = Path(fc)
    fc_desc: FeatureClass = Describe(str(fc))
    fc_dataset = fc.parent
    wsp_path = Path(fc_desc.workspace.catalogPath)
    if fc_dataset != wsp_path:
        return str(fc_dataset.relative_to(wsp_path))

Because the workspace of a feature class doesn't include the dataset it belongs to, you can get the relative component between the feature class's parent and the workspace. Basically, all this does is remove the FC name from the path and then check for the part of that parent path that isn't in the workspace.

This function currently returns the name of the dataset, but you could also have it return the full path by just returning the fc_dataset.
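
The path arithmetic can be tried without arcpy. The sketch below uses the example paths from earlier in the thread and takes the workspace as a plain argument instead of reading it from Describe. dataset_from_paths is a hypothetical name. Note that this only works when the input path already contains the dataset, which is the caveat raised in the accepted solution.

```python
from pathlib import PureWindowsPath
from typing import Optional

def dataset_from_paths(fc_path: str, workspace: str) -> Optional[str]:
    """Return the path component between the workspace and the feature class."""
    parent = PureWindowsPath(fc_path).parent
    wsp = PureWindowsPath(workspace)
    if parent == wsp:
        return None  # the feature class sits at the geodatabase root
    return str(parent.relative_to(wsp))

with_fd = dataset_from_paths(r"C:\test.gdb\ExFD\exFC", r"C:\test.gdb")
without_fd = dataset_from_paths(r"C:\test.gdb\exFC", r"C:\test.gdb")
print(with_fd, without_fd)  # ExFD None
```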

Here's Alfred's code written as a function with the same return result as this one:

from arcpy import Describe, ListDatasets, EnvManager

def get_fc_dataset_list(fc: str) -> str:
    fc_desc: FeatureClass = Describe(fc)
    with EnvManager(workspace=fc_desc.workspace.catalogPath):
        for ds in ListDatasets():
            # substring test: also matches if the dataset name appears elsewhere in the path
            if ds in fc:
                return ds

I slightly modified it to use an EnvManager so you don't leave your environment in a dirty state.

 

When timing these, the Path solution is a little bit faster:

>>> %timeit get_fc_dataset(path)
35 ms ± 836 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

>>> %timeit get_fc_dataset_list(path)
49.4 ms ± 1.1 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

 

But if you wrap the functions in a functools.lru_cache decorator, they both perform incredibly fast:

from __future__ import annotations
from pathlib import Path
from arcpy import Describe, ListDatasets, EnvManager
from functools import lru_cache

# For Describe type hinting
try:
    from arcpy.typing.describe import FeatureClass
except ImportError: # will fail at runtime due to malformed package
    pass

@lru_cache
def get_fc_dataset(fc: Path) -> str | None:
    fc = Path(fc)
    fc_desc: FeatureClass = Describe(str(fc))
    fc_dataset = fc.parent
    wsp_path = Path(fc_desc.workspace.catalogPath)
    if fc_dataset != wsp_path:
        return str(fc_dataset.relative_to(wsp_path))

@lru_cache
def get_fc_dataset_list(fc: str) -> str:
    fc_desc: FeatureClass = Describe(fc)
    with EnvManager(workspace=fc_desc.workspace.catalogPath):
        for ds in ListDatasets():
            if ds in fc:
                return ds
    

 

>>> %timeit get_fc_dataset(path)
79.9 ns ± 0.512 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

>>> %timeit get_fc_dataset_list(path)
79.6 ns ± 0.31 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

 

If you need to access datasets a lot (for example, repeatedly calling this in a loop with the same arguments each time), using the lru_cache is probably a good idea. Just make sure to invalidate it with func.cache_clear() if you add a new dataset or move a feature class from one dataset to another.
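
Here is a small arcpy-free sketch of that invalidation pattern. lookup_dataset is a stand-in for the functions above, and the call counter exists only to show when the body actually runs.

```python
from functools import lru_cache

calls = 0  # counts how often the slow body actually runs

@lru_cache(maxsize=None)
def lookup_dataset(fc_path: str) -> str:
    """Stand-in for get_fc_dataset; the body only runs on a cache miss."""
    global calls
    calls += 1
    return fc_path.split("\\")[-2]

lookup_dataset(r"C:\test.gdb\ExFD\exFC")
lookup_dataset(r"C:\test.gdb\ExFD\exFC")  # served from the cache
print(calls, lookup_dataset.cache_info().hits)  # 1 1

lookup_dataset.cache_clear()              # e.g. after moving a feature class
lookup_dataset(r"C:\test.gdb\ExFD\exFC")  # body runs again
print(calls)  # 2
```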
