arcpy.List[Type] functions: Let us feed it a workspace instead of having to switch the environment

AlfredBaldenweck

The workspace environment must be set before using several of the list functions, including ListDatasets, ListFeatureClasses, ListFiles, ListRasters, ListTables, and ListWorkspaces.

It is super, super annoying to use the arcpy.ListFeatureClasses, ...Tables, etc. functions right now because you can't specify the workspace you're checking.

Instead, you have to set the environment to that workspace first.

arcpy.env.workspace = "whatever.gdb"
for tab in arcpy.ListTables():
    print(tab)  # ListTables returns table names as strings

I would love, love, LOVE if we didn't have to do all that and could just feed it the workspace directly.

Right now, the signature for ListFeatureClasses (for example) is as follows:

ListFeatureClasses ({wild_card}, {feature_type}, {feature_dataset})

Could we please change it to

ListFeatureClasses ({wild_card}, 
                    {feature_type}, 
                    {feature_dataset}, 
                    {workspace = arcpy.env.workspace}
)

That is, make it go to the environment workspace by default, unless you specify a different workspace?

I just want to be able to get a list of all the stuff for each gdb in a list without having to change the environment all the time.
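Until something like this exists in arcpy itself, the requested signature can be approximated with a thin wrapper. The following is a minimal sketch, assuming a hypothetical helper named `list_feature_classes` (not part of arcpy). It uses the real `arcpy.EnvManager` to switch the workspace only for the duration of the call. `arcpy` is imported inside the function so the module still loads on a machine without ArcGIS.

```python
from __future__ import annotations


def list_feature_classes(wild_card=None, feature_type=None,
                         feature_dataset=None, workspace=None):
    """Hypothetical wrapper: arcpy.ListFeatureClasses plus a `workspace` argument."""
    import arcpy  # imported lazily; requires an ArcGIS installation

    if workspace is None:
        # No override given: use whatever arcpy.env.workspace currently is.
        found = arcpy.ListFeatureClasses(wild_card, feature_type, feature_dataset)
    else:
        # Temporarily switch the workspace; EnvManager restores it on exit.
        with arcpy.EnvManager(workspace=workspace):
            found = arcpy.ListFeatureClasses(wild_card, feature_type, feature_dataset)

    # Guard against a None result so callers can always iterate.
    return list(found or [])
```

With this, `list_feature_classes(feature_type="POLYGON", workspace="other.gdb")` leaves `arcpy.env.workspace` unchanged afterwards.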

ShaunWalbridge · ‎08-29-2024

Thanks for the idea!

From what I understand of what you're trying to do, one existing alternative is to use EnvManager to run multiple calls against a specific GDB:

arcpy.env.workspace = "whatever.gdb"
# find tables only in 'other.gdb':
with arcpy.EnvManager(workspace="other.gdb"):
    for table in arcpy.ListTables():
        print(table)  # ListTables returns table names as strings

# the workspace is restored on exit, so this gets 'whatever.gdb' results
for table in arcpy.ListTables():
    print(table)

The other approach you could take if you have a large collection of geodatabases to search is to use arcpy.da.Walk instead. In that model, you'd point to some place higher up in your hierarchy that includes multiple geodatabases, then filter them as you go through the walk iterations:

import arcpy
import os

workspace = r"C:\data"
feature_classes = {}

# can change datatype as needed to filter separate groups
walk = arcpy.da.Walk(workspace, datatype="FeatureClass")

for dirpath, dirnames, filenames in walk:
    # dirpath is the container being walked; dirnames is a list of its children
    last_dirname = dirpath.split(os.sep)[-1]
    for filename in filenames:
        if last_dirname.endswith('.gdb'):
            if last_dirname not in feature_classes:
                feature_classes[last_dirname] = []

            feature_classes[last_dirname].append(filename)

Given that there are a few well-supported existing paths for this, I think it would be better to adopt those patterns rather than change the functions' relationship with workspaces.
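One caveat when grouping Walk results by the last path component: as far as I know, the walk also descends into feature datasets, and there the directory path ends with the dataset name rather than the `.gdb` name, so a plain `endswith('.gdb')` check would skip those feature classes. A small sketch of a hypothetical helper, `enclosing_gdb`, that finds the geodatabase from any such path. It is pure Python, needs no arcpy, and assumes Windows-style paths:

```python
from pathlib import PureWindowsPath


def enclosing_gdb(dirpath):
    """Return the name of the nearest enclosing '.gdb' folder, or None."""
    path = PureWindowsPath(dirpath)
    # Check the path itself first, then each parent going upwards.
    for candidate in (path, *path.parents):
        if candidate.name.lower().endswith(".gdb"):
            return candidate.name
    return None
```

Inside the walk loop, `enclosing_gdb(dirpath)` could then serve as the dictionary key in place of the last path component.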

BlakeTerhune · ‎08-29-2024

I would support the idea proposed by @AlfredBaldenweck.

I think the idea is not to propose functionality that isn't possible, but rather to have the list functions follow a similar pattern to most other functions and have a parameter to explicitly set the workspace. It feels odd that you need to use arcpy.env.workspace (primarily) only when dealing with a list function, but not for most other things.

AlfredBaldenweck · ‎08-29-2024

Yeah, Blake has the right idea.

For example, ListFields takes the table that we're querying as a parameter. Why don't the other list functions do that?

I'm very sure that it's possible to do something like

import arcpy
fgdbList = ["gdb1.gdb", "gdb2.gdb"]
for fgdb in fgdbList:
    with arcpy.EnvManager(workspace=fgdb):
        feature_classes = arcpy.ListFeatureClasses(feature_type='POLYGON')

But why can't we just do the much more intuitive:

import arcpy
fgdbList = ["gdb1.gdb", "gdb2.gdb"]
for fgdb in fgdbList:
    feature_classes = arcpy.ListFeatureClasses(feature_type='POLYGON', 
                                               workspace = fgdb
                                              )
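In the meantime, the "list everything for each gdb in a list" use case can be wrapped once and reused. A minimal sketch, assuming a hypothetical helper `list_by_workspace` (not an arcpy function) that looks up any of the existing `arcpy.List*` functions by name and runs it inside `arcpy.EnvManager` for each workspace:

```python
from __future__ import annotations


def list_by_workspace(workspaces, list_function="ListFeatureClasses", **kwargs):
    """Run one arcpy List* function against each workspace in turn.

    Returns a dict mapping each workspace to the list of names found there.
    Hypothetical helper, not part of arcpy.
    """
    import arcpy  # imported lazily; requires an ArcGIS installation

    func = getattr(arcpy, list_function)
    results = {}
    for workspace in workspaces:
        # The workspace is only switched for the duration of this block.
        with arcpy.EnvManager(workspace=workspace):
            results[workspace] = list(func(**kwargs) or [])
    return results
```

For example, `list_by_workspace(["gdb1.gdb", "gdb2.gdb"], feature_type="POLYGON")` would return one list of polygon feature class names per geodatabase.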

DavidSolari · ‎08-29-2024

Following the usual precedent for "old function but good", these would be Data Access functions, say:

my_workspace = r"C:\path\to\data.gdb"
my_dataset = "ImportantItems"

# Get all feature classes as some sort of proper object, including any dataset objects
all_fcs = [f.name for f in arcpy.da.ListFeatureClasses(my_workspace)]

# Now just the dataset
dataset_fcs = [f.featureType for f in arcpy.da.ListFeatureClasses(f"{my_workspace}\\{my_dataset}")]
alternatively = [f.featureType for f in arcpy.da.ListFeatureClasses(my_workspace, feature_datasets=[my_dataset])]

# Why call many functions when you can call one?
everything = arcpy.da.ListObjects(my_workspace)

# But some database-side filtering is good for large workspaces
special_photos = arcpy.da.ListRasters(my_workspace, "county_*")
special_lines = arcpy.da.ListFeatureClasses(my_workspace, "river_*", "POLYLINE")

You get the idea, lots of room for more feature-rich, less stateful functions that play better with all those new typestubs.

AlfredBaldenweck · ‎09-03-2024

Hey @ShaunWalbridge , could we open this back up?

HannesZiegler · ‎09-24-2024

Setting status for this idea back to open

AlfredBaldenweck · ‎01-10-2025

Another thing I noticed today: according to the ArcGIS Pro documentation, ListDomains takes the workspace as a parameter. This is inconsistent with the current behaviour of the other list functions (although still desired!)
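For comparison, this is roughly what the workspace-as-parameter style looks like with `arcpy.da.ListDomains`, which needs no environment setting. A minimal sketch, wrapped in a hypothetical helper `domain_names` so that `arcpy` is only imported when the function is called:

```python
def domain_names(workspace):
    """Return the sorted names of all domains in the given workspace."""
    import arcpy  # imported lazily; requires an ArcGIS installation

    # The workspace is passed directly; arcpy.env.workspace is not consulted.
    return sorted(domain.name for domain in arcpy.da.ListDomains(workspace))
```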

HaydenWelch · ‎02-18-2025

Been thinking about this and I think some basic "Manager" classes could make this a lot less tedious:

>>> workspace = WorkspaceManager(r"C:\path\to\workspace.gdb")
>>> print(workspace.name)
workspace.gdb

>>> print(workspace.feature_classes)
[Path('C:\path\to\workspace.gdb\feature_class1'), Path('C:\path\to\workspace.gdb\feature_class2')]

>>> for fc in workspace.feature_classes.as_strings():
...     print(fc)
C:\path\to\workspace.gdb\feature_class1
C:\path\to\workspace.gdb\feature_class2

>>> arcpy.management.CopyFeatures(workspace.feature_classes.as_strings()[0], str(workspace.path / 'copy_of_feature_class1'))
>>> workspace.feature_classes
[Path('C:\path\to\workspace.gdb\feature_class1'), Path('C:\path\to\workspace.gdb\feature_class2')]

>>> workspace.reload(['feature_classes'])
>>> workspace.feature_classes
[Path('C:\path\to\workspace.gdb\feature_class1'), Path('C:\path\to\workspace.gdb\feature_class2'), Path('C:\path\to\workspace.gdb\copy_of_feature_class1')]

Here's a basic implementation of WorkspaceManager, with a bonus name-indexable PathList container:


from __future__ import annotations

from collections import UserList
from typing import overload
from pathlib import Path

from arcpy import (
    EnvManager,
    ListDatasets,
    ListRasters,
    ListTables,
    ListWorkspaces,
    ListFeatureClasses,
    ListFiles,
)

class PathList(UserList[Path]):
    """A list of features that can be accessed by index or name"""
    
    @overload
    def __getitem__(self, index: int) -> Path: ...
    @overload
    def __getitem__(self, name: str) -> Path: ...
    def __getitem__(self, ident: int | str) -> Path:
        if isinstance(ident, int):
            return self.data[ident]
        elif isinstance(ident, str):
            for item in self.data:
                if item.name == ident:
                    return item
            raise KeyError(f"Item {ident} not found")
    
    def as_strings(self) -> list[str]:
        """Returns the list of paths as strings"""
        return [str(item) for item in self.data]
    
class WorkspaceManager:
    # Caches are used to limit the amount of times
    # the List* functions are called, they tend to
    # be slow and if you run them in a loop it will
    # be very slow (like 1-2 seconds per feature_classes)
    __caches__ = (
        '_datasets', 
        '_rasters',
        '_tables', 
        '_workspaces', 
        '_feature_classes', 
        '_files',
    )
    
    __slots__ = ('path', 'name', *__caches__)
    
    def __init__(self, path: Path):
        self.path = Path(path)
        self.name = self.path.name
        
        for cache in self.__caches__:
            setattr(self, cache, None)
    
    def _retrieve(self, func: callable, cache: str):
        if getattr(self, cache):
            return getattr(self, cache)
        
        with EnvManager(workspace=str(self.path)):
            items = func()
        setattr(self, cache, PathList(Path(self.path / item) for item in items))
        
        return getattr(self, cache)
    
    @property
    def datasets(self) -> PathList[Path]:
        return self._retrieve(ListDatasets, '_datasets')
    
    @property
    def rasters(self) -> PathList[Path]:
        return self._retrieve(ListRasters, '_rasters')
    
    @property
    def tables(self) -> PathList[Path]:
        return self._retrieve(ListTables, '_tables')
    
    @property
    def workspaces(self) -> PathList[Path]:
        return self._retrieve(ListWorkspaces, '_workspaces')
    
    @property
    def feature_classes(self) -> PathList[Path]:
        
        # Early return on cache hit
        if self._feature_classes:
            return self._feature_classes

        # retrieve datasets and root feature classes
        self._retrieve(ListDatasets, '_datasets')
        self._retrieve(ListFeatureClasses, '_feature_classes')
        
        # Return root feature classes if no datasets
        if not self.datasets:
            return self._feature_classes
        
        # Extend feature classes with datasets
        for ds in self.datasets:
            with EnvManager(workspace=str(ds)):
                self._feature_classes.extend(self.path / fc for fc in ListFeatureClasses())
        return self._feature_classes
        
    @property
    def files(self) -> PathList[Path]:
        return self._retrieve(ListFiles, '_files')
        
    def reload(self, caches: list[str] = None):
        """Reloads the caches for the workspace
        
        Args:
            caches: A list of caches to reload. If None, all caches are reloaded
        """
        if not caches:
            caches = self.__caches__
        
        for cache in caches:
            if not cache.startswith('_'):
                cache = f"_{cache}"
                
            if cache in self.__caches__:
                setattr(self, cache, None)
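One thing to watch in `_retrieve`: the cache check is a truthiness test, so a workspace that legitimately has no items (an empty list) counts as a cache miss and is queried again on every access. A stripped-down sketch of the same caching idea using an explicit `None` check instead. `CachedListing` and `loader` are illustrative names, not part of the class above:

```python
class CachedListing:
    """Cache the result of a slow listing call, including empty results."""

    def __init__(self, loader):
        self._loader = loader  # zero-argument callable that does the slow work
        self._items = None     # None means "not loaded yet"

    @property
    def items(self):
        # Compare against None so an empty list still counts as a cache hit.
        if self._items is None:
            self._items = list(self._loader())
        return self._items

    def reload(self):
        """Drop the cached result so the next access queries again."""
        self._items = None
```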


MaximeDemers · ‎02-25-2025

I agree with this idea. To me, the problem is even larger because the whole Python API is full of inconsistencies between method and object property names.

Even the new CIM is awfully conceived.