The workspace environment must be set before using several of the list functions, including ListDatasets, ListFeatureClasses, ListFiles, ListRasters, ListTables, and ListWorkspaces.
It is super, super annoying to use the arcpy.ListFeatureClasses, ...Tables, etc. functions right now because you can't specify a workspace you're checking.
Instead, you have to set the environment to that workspace, first.
arcpy.env.workspace = "whatever.gdb"
for tab in arcpy.ListTables():
    print(tab)  # ListTables returns table names as strings
I would love, love, LOVE if we didn't have to do all that and could just feed it the workspace directly.
Right now, the call for ListFeatureClasses (for example) is as follows:
ListFeatureClasses ({wild_card}, {feature_type}, {feature_dataset})
Could we please change it to
ListFeatureClasses ({wild_card},
                    {feature_type},
                    {feature_dataset},
                    {workspace = arcpy.env.workspace}
                   )
That is, make it go to the environment workspace by default, unless you specify a different workspace?
I just want to be able to get a list of all the stuff for each gdb in a list without having to change the environment all the time.
Thanks for the idea!
From what I understand of what you're trying to do, one existing alternative is to use EnvManager to run multiple calls against a specific GDB:
arcpy.env.workspace = "whatever.gdb"

# find tables only in 'other.gdb':
with arcpy.EnvManager(workspace="other.gdb"):
    for table in arcpy.ListTables():
        print(table)  # each item is a table name (str)

# reset to the global state, now get 'whatever.gdb' results
for table in arcpy.ListTables():
    print(table)
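For the original use case (one listing per geodatabase), the EnvManager pattern above can be folded into a small helper. This is only a sketch: `list_per_workspace` is a hypothetical name, not part of arcpy, and the list function and environment manager are passed in as arguments (for example `arcpy.ListTables` and `arcpy.EnvManager`) so the helper has no hard dependency on arcpy.

```python
from typing import Callable, Dict, Iterable, List


def list_per_workspace(
    workspaces: Iterable[str],
    list_func: Callable[..., Iterable[str]],
    env_manager: Callable[..., object],
    **list_kwargs,
) -> Dict[str, List[str]]:
    """Run list_func once per workspace and collect the results.

    Hypothetical helper: env_manager is assumed to behave like
    arcpy.EnvManager, i.e. a context manager accepting workspace=...
    """
    results: Dict[str, List[str]] = {}
    for workspace in workspaces:
        # The workspace is only overridden inside the with block.
        with env_manager(workspace=workspace):
            # "or []" guards against a list function returning None.
            results[workspace] = list(list_func(**list_kwargs) or [])
    return results
```

With arcpy available, a call might look like `list_per_workspace(["gdb1.gdb", "gdb2.gdb"], arcpy.ListTables, arcpy.EnvManager)`, which leaves the global workspace untouched afterwards.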
The other approach you could take, if you have a large collection of geodatabases to search, is to use arcpy.da.Walk instead. In that model, you'd point to some place higher up in your hierarchy that includes multiple geodatabases, then filter them as you go through the walk iterations:
import arcpy
import os

workspace = r"C:\data"
feature_classes = {}

# can change datatype as needed to filter separate groups
walk = arcpy.da.Walk(workspace, datatype="FeatureClass")
for dirpath, dirnames, filenames in walk:
    # dirpath is the container currently being walked (a string)
    last_dirname = dirpath.split(os.sep)[-1]
    if last_dirname.endswith('.gdb'):
        for filename in filenames:
            if last_dirname not in feature_classes:
                feature_classes[last_dirname] = []
            feature_classes[last_dirname].append(filename)
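One caveat with grouping on the last path component: assuming Walk descends into feature datasets and reports them as a dirpath below the .gdb, feature classes inside a dataset would not be attributed to their geodatabase. A hedged sketch of a grouping step that looks for the nearest `.gdb` component instead is below; `containing_gdb` and `group_by_gdb` are hypothetical names, and the function takes any iterable of `(dirpath, dirnames, filenames)` tuples, such as the walk object above.

```python
from pathlib import PureWindowsPath
from typing import Dict, Iterable, List, Optional, Tuple

WalkRow = Tuple[str, List[str], List[str]]


def containing_gdb(dirpath: str) -> Optional[str]:
    """Return the nearest path component ending in '.gdb', or None."""
    # PureWindowsPath parses backslash paths the same way on any OS.
    for part in reversed(PureWindowsPath(dirpath).parts):
        if part.lower().endswith(".gdb"):
            return part
    return None


def group_by_gdb(walk_rows: Iterable[WalkRow]) -> Dict[str, List[str]]:
    """Group item names by the geodatabase that contains them."""
    grouped: Dict[str, List[str]] = {}
    for dirpath, _dirnames, filenames in walk_rows:
        gdb = containing_gdb(dirpath)
        if gdb is not None:  # skip items that are not inside a geodatabase
            grouped.setdefault(gdb, []).extend(filenames)
    return grouped
```

Keying on the geodatabase name alone means two geodatabases with the same name in different folders would be merged; keying on the full path to the `.gdb` would avoid that.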
Given that there are a few well-supported existing approaches for this, I think it would be better to adopt those patterns rather than have the functions change their relationship with workspaces.
I would support the idea proposed by @AlfredBaldenweck.
I think the idea is not to propose functionality that isn't possible, but rather to have the list functions follow a similar pattern to most other functions and have a parameter to explicitly set the workspace. It feels odd that you need to use arcpy.env.workspace (primarily) only when dealing with a list function, but not for most other things.
Yeah, Blake has the right idea.
For example, ListFields takes the table that we're querying as a parameter. Why don't the other list functions do that?
I'm well aware that it's possible to do something like this:
import arcpy

fgdbList = ["gdb1.gdb", "gdb2.gdb"]
for fgdb in fgdbList:
    with arcpy.EnvManager(workspace=fgdb):
        feature_classes = arcpy.ListFeatureClasses(feature_type='POLYGON')
But why can't we just do the much more intuitive:
import arcpy

fgdbList = ["gdb1.gdb", "gdb2.gdb"]
for fgdb in fgdbList:
    # proposed signature; the workspace parameter does not exist today
    feature_classes = arcpy.ListFeatureClasses(feature_type='POLYGON',
                                               workspace=fgdb
                                               )
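Until something like that exists, a thin adapter can approximate the proposed call shape. The sketch below is an assumption-laden illustration, not an arcpy feature: `accepts_workspace` is a hypothetical name, and it expects to be handed a list function plus a context manager that works like `arcpy.EnvManager`.

```python
import functools
from typing import Callable, Optional


def accepts_workspace(list_func: Callable, env_manager: Callable) -> Callable:
    """Wrap list_func so it takes an optional workspace keyword.

    Hypothetical adapter: env_manager is assumed to work like
    arcpy.EnvManager (a context manager taking workspace=...).
    """

    @functools.wraps(list_func)
    def wrapper(*args, workspace: Optional[str] = None, **kwargs):
        if workspace is None:
            # No override: use whatever workspace is currently set.
            return list_func(*args, **kwargs)
        # Override the workspace only for the duration of this call.
        with env_manager(workspace=workspace):
            return list_func(*args, **kwargs)

    return wrapper
```

With arcpy imported, `list_fcs = accepts_workspace(arcpy.ListFeatureClasses, arcpy.EnvManager)` would allow `list_fcs(feature_type='POLYGON', workspace=fgdb)`, falling back to the environment workspace when `workspace` is omitted.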
Following the usual precedent for "old function, but good", these would be new Data Access (arcpy.da) functions, say:
my_workspace = r"C:\path\to\data.gdb"
my_dataset = "ImportantItems"
# Get all feature classes as some sort of proper object, including any dataset objects
all_fcs = [f.name for f in arcpy.da.ListFeatureClasses(my_workspace)]
# Now just the dataset
dataset_fcs = [f.featureType for f in arcpy.da.ListFeatureClasses(f"{my_workspace}\\{my_dataset}")]
alternatively = [f.featureType for f in arcpy.da.ListFeatureClasses(my_workspace, feature_datasets=[my_dataset])]
# Why call many functions when you can call one?
everything = arcpy.da.ListObjects(my_workspace)
# But some database-side filtering is good for large workspaces
special_photos = arcpy.da.ListRasters(my_workspace, "county_*")
special_lines = arcpy.da.ListFeatureClasses(my_workspace, "river_*", "POLYLINE")
You get the idea: there's lots of room for more feature-rich, less stateful functions that play better with all those new type stubs.
Another thing I noticed today: ListDomains (see the ArcGIS Pro documentation) takes the workspace as a parameter. This is inconsistent with the current behaviour of the other list functions (although still desired!)
Been thinking about this and I think some basic "Manager" classes could make this a lot less tedious:
>>> workspace = WorkspaceManager(r"C:\path\to\workspace.gdb")
>>> print(workspace.name)
workspace.gdb
>>> print(workspace.feature_classes)
[Path('C:\path\to\workspace.gdb\feature_class1'), Path('C:\path\to\workspace.gdb\feature_class2')]
>>> for fc in workspace.feature_classes.as_strings():
... print(fc)
C:\path\to\workspace.gdb\feature_class1
C:\path\to\workspace.gdb\feature_class2
>>> arcpy.management.CopyFeatures(workspace.feature_classes.as_strings()[0], str(workspace.path / 'copy_of_feature_class1'))
>>> workspace.feature_classes
[Path('C:\path\to\workspace.gdb\feature_class1'), Path('C:\path\to\workspace.gdb\feature_class2')]
>>> workspace.reload(['feature_classes'])
>>> workspace.feature_classes
[Path('C:\path\to\workspace.gdb\feature_class1'), Path('C:\path\to\workspace.gdb\feature_class2'), Path('C:\path\to\workspace.gdb\copy_of_feature_class1')]
Here's a basic implementation of WorkspaceManager, with a bonus PathList container that can be indexed by name:
from __future__ import annotations

from collections import UserList
from typing import overload
from pathlib import Path

from arcpy import (
    EnvManager,
    ListDatasets,
    ListRasters,
    ListTables,
    ListWorkspaces,
    ListFeatureClasses,
    ListFiles,
)
class PathList(UserList[Path]):
    """A list of features that can be accessed by index or name"""

    @overload
    def __getitem__(self, index: int) -> Path: ...
    @overload
    def __getitem__(self, name: str) -> Path: ...
    def __getitem__(self, ident: int | str) -> Path:
        if isinstance(ident, int):
            return self.data[ident]
        elif isinstance(ident, str):
            for item in self.data:
                if item.name == ident:
                    return item
        raise KeyError(f"Item {ident} not found")

    def as_strings(self) -> list[str]:
        """Returns the list of paths as strings"""
        return [str(item) for item in self.data]
class WorkspaceManager:
    # Caches are used to limit the amount of times
    # the List* functions are called, they tend to
    # be slow and if you run them in a loop it will
    # be very slow (like 1-2 seconds per feature_classes)
    __caches__ = (
        '_datasets',
        '_rasters',
        '_tables',
        '_workspaces',
        '_feature_classes',
        '_files',
    )
    __slots__ = ('path', 'name', *__caches__)

    def __init__(self, path: Path):
        self.path = Path(path)
        self.name = self.path.name
        for cache in self.__caches__:
            setattr(self, cache, None)

    def _retrieve(self, func: callable, cache: str):
        # Compare against None so an empty result still counts as cached
        if getattr(self, cache) is not None:
            return getattr(self, cache)
        with EnvManager(workspace=str(self.path)):
            items = func()
        setattr(self, cache, PathList(Path(self.path / item) for item in items))
        return getattr(self, cache)

    @property
    def datasets(self) -> PathList[Path]:
        return self._retrieve(ListDatasets, '_datasets')

    @property
    def rasters(self) -> PathList[Path]:
        return self._retrieve(ListRasters, '_rasters')

    @property
    def tables(self) -> PathList[Path]:
        return self._retrieve(ListTables, '_tables')

    @property
    def workspaces(self) -> PathList[Path]:
        return self._retrieve(ListWorkspaces, '_workspaces')

    @property
    def feature_classes(self) -> PathList[Path]:
        # Early return on cache hit (an empty list is still a hit)
        if self._feature_classes is not None:
            return self._feature_classes

        # retrieve datasets and root feature classes
        self._retrieve(ListDatasets, '_datasets')
        self._retrieve(ListFeatureClasses, '_feature_classes')

        # Return root feature classes if no datasets
        if not self.datasets:
            return self._feature_classes

        # Extend feature classes with datasets
        for ds in self.datasets:
            with EnvManager(workspace=str(ds)):
                self._feature_classes.extend(self.path / fc for fc in ListFeatureClasses())
        return self._feature_classes

    @property
    def files(self) -> PathList[Path]:
        return self._retrieve(ListFiles, '_files')

    def reload(self, caches: list[str] | None = None):
        """Reloads the caches for the workspace

        Args:
            caches: A list of caches to reload. If None, all caches are reloaded
        """
        if not caches:
            caches = self.__caches__
        for cache in caches:
            # Accept both 'tables' and '_tables'
            if not cache.startswith('_'):
                cache = f"_{cache}"
            if cache in self.__caches__:
                setattr(self, cache, None)
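As a possible variation on the caching above, the standard library's `functools.cached_property` can do the bookkeeping, at the cost of giving up `__slots__` (it stores values in the instance `__dict__`). The sketch below is illustrative only: `CachedWorkspace` and its `lister` argument are hypothetical, with `lister` standing in for any callable that takes a workspace path and returns item names (for example a function that calls a List* function inside an EnvManager block).

```python
from functools import cached_property
from pathlib import Path
from typing import Callable, List


class CachedWorkspace:
    """Hypothetical workspace wrapper that caches one listing."""

    def __init__(self, path: str, lister: Callable[[str], List[str]]):
        self.path = Path(path)
        self._lister = lister  # stand-in for an arcpy-backed list call

    @cached_property
    def tables(self) -> List[Path]:
        # Runs once; the result (even an empty list) is then cached.
        return [self.path / name for name in self._lister(str(self.path))]

    def reload(self, *names: str) -> None:
        """Drop cached listings; with no names, drop all of them."""
        cached = [
            name for name, value in vars(type(self)).items()
            if isinstance(value, cached_property)
        ]
        for name in names or cached:
            if name in cached:  # never remove ordinary attributes
                self.__dict__.pop(name, None)
```

After `reload("tables")`, the next access to `tables` calls the lister again, which mirrors the `reload(['feature_classes'])` step in the session shown earlier.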
I agree with this idea. To me, the problem is even larger because the whole Python API is full of inconsistencies between methods or object property names.
Even the new CIM is badly conceived.