A simple drop-in class for managing workspaces

02-19-2025 10:28 AM
HaydenWelch

I ran across an idea by @AlfredBaldenweck the other day in this post that raised some good points about how annoying it can be to work with the arcpy.List* functions. His solution of allowing a workspace parameter to be passed is good, but it would require a lot of changes to the existing functions. This led me down the path of building a drop-in wrapper class that injects this functionality into your scripts and behaves more like a regular Python object.

The Current Process

As it stands, to use List* functions, you need to either set the global environment workspace or use an EnvManager with the workspace property set:

 

>>> import arcpy
>>> arcpy.env.workspace = r"<workspace_path>"
>>> arcpy.ListFeatureClasses()
['FC1', 'FC2', ...]

# OR

>>> with arcpy.EnvManager(workspace = r"<workspace_path>"):
...     arcpy.ListFeatureClasses()
['FC1', 'FC2', ...]

 

This can become confusing and burdensome if you have to switch between workspaces a lot. Your code ends up either peppered with global overrides that leave the environment in a confusing state, or full of with blocks. These functions also return only the final component of each listed object, meaning you need os.path.join() or string concatenation to actually use the returned values once the global environment has changed.

A Simpler Way

Python has a built-in pathlib library whose Path object trivializes working with file paths. As far as I can tell arcpy doesn't make use of it, but as a developer, being able to use Path objects is a massive headache reducer. Because most workspaces will be some form of file path (if you aren't using a server connection), these operations can be made much simpler by writing a secondary Manager class that stores the workspace:

 

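# Postpones annotation evaluation so `Path | str` works before Python 3.10
from __future__ import annotations

from pathlib import Path

import arcpy

class WorkspaceManager:
    def __init__(self, path: Path | str) -> None:
        self.path = Path(path)
        self.name = self.path.name
        self._manager = arcpy.EnvManager(workspace=str(self.path))
        self._enter = self._manager.__enter__
        self._exit = self._manager.__exit__

    def __str__(self) -> str:
        return str(self.path)
    
    def __enter__(self) -> WorkspaceManager:
        # Delegate to the wrapped EnvManager, then hand back this object
        # so that `with WorkspaceManager(...) as w:` also works
        self._enter()
        return self
    
    def __exit__(self, exctype, excval, tb):
        # Let the wrapped EnvManager restore the previous environment
        self._exit(exctype, excval, tb)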

 

So far we haven't done much, but we've learned that we can use an EnvManager as a dependency of our class. This simple example hijacks EnvManager and gives us some additional information about the workspace:

 

>>> w = WorkspaceManager(r"<workspace\path>")
>>> with w:
...     arcpy.ListFeatureClasses()
['FC1', 'FC2', ...]

# This now allows us to build full paths easily

>>> paths = []
>>> with w:
...    for fc in arcpy.ListFeatureClasses():
...        paths.append(w.path / fc)
>>> paths
[WindowsPath('<workspace/path>/FC1'), WindowsPath('<workspace/path>/FC2'), ...]
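
For anyone who hasn't used pathlib much, here is a minimal sketch of the Path behaviour the example above leans on. It uses PureWindowsPath so the output is the same on any operating system, and the geodatabase path is a made-up placeholder, not a real workspace.

```python
from pathlib import PureWindowsPath

# Hypothetical geodatabase path, used only for illustration
workspace = PureWindowsPath(r"C:\data\project.gdb")

# The "/" operator joins path components
fc = workspace / "FC1"

print(fc.name)                 # FC1 (the final component)
print(fc.parent == workspace)  # True
print(str(fc))                 # C:\data\project.gdb\FC1
print(repr(fc))                # PureWindowsPath('C:/data/project.gdb/FC1')
```

Note that the repr shows forward slashes even for Windows paths, while str() gives the backslash form you would hand to an arcpy function.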

 

 

Adding Functionality

Just wrapping the EnvManager is a neat trick, but if we're gonna go through all that trouble, we might as well build out some functionality. Ideally we'd want the actions commonly associated with a workspace (List*) to be directly accessible from the WorkspaceManager object:

 

class WorkspaceManager:
    def __init__(self, path: Path | str) -> None:
        self.path = Path(path)
        self.name = self.path.name
        self.manager = arcpy.EnvManager(workspace=str(self.path))
    
    @property
    def tables(self):
        with self.manager:
            return [self.path / fc for fc in arcpy.ListTables()]

 

We've dropped the __enter__ and __exit__ methods and instead started building out properties that use the manager itself. This is a much more ergonomic way to access tables than the first example as it allows us to define multiple workspaces and always know they will have the correct environment set:

 

>>> w1 = WorkspaceManager('<workspace1>')
>>> w2 = WorkspaceManager('<workspace2>')
>>> w1.tables
[WindowsPath('<workspace1>/FC1'),
 WindowsPath('<workspace1>/FC2'),
 WindowsPath('<workspace1>/FC3'),
 ...
]
>>> w2.tables
[WindowsPath('<workspace2>/FC1'),
 WindowsPath('<workspace2>/FC2'),
 WindowsPath('<workspace2>/FC3'),
 ...
]

 

 

Helper Class

Now we're getting somewhere. You may have wondered why I'm using Path instead of just raw path strings. Other than the fact that Path objects are just a treat to work with, they also allow you to easily access the last component with a `name` attribute. This is our List item identifier in our workspace! Since all we're currently getting is a raw list of paths, wouldn't it be nice to have a special list that allowed us to index on these names?

 

class PathList(UserList[Path]):
    """A list of features that can be accessed by index or name"""
    def __init__(self, initlist=None):
        super().__init__(initlist)
        self.__flag = False # mangled flag for returning strings
        
    @overload
    def __getitem__(self, index: int) -> Path | str: ...
    @overload
    def __getitem__(self, name: str) -> Path | str: ...
    def __getitem__(self, ident: int | str) -> Path | str:
        if isinstance(ident, int):
            path = self.data[ident]
            return path if not self.__flag else str(path)
        elif isinstance(ident, str):
            # Look up by the final path component (the List* name)
            for path in self.data:
                if path.name == ident:
                    return path if not self.__flag else str(path)
            raise KeyError(f"Path {ident} not found")
        # Anything else (e.g. a slice) gets the stock UserList behaviour
        return super().__getitem__(ident)
    
    @contextmanager
    def as_strings(self) -> Generator[PathList, None, None]:
        try:
            self.__flag = True
            yield self
        finally:
            self.__flag = False

 

Here's a simple UserList implementation for holding a list of Path objects. As you can see, we've overridden the __getitem__ magic method (dict['key'] and list[idx] both call this as __getitem__(key)). Because we know the names within our workspace are unique, we can now return a PathList from WorkspaceManager, meaning users can pull items by name or index:

 

>>> w1 = WorkspaceManager('<workspace1>')
>>> w1.tables['FC1']
WindowsPath('<workspace1>\\FC1')

 

Because this is a UserList and not a Mapping, we can still treat this container as if it were a regular list. No need to manage keys and values because the values contain their keys in the .name attribute.
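
Here is a small sketch of what "still a regular list" means in practice. NamedPaths is a hypothetical cut-down version of PathList written just for this demo (it has no string flag, and it forwards ints and slices to UserList), and the workspace path is a placeholder.

```python
from collections import UserList
from pathlib import PureWindowsPath

class NamedPaths(UserList):
    """Hypothetical cut-down PathList: lookup by position or by final name."""
    def __getitem__(self, ident):
        if isinstance(ident, str):
            for path in self.data:
                if path.name == ident:
                    return path
            raise KeyError(f"Path {ident} not found")
        # ints and slices fall back to the normal UserList behaviour
        return super().__getitem__(ident)

workspace = PureWindowsPath(r"C:\data\a.gdb")  # placeholder path
paths = NamedPaths(workspace / name for name in ("FC1", "FC2", "FC3"))

by_name = paths["FC2"]
by_index = paths[1]
print(by_name == by_index)        # True: same item either way

# Ordinary list operations keep working
paths.append(workspace / "FC4")
print(len(paths))                 # 4
print(by_name in paths)           # True
print([p.name for p in paths])    # ['FC1', 'FC2', 'FC3', 'FC4']
print(type(paths[:2]).__name__)   # NamedPaths
```

One detail worth knowing: UserList does not define its own __iter__, so a for loop walks the list by calling __getitem__ with 0, 1, 2, and so on. That means anything you do inside __getitem__ also applies during iteration.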

 

Putting it All Together

Now that we have our return type, our general interface, and our structure, we can put all this together and write the whole class:

 

from __future__ import annotations

from collections import UserList
from typing import (
    overload, 
    Generator, 
    Callable,
)
from pathlib import Path
from contextlib import contextmanager

from arcpy import (
    EnvManager,
    ListDatasets,
    ListRasters,
    ListTables,
    ListWorkspaces,
    ListFeatureClasses,
    ListFiles,
)

class PathList(UserList[Path]):
    """A list of features that can be accessed by index or name"""
    def __init__(self, initlist=None):
        super().__init__(initlist)
        self.__flag = False # mangled flag for returning strings
        
    @overload
    def __getitem__(self, index: int) -> Path | str: ...
    @overload
    def __getitem__(self, name: str) -> Path | str: ...
    def __getitem__(self, ident: int | str) -> Path | str:
        if isinstance(ident, int):
            path = self.data[ident]
            return path if not self.__flag else str(path)
        elif isinstance(ident, str):
            # Look up by the final path component (the List* name)
            for path in self.data:
                if path.name == ident:
                    return path if not self.__flag else str(path)
            raise KeyError(f"Path {ident} not found")
        # Anything else (e.g. a slice) gets the stock UserList behaviour
        return super().__getitem__(ident)
    
    @contextmanager
    def as_strings(self) -> Generator[PathList, None, None]:
        try:
            self.__flag = True
            yield self
        finally:
            self.__flag = False
    
class WorkspaceManager:
    # Caches are used to limit the amount of times
    # the List* functions are called, they tend to
    # be slow and if you run them in a loop it will
    # cause issues (around 1-2 seconds per `feature_classes` call)
    __caches__ = (
        '_datasets', 
        '_rasters',
        '_tables', 
        '_workspaces', 
        '_feature_classes', 
        '_files',
    )
    
    __slots__ = ('path', 'name', 'manager', *__caches__)
    
    def __init__(self, path: Path | str):
        self.path = Path(path)
        self.name = self.path.name
        self.manager = EnvManager(workspace=str(self.path))
        
        for cache in self.__caches__:
            setattr(self, cache, None)
    
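    def _retrieve(self, func: Callable, cache: str) -> PathList:
        # Get cached paths. Compare against None so that an
        # empty result is cached too instead of being re-queried
        cached = getattr(self, cache)
        if cached is not None:
            return cached
        
        # Get names using List* func. These functions can return None
        # (e.g. for an invalid workspace), so fall back to an empty list
        with self.manager:
            items = func() or []
        setattr(self, cache, PathList(self.path / item for item in items))
        
        return getattr(self, cache)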
    
    @property
    def datasets(self) -> PathList:
        return self._retrieve(ListDatasets, '_datasets')
    
    @property
    def rasters(self) -> PathList:
        return self._retrieve(ListRasters, '_rasters')
    
    @property
    def tables(self) -> PathList:
        return self._retrieve(ListTables, '_tables')
    
    @property
    def workspaces(self) -> PathList:
        return self._retrieve(ListWorkspaces, '_workspaces')
    
    @property
    def feature_classes(self) -> PathList:
        # Compare against None so an empty result is cached too
        if self._feature_classes is not None:
            return self._feature_classes

        self._feature_classes = PathList()            
        # Check every dataset plus the workspace root
        for wsp in self.datasets + [self.path]:
            with EnvManager(workspace=str(wsp)):
                self._feature_classes.extend(
                    self.path / item for item in (ListFeatureClasses() or [])
                )
        return self._feature_classes
    
    @property
    def files(self) -> PathList:
        return self._retrieve(ListFiles, '_files')
        
    def reload(self, caches: list[str] | None = None):
        """Reloads the caches for the workspace
        
        Args:
            caches: A list of caches to reload. If None, all caches are reloaded
        """
        if not caches:
            caches = self.__caches__
        
        for cache in caches:
            if not cache.startswith('_'):
                cache = f"_{cache}"
                
            if cache in self.__caches__:
                setattr(self, cache, None)

 

Hopefully the additions of cache slots, a reload method, and the _retrieve method don't confuse you too much. They're all implementation details. The cache prevents repeated querying of the workspace (~1s for large ones) by storing the paths after the first lookup. It can be invalidated with .reload(), which resets the selected caches to None so the next access queries the workspace again. It's meant for calling after you add or delete a workspace object.
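
The cache-then-reload idea can be tried on its own without arcpy. This is only a sketch: CachedListing, slow_listing, stats and contents are hypothetical stand-ins for the workspace and a slow List* call. It uses None as the "not loaded yet" marker so that an empty listing still counts as cached.

```python
from __future__ import annotations

from typing import Callable

class CachedListing:
    """Hypothetical stand-in showing the cache-then-reload pattern."""
    __slots__ = ("_func", "_items")

    def __init__(self, func: Callable[[], list[str]]) -> None:
        self._func = func
        self._items = None  # None means "not loaded yet"

    @property
    def items(self) -> list[str]:
        # Compare against None so an empty result is still cached
        if self._items is None:
            self._items = list(self._func())
        return self._items

    def reload(self) -> None:
        # Drop the cached value; the next access queries again
        self._items = None

stats = {"calls": 0}       # counts how often the slow call runs
contents: list[str] = []   # pretend workspace contents

def slow_listing() -> list[str]:
    stats["calls"] += 1
    return list(contents)

ws = CachedListing(slow_listing)
first = ws.items
second = ws.items            # served from the cache, no second call
calls_before_reload = stats["calls"]

contents.append("FC1")       # the "workspace" changes behind our back
stale = list(ws.items)       # still the old, empty listing
ws.reload()
fresh = list(ws.items)       # re-queried after reload

print(calls_before_reload, stale, fresh, stats["calls"])
```

The trade-off is the usual one with caches: reads are cheap, but the object has no way of knowing the workspace changed until you tell it.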

_retrieve is just a private method that removes the duplicated with self.manager blocks that were in all properties. The only exception is feature_classes, which has to navigate any possible datasets contained in the parent workspace.

I also added a helpful as_strings() context manager to the PathList class so you can enter a context where the Path objects are implicitly cast to strings for use in other arcpy functions.

 

>>> w = WorkspaceManager(r"<workspace\path>")
>>> w.feature_classes
[WindowsPath('<workspace/path>/FC1'), 
 WindowsPath('<workspace/path>/FC2'), 
 ...
]
>>> fcs = w.feature_classes
>>> with fcs.as_strings():
...     for pth in fcs:
...         print(pth)
...     print(f"Feature 1: {fcs['FC1']}")
...     print(f"Feature 2: {fcs['FC2']}")
<workspace\path>\FC1
<workspace\path>\FC2
Feature 1: <workspace\path>\FC1
Feature 2: <workspace\path>\FC2
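
To see the string switch in isolation, here is a rough sketch with a hypothetical cut-down class (StringablePaths, not the PathList above) and a placeholder path. It shows two things: iteration picks up the switch because it goes through __getitem__, and the finally block restores Path mode even when the body of the with block raises.

```python
from __future__ import annotations

from collections import UserList
from contextlib import contextmanager
from pathlib import PureWindowsPath
from typing import Iterator

class StringablePaths(UserList):
    """Hypothetical cut-down PathList with only the string-mode switch."""
    def __init__(self, initlist=None):
        super().__init__(initlist)
        self._as_str = False

    def __getitem__(self, index):
        item = super().__getitem__(index)
        # Slices come back as a new list, so leave those untouched
        if self._as_str and not isinstance(index, slice):
            return str(item)
        return item

    @contextmanager
    def as_strings(self) -> Iterator[StringablePaths]:
        self._as_str = True
        try:
            yield self
        finally:
            self._as_str = False  # runs even if the block raises

paths = StringablePaths([PureWindowsPath(r"C:\data\a.gdb") / "FC1"])

with paths.as_strings():
    inside = [p for p in paths]  # iteration goes through __getitem__

try:
    with paths.as_strings():
        raise RuntimeError("simulated geoprocessing failure")
except RuntimeError:
    pass

outside = [p for p in paths]     # back to path objects
print(inside)
print(type(outside[0]).__name__)
```

Keep in mind that anything reading the underlying .data list directly bypasses __getitem__, so it will always see Path objects regardless of the switch.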

 

Hopefully you learned some things today about how it's not too difficult to spend a bit of time to extend the functionality of an API without modifying the underlying system. The beauty of Python is that you have all the tools you need to build your own abstractions. If a system you have to interact with is clunky and difficult to use, just make it less so! Try to think about existing APIs and interfaces that you enjoy using and do your best to implement those yourself.

 

EDIT: 2/20/2025: Minor formatting (remove whitespace), remove dead hasattr('_as_strings') check from PathList, remove re-casting self.path / fc as Path

About the Author
Hello! My name is Hayden Welch. I work in the OSP fiber optic design industry and focus heavily on building tools and automations for designing large scale networks using arcpy and ArcGIS Pro