
Getting feature datasets with arcpy.da.Walk (ArcGIS Pro 3.1.0)

06-04-2023 10:18 AM
SzymAdamowski
New Contributor III

I'm trying to find the feature datasets in a file geodatabase (possibly in many geodatabases, but for clarity only one here) using arcpy.da.Walk. I'm aware of the workaround using ListDatasets, and I'm aware of a similar topic (https://community.esri.com/t5/python-questions/arcpy-da-walk-workspace-datatype-quot/m-p/595010), but that one is from 2015. It's now 2023 and ArcGIS Pro 3.1, and it looks like nothing has changed: arcpy.da.Walk doesn't detect feature datasets as feature datasets (it treats them as "folders"). Or am I doing something wrong?

import os
import arcpy
workspace=r'E:\GIS_DATA\ArcGIS\TestWalk\TestWalk.gdb'
feature_datasets = []

walk = arcpy.da.Walk(workspace,datatype=["FeatureDataset"])

for dirpath, dirnames, filenames in walk:
    print (dirpath,dirnames,filenames)
    for filename in filenames:
        feature_datasets.append(os.path.join(dirpath, filename))
print("Found:{0}".format(feature_datasets))

This is my geodatabase structure:

[Screenshot: geodatabase structure of TestWalk.gdb]

This is the result of the script:

E:\GIS_DATA\ArcGIS\TestWalk\TestWalk.gdb ['DS1', 'DS2'] []
E:\GIS_DATA\ArcGIS\TestWalk\TestWalk.gdb\DS1 [] []
E:\GIS_DATA\ArcGIS\TestWalk\TestWalk.gdb\DS2 [] []
Found:[]

by Anonymous User
Not applicable

Your code iterates over filenames instead of the datasets (directories). Since there are no files in there, filenames is empty []. If you want to list the datasets, iterate over dirnames instead:

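import os
import arcpy
workspace = r'E:\GIS_DATA\ArcGIS\TestWalk\TestWalk.gdb'
feature_datasets = []
for dirpath, dirnames, filenames in arcpy.da.Walk(workspace, datatype=["FeatureDataset"]):
    for dirname in dirnames:  # feature datasets are listed as containers, i.e. in dirnames
        feature_datasets.append(os.path.join(dirpath, dirname))
print("Found:{0}".format(feature_datasets))
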
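arcpy.da.Walk yields (dirpath, dirnames, filenames) 3-tuples in the same shape as Python's built-in os.walk, so the dirnames/filenames distinction can be illustrated without ArcGIS. The sketch below is a plain os.walk analogy, not arcpy itself, and assumes a throwaway temporary folder: a subfolder (a container, like a feature dataset under Walk) shows up in dirnames, never in filenames. The helper name `first_walk_level` is hypothetical.

```python
import os
import tempfile


def first_walk_level(root):
    """Return (dirnames, filenames) for the top level of root, sorted."""
    # os.walk yields (dirpath, dirnames, filenames); the first tuple describes root itself.
    _dirpath, dirnames, filenames = next(os.walk(root))
    return sorted(dirnames), sorted(filenames)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        os.mkdir(os.path.join(tmp, "DS1"))  # a container -> reported in dirnames
        with open(os.path.join(tmp, "notes.txt"), "w") as fh:  # a plain file -> filenames
            fh.write("x")
        print(first_walk_level(tmp))  # (['DS1'], ['notes.txt'])
```

Iterating over filenames in this layout would never see DS1, which is exactly what happened in the original script.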
SzymAdamowski
New Contributor III

Did you check your code with datasets? Did it return only datasets? In my case it returns all Windows folders plus the datasets inside the GDB.

Modified code as suggested by JeffK:

import os
import arcpy

workspace=r'E:\GIS_DATA\ArcGIS\TestWalk'

feature_datasets = []

walk = arcpy.da.Walk(workspace,datatype=["FeatureDataset"])

for dirpath, dirnames, filenames in walk:
    for dirname in dirnames:
        feature_datasets.append(os.path.join(dirpath, dirname))
print ("Found:{0}".format(feature_datasets))

 

Result:

Found:['E:\\GIS_DATA\\ArcGIS\\TestWalk\\Index', 'E:\\GIS_DATA\\ArcGIS\\TestWalk\\TestWalk.gdb', 'E:\\GIS_DATA\\ArcGIS\\TestWalk\\ImportLog', 'E:\\GIS_DATA\\ArcGIS\\TestWalk\\GpMessages', 'E:\\GIS_DATA\\ArcGIS\\TestWalk\\.ipynb_checkpoints', 'E:\\GIS_DATA\\ArcGIS\\TestWalk\\.backups', 'E:\\GIS_DATA\\ArcGIS\\TestWalk\\Temp', 'E:\\GIS_DATA\\ArcGIS\\TestWalk\\Index\\TestWalk', 'E:\\GIS_DATA\\ArcGIS\\TestWalk\\Index\\Thumbnail', 'E:\\GIS_DATA\\ArcGIS\\TestWalk\\TestWalk.gdb\\DS1', 'E:\\GIS_DATA\\ArcGIS\\TestWalk\\TestWalk.gdb\\DS2']

 

by Anonymous User
Not applicable

Yes I did, and I only had a dataset in the gdb and nothing else. My point was that you should iterate over dirnames, not filenames, and I'm glad you found a workable solution. Walk doesn't seem to be quite all there yet, does it?

AlfredBaldenweck
MVP Regular Contributor

To distinguish feature datasets from Windows folders, check the data type of each directory (using its full path):

if arcpy.Describe(os.path.join(dirpath, dirname)).dataType == "FeatureDataset":
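A minimal sketch of that check wrapped in a hypothetical helper, `keep_feature_datasets`. The `describe` parameter is an addition of this sketch (it defaults to `arcpy.Describe`) so the filter can be tried without ArcGIS installed. Joining `dirpath` and `dirname` first means Describe always receives a full path rather than a bare name. Since it runs one Describe call per directory, it is the slower of the two approaches discussed in this thread.

```python
import os


def keep_feature_datasets(dirpath, dirnames, describe=None):
    """Return full paths of the dirnames entries that Describe reports as feature datasets.

    Hypothetical helper. `describe` defaults to arcpy.Describe and is a parameter
    only so the filter can be tried without ArcGIS installed.
    """
    if describe is None:
        import arcpy  # available only in an ArcGIS Pro Python environment
        describe = arcpy.Describe

    matches = []
    for dirname in dirnames:
        # Pass the full path, not the bare name, so Describe inspects the right item.
        path = os.path.join(dirpath, dirname)
        if describe(path).dataType == "FeatureDataset":
            matches.append(path)
    return matches


# Usage inside a Walk loop (requires ArcGIS Pro):
# for dirpath, dirnames, filenames in arcpy.da.Walk(workspace, datatype="FeatureDataset"):
#     feature_datasets.extend(keep_feature_datasets(dirpath, dirnames))
```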
SzymAdamowski
New Contributor III

That will work as well, and the code looks simpler. However, the line

if os.path.splitext(dirpath.lower())[-1] in (".gdb", ".sde"):

is slightly more efficient, as it checks the data at a higher level (once per parent folder rather than once per child folder).

SzymAdamowski
New Contributor III

So in your case it returned only datasets, even when you started Walk from a Windows folder workspace that contains other folders? In my case it includes Windows folders, GDBs and (last but not least) datasets.

by Anonymous User
Not applicable

No, I pointed Walk directly at the gdb, because that is where feature datasets are found, and it makes it easier to test and demonstrate the dirnames vs. filenames issue.

SzymAdamowski
New Contributor III

From the cross-posted GIS Stack Exchange question (https://gis.stackexchange.com/questions/461082/getting-feature-datasets-with-arcpy-da-walk), using explanations by user2856, who contributed greatly to finding the issues with my approach:

The suggestion to use directory names is technically correct.

arcpy.da.Walk considers a feature dataset to be a container (like a directory or a toolbox), and includes that in the 2nd output dirnames:

dirnames is a list of names of subdirectories and other workspaces in dirpath.

These other workspaces can include feature datasets and toolboxes inside a geodatabase. When you restrict by specifying datatype="FeatureDataset" you get a list of subdirectories and feature datasets.

Perhaps including subdirectories is a bug or perhaps it's intended.

Either way, you are not going to get a list of feature datasets in Walk's 3rd output filenames:

filenames is a list of names of nonworkspace contents in dirpath.

However, you will need to either test all directories to see if they're feature datasets (slow) or simply check if the parent is a file GDB or enterprise GDB connection:

import os
import arcpy

workspace=r'D:\Temp\test' #test.gdb with 2 datasets is inside this folder
feature_datasets = []

for root, dirs, files in arcpy.da.Walk(workspace, datatype="FeatureDataset"):
    if os.path.splitext(root.lower())[-1] in (".gdb", ".sde"):
        for dirname in dirs:
            feature_datasets.append(os.path.join(root, dirname))

print(feature_datasets)
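The same parent-extension check can be packaged as a generator that consumes any iterable of Walk-style tuples. A minimal sketch: `feature_datasets_from_walk` is a hypothetical name, and taking the tuples as an argument (rather than calling arcpy.da.Walk inside) is only so the logic can be exercised without ArcGIS. Checking only the immediate parent is enough because feature datasets cannot be nested inside other feature datasets; as in the code above, it still relies on Walk's datatype filter to keep other containers out of dirnames.

```python
import os

GDB_EXTENSIONS = (".gdb", ".sde")


def feature_datasets_from_walk(walk_results):
    """Yield full paths of feature datasets from (dirpath, dirnames, filenames) tuples.

    Only dirnames whose parent (dirpath) ends in .gdb or .sde are kept, so plain
    Windows folders and the geodatabases themselves are skipped.
    """
    for dirpath, dirnames, _filenames in walk_results:
        if os.path.splitext(dirpath.lower())[1] in GDB_EXTENSIONS:
            for dirname in dirnames:
                yield os.path.join(dirpath, dirname)


# Usage (requires ArcGIS Pro):
# import arcpy
# walk = arcpy.da.Walk(workspace, datatype="FeatureDataset")
# feature_datasets = list(feature_datasets_from_walk(walk))
```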

 

1. The code and explanation above from Stack Exchange user2856 are probably the best (fastest and most elegant) approach currently available with arcpy.da.Walk. However, it should be noted that, strictly speaking, a feature dataset is not a workspace (at least not in arcpy.Describe terms). In a perfect world, arcpy.da.Walk could be rewritten in such a way that:

filenames is a list of names of nonworkspace contents in dirpath, plus any workspaces in dirpath whose type is specified in the datatype parameter.

That would enable easy data harvesting without additional conditions.
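The proposed behaviour can be approximated today with a thin wrapper around Walk's output. A sketch under stated assumptions: `walk_with_datasets_in_filenames` is a hypothetical helper, not part of arcpy, and it treats every container directly under a .gdb/.sde path as a feature dataset, which relies on Walk's datatype filter (itself questioned in this thread) to keep other containers out of dirnames. dirnames is passed through unchanged; the matching names are only added to filenames.

```python
import os


def _is_geodatabase_path(path):
    """True if path ends in .gdb or .sde (case-insensitive)."""
    return os.path.splitext(path.lower())[1] in (".gdb", ".sde")


def walk_with_datasets_in_filenames(walk_results, is_dataset_parent=_is_geodatabase_path):
    """Re-yield (dirpath, dirnames, filenames) tuples with dataset names also in filenames.

    Hypothetical wrapper approximating the proposed Walk behaviour: every dirname
    under a geodatabase path is treated as a feature dataset.
    """
    for dirpath, dirnames, filenames in walk_results:
        extra = list(dirnames) if is_dataset_parent(dirpath) else []
        yield dirpath, dirnames, list(filenames) + extra


# Usage (requires ArcGIS Pro); the question's original loop over filenames then works:
# walk = arcpy.da.Walk(workspace, datatype=["FeatureDataset"])
# for dirpath, dirnames, filenames in walk_with_datasets_in_filenames(walk):
#     for filename in filenames:
#         feature_datasets.append(os.path.join(dirpath, filename))
```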

2. Trying to understand the logic behind treating feature datasets as workspaces (OK, I get it: a feature dataset is a "container" holding nested data, so in that sense it is a workspace), I added a toolbox to the geodatabase. I expected similar behaviour (after all, a toolbox is a container holding tools), so I tested the following code:

import os
import arcpy
workspace=r'E:\GIS_DATA\ArcGIS\TestWalk' #Contains TestWalk.gdb that contains 2 datasets and 1 toolbox


feature_datasets = []

walk = arcpy.da.Walk(workspace,datatype=["Toolbox"])

for dirpath, dirnames, filenames in walk:
    if os.path.splitext(dirpath.lower())[-1] in (".gdb", ".sde"):
        print (dirpath,dirnames,filenames)
        for dirname in dirnames:
            feature_datasets.append(os.path.join(dirpath, dirname))
print("Found:{0}".format(feature_datasets))

To my surprise, this code finds the feature datasets instead of the toolbox:

E:\GIS_DATA\ArcGIS\TestWalk\TestWalk.gdb ['DS1', 'DS2'] []
Found:['E:\\GIS_DATA\\ArcGIS\\TestWalk\\TestWalk.gdb\\DS1', 'E:\\GIS_DATA\\ArcGIS\\TestWalk\\TestWalk.gdb\\DS2']

That looks like an obvious bug.

Conclusion: It would be great if Esri developers could take a look at arcpy.da.Walk and:

a) fix bugs (e.g. datatype="Toolbox" working incorrectly with toolboxes in a geodatabase, and others reported in other topics)

b) possibly redesign it to make it more intuitive to use (i.e. feature datasets could also be included in filenames, not only in dirnames, when they're specified in the datatype parameter), or at least improve the documentation so that it is clear which data types are returned in dirnames and which in filenames.