Using da.Walk to discover feature datasets throughout a series of subfolders

09-29-2016 06:31 AM
EricEagle
Occasional Contributor III

Hi, so here's what I'm trying to do.  I have a parent folder, which I'm calling my workspace.  Under that folder are several subfolders, each containing one file geodatabase.  Each file geodatabase contains unique feature datasets that I'm trying to merge into one final file geodatabase.

I'm stuck at building the list of the feature datasets.  Here's the code in question:

folder = "C:\\Temp\\Extracted"

walk = arcpy.da.Walk(folder, datatype="FeatureDataset")
for dirpath, workspaces, datatypes in walk:
    for datatype in datatypes:
        print(datatype)

This code yields *everything* in the subdirectory structure down to the feature dataset level.  It lists all folder names, all file geodatabases at the .gdb level, and all feature datasets.

What I'm trying to do is get a list back of *only* feature datasets.  Can someone tell me where I'm going wrong?
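(For context, a minimal sketch of what is happening: arcpy.da.Walk yields `(dirpath, dirnames, filenames)` tuples the same way os.walk does, so at the folder level the middle element contains plain subfolders, not just geodatabase content. The snippet below simulates the folder layout described above using ordinary directories with a `.gdb` suffix, since a file geodatabase is just a folder as far as a directory walk is concerned.)

```python
import os
import tempfile

# Build a throwaway tree shaped like the one described:
# root/dir1/geodatabase01.gdb and root/dir2/geodatabase02.gdb
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "dir1", "geodatabase01.gdb"))
os.makedirs(os.path.join(root, "dir2", "geodatabase02.gdb"))

seen = []
for dirpath, dirnames, filenames in os.walk(root):
    # dirnames holds EVERY child directory of dirpath, so the plain
    # subfolders show up right alongside the .gdb folders
    seen.extend(dirnames)

print(sorted(seen))
# → ['dir1', 'dir2', 'geodatabase01.gdb', 'geodatabase02.gdb']
```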

13 Replies
JoshuaBixby
MVP Esteemed Contributor

You have a Python3 print statement.  Are you using ArcPy that comes with ArcGIS Pro?

When I run your code in ArcCatalog, as is but with a different starting directory, it prints nothing. 

EricEagle
Occasional Contributor III

Joshua,

Yeah, so I'm actually using this in Spyder outside of ArcMap, just importing arcpy as another library, so the Python 3 print statements work fine.

I've updated the code a bit further to this:

import os
import arcpy

folder = "C:\\Temp\\Extracted"
gdbList = []

for paths, subdirs, names in os.walk(folder):
    for subdir in subdirs:
        if subdir.endswith(".gdb"):
            fullName = os.path.join(paths, subdir)
            gdbList.append(fullName)
            print("Appended {} to gdbList".format(fullName))

fdList = []
for fgdb in gdbList:
    walk = arcpy.da.Walk(fgdb, datatype="FeatureDataset")
    for fd in walk:
        fdList.append(fd)

print(fdList)

The problem now is that fdList comes back as a list of tuples, each containing the full path down to the geodatabase. They are formatted as follows:

[(u'C:\\Temp\\Extracted\\dir1\\geodatabase01.gdb', [u'Water_Network'], []), (u'C:\\Temp\\Extracted\\dir1\\geodatabase01.gdb\\Water_Network', [], []), (u'C:\\Temp\\Extracted\\dir2\\geodatabase02.gdb', [u'Electric_Network'], []), (u'C:\\Temp\\Extracted\\dir2\\geodatabase02.gdb\\Electric_Network', [], []), (and so on.....)

Again, what I'm trying to do is isolate the feature datasets and use arcpy's Copy_management to copy them all into a consolidated file geodatabase.  I feel like I'm close but missing some glue here, because I can't iterate Copy_management over the fdList object yet.

(edited for a more thorough example of the output)
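(A minimal sketch of unpacking those tuples: each one is `(dirpath, dirnames, filenames)`, and at the .gdb level the dirnames list holds the feature dataset names, so joining the two isolates the full feature dataset paths. The sample data below is copied from the output shown above.)

```python
import os

# Sample tuples in the shape arcpy.da.Walk returns
walk_output = [
    (u'C:\\Temp\\Extracted\\dir1\\geodatabase01.gdb', [u'Water_Network'], []),
    (u'C:\\Temp\\Extracted\\dir1\\geodatabase01.gdb\\Water_Network', [], []),
    (u'C:\\Temp\\Extracted\\dir2\\geodatabase02.gdb', [u'Electric_Network'], []),
    (u'C:\\Temp\\Extracted\\dir2\\geodatabase02.gdb\\Electric_Network', [], []),
]

# Join each workspace path to the dataset names listed under it;
# tuples with an empty dirnames list contribute nothing
fd_paths = [os.path.join(dirpath, name)
            for dirpath, dirnames, filenames in walk_output
            for name in dirnames]

print(fd_paths)  # two full feature-dataset paths, one per .gdb
```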

JoshuaBixby
MVP Esteemed Contributor

Assuming you only want to copy feature datasets, and not feature classes in the GDBs, what about the following (haven't fully tested):

import os
import arcpy

folder = # path to starting directory
dst_gdb = # path to destination gdb
walk = arcpy.da.Walk(folder)
for root, dirs, files in walk:
    if root.endswith('.gdb'):
        for dir in dirs:
            arcpy.Copy_management(os.path.join(root, dir),
                                  os.path.join(dst_gdb, dir))
EricEagle
Occasional Contributor III

The problem with this is that each file geodatabase is in its own subdirectory.  So when I use da.Walk and list dirs, it lists OS subdirectories and geodatabases in the results alongside the feature datasets.  Please see my updated code and the updated output I've listed.  At this point I'm just trying to figure out how to do copy management given the tuples I'm being returned.

JoshuaBixby
MVP Esteemed Contributor

Since geodatabases can't be nested within each other, and we aren't talking about geodatabase replication, I don't fully understand your reference to "parent geodatabases."  Regardless, if you have something that works for you, and only want to work with the tuples being returned, use os.path.join:

for fd in fdList:
    for name in fd[1]:
        arcpy.Copy_management(os.path.join(fd[0], name),
                              os.path.join(destination_gdb, name))
EricEagle
Occasional Contributor III

Sorry Joshua, I mis-typed there - I edited my response to make it more accurate.  My issue is that even though I have da.Walk limited to a datatype="FeatureDataset", it doesn't list only feature datasets.  It lists folders, geodatabases, AND feature datasets.

JoshuaBixby
MVP Esteemed Contributor

I try to avoid using the datatype parameter of arcpy.da.Walk if possible because the definitions of those data types are poorly documented.  As you describe, "FeatureDataset" returns not only feature datasets but also subfolders within a folder.  With ArcGIS 10.4.x, they finally removed the "Geo" option; I had kept asking for a clear definition of it because the results it returned were strange, to say the least.

My fuller code snippet should work for you because I am testing whether Walk is already in a file geodatabase before looking at the folders returned, which will be datasets of one kind or another.  If you have multiple kinds of datasets (FeatureDataset, RasterDataset, etc...) and only want a particular kind, the script can be modified to use arcpy.Describe to narrow down the results even further.
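(A minimal sketch of the Describe-based narrowing suggested above: keep only paths whose describe object reports `dataType == "FeatureDataset"`. A hypothetical stub stands in for arcpy.Describe here so the logic is self-contained; in real use you would pass `arcpy.Describe` itself.)

```python
from collections import namedtuple

# Stub describe result with just the attribute we check
Desc = namedtuple("Desc", ["dataType"])

def fake_describe(path):
    # Hypothetical stand-in for arcpy.Describe: pretend anything named
    # *_Network is a feature dataset, everything else a raster dataset
    if path.endswith("_Network"):
        return Desc("FeatureDataset")
    return Desc("RasterDataset")

def only_feature_datasets(paths, describe=fake_describe):
    # With arcpy available you would call this as
    # only_feature_datasets(paths, describe=arcpy.Describe)
    return [p for p in paths if describe(p).dataType == "FeatureDataset"]

paths = ["C:\\Temp\\x.gdb\\Water_Network", "C:\\Temp\\x.gdb\\Elevation"]
print(only_feature_datasets(paths))  # keeps only the feature dataset path
```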

EricEagle
Occasional Contributor III

Joshua, sorry for the lateness of this reply but I figured I'd chime in and close the loop for anyone who may stumble on this question later.

The way I fixed this was to basically check for file geodatabases first, feature datasets second, and feature classes last, extracting each to their own list.  It is certainly inelegant but it gives me reliable results.  So it looks a bit like...

gdbList = []
fdList = []

for paths, subdirs, names in os.walk(workingPath):
    for subdir in subdirs:
        if subdir.endswith('.gdb'):
            gdbName = os.path.join(paths, subdir)
            gdbList.append(gdbName)

for fgdb in gdbList:
    walk = arcpy.da.Walk(fgdb, datatype="FeatureDataset")
    for fd in walk:
        if not fd[0].endswith('.gdb'):
            fdList.append(fd[0])

Basically I'm building lists through brute-force string checking.
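(The first half of that approach, the .gdb suffix check over an os.walk, can be exercised without arcpy at all. The sketch below simulates the directory layout with plain folders, since os.walk sees a file geodatabase as just another directory.)

```python
import os
import tempfile

# Simulated working path: two subfolders, each holding one ".gdb" folder
working_path = tempfile.mkdtemp()
os.makedirs(os.path.join(working_path, "dir1", "geodatabase01.gdb"))
os.makedirs(os.path.join(working_path, "dir2", "geodatabase02.gdb"))

gdb_list = []
for paths, subdirs, names in os.walk(working_path):
    for subdir in subdirs:
        # the brute-force string check: keep only .gdb folders
        if subdir.endswith(".gdb"):
            gdb_list.append(os.path.join(paths, subdir))

print(sorted(os.path.basename(p) for p in gdb_list))
# → ['geodatabase01.gdb', 'geodatabase02.gdb']
```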

DanPatterson_Retired
MVP Emeritus

FYI to all... the print() function syntax works fine in 2.7 as well as in the 3.x series, and has for years.  Get used to it; you will need to use 3.x for work in Pro, and in ArcMap in upcoming versions.
