What You Won't See on a Walk: In-memory Data Sources.

11-14-2015 05:14 PM
JoshuaBixby
MVP Esteemed Contributor

I was on the fence about whether to write a blog post or simply open a discussion in the Python place.  It has been a while since I contributed to my blog, so I am going with the former.

 

The ArcPy Data Access Walk function (arcpy.da.Walk) is a real workhorse function, and I rely on it quite a bit.  The ArcPy Walk function is an example where Esri got it right with their Python implementation.  I can't say what their design goal was, but it seems rather apparent the aim was to create a geospatially-aware version of Python's os.walk function.  The names are the same, many of the parameters are the same, the results are similar, and so on.  I think emulating this long-standing, native Python functionality in ArcPy was a win because it didn't reinvent a perfectly good wheel, i.e., users familiar with os.walk can easily transition to using arcpy.da.Walk.
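
For readers who have not used os.walk, here is a minimal sketch of the shape of its results, which is the shape arcpy.da.Walk mirrors.  The temporary folder and the names in it (plots.shp, notes.txt, sub) are made up for illustration, and nothing here touches ArcPy.

```python
import os
import tempfile

def walk_summary(top):
    """Return one (relative dirpath, dirnames, filenames) triple per directory."""
    rows = []
    for dirpath, dirnames, filenames in os.walk(top):
        rel = os.path.relpath(dirpath, top)
        rows.append((rel, sorted(dirnames), sorted(filenames)))
    return sorted(rows)

# Hypothetical folder layout, built in a temp directory for the demo.
demo_root = tempfile.mkdtemp()
os.mkdir(os.path.join(demo_root, "sub"))
for rel_path in ("plots.shp", os.path.join("sub", "notes.txt")):
    open(os.path.join(demo_root, rel_path), "w").close()

summary = walk_summary(demo_root)
for row in summary:
    print(row)
```

arcpy.da.Walk yields the same three-part tuples (dirpath, dirnames, filenames), with the last element holding dataset names rather than plain file names.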

 

As much as I like the ArcPy Walk function, there are a couple of minor issues I have with it.  The first is a documentation issue, or should I say lack of documentation.  One of the Walk function's parameters is datatype.  The documentation has a table that lists all of the acceptable arguments for datatype, but there is no actual documentation of the data types themselves.  Looking over the data type names, it seems most of them are obvious, so maybe Esri decided they didn't need to document them.

 

As descriptive as a name might seem at first glance, lack of documentation usually leads to ambiguity and confusion.  For example, what is covered under "FeatureClass"?  Obviously one would assume a feature class in a geodatabase is covered, but what about shape files?  Are shape files feature classes?  In Esri-land, the answer is usually "Yes" but not always.  The real riddle is the "Geo" datatype since it includes shape files but not feature classes in a geodatabase.  Feature classes aren't "geo"?  As important as documentation is for libraries/APIs, it isn't the main reason for writing today.

 

One of my favorite ArcPy Walk patterns is to call Python's built-in next() function once to return a list of shape files in a folder or feature classes in a geodatabase or feature classes in a feature dataset.

>>> import arcpy
>>> workspace = r"<path to folder or geodatabase or feature dataset>"  # placeholder, substitute your own
>>>
>>> _, _, filenames = next(arcpy.da.Walk(workspace, datatype="FeatureClass"))
>>> filenames
[u'canadwshed_p.shp', u'plots.shp']
>>>
>>> # or looping over feature classes or shape files
>>> for name in next(arcpy.da.Walk(workspace, datatype="FeatureClass"))[2]:
...     print(name)
...
canadwshed_p.shp
plots.shp
>>>

 

In the first example, I create a throw-away Workspace Walker object that I don't have to bother keeping around or deconstructing.  In the second example, calling next() in the for loop allows me to cut out an extra loop when I am only interested in one level of geospatial data.
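
One caveat with a bare next() call is that it raises StopIteration if the generator yields nothing.  With os.walk that happens for a path that does not exist; whether arcpy.da.Walk behaves the same way for a bad workspace path is an assumption here, not something the session above shows.  The sketch below uses os.walk as a stand-in to show the one-level pattern with a default value, and the helper name top_level_files is hypothetical.

```python
import os
import tempfile

def top_level_files(top):
    """Return the file names directly under `top`, or [] if nothing is yielded."""
    # The second argument to next() is returned instead of raising StopIteration.
    _, _, filenames = next(os.walk(top), (top, [], []))
    return sorted(filenames)

# Hypothetical layout: one file at the top level, one in a nested folder.
demo_root = tempfile.mkdtemp()
open(os.path.join(demo_root, "plots.shp"), "w").close()
os.mkdir(os.path.join(demo_root, "nested"))
open(os.path.join(demo_root, "nested", "ignored.shp"), "w").close()

print(top_level_files(demo_root))                            # top level only
print(top_level_files(os.path.join(demo_root, "missing")))   # no such folder
```

The same two-argument next() call should drop into the arcpy.da.Walk pattern unchanged, since it is a feature of Python's built-in next() rather than of either walk function.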

 

As is the case with most things in life, there isn't just one way to list feature classes in a geodatabase or shape files in a folder.  In fact, the ArcPy Walk function is the new kid on the block, having been introduced in ArcGIS 10.1 SP1.  Prior to ArcPy Walk, a user could use one of the many ArcPy listing functions (ListDatasets, ListFeatureClasses, ListFiles, ListRasters, ListTables, and ListWorkspaces) or the ArcPy Describe function.  I tend to prefer ArcPy Walk over the others because of its ease of use and similarity to built-in Python functionality.

 

The main reason for this blog post is to share one area where the ArcPy Walk function stumbles, i.e., in-memory data sources.  Whereas the older methods of listing data sources work with in-memory workspaces, that is not the case with ArcPy Walk.

>>> arcpy.CreateFeatureclass_management('in_memory', 'test_fc')
<Result 'in_memory\\test_fc'>
>>> arcpy.CreateTable_management('in_memory', 'test_tbl')
<Result 'in_memory\\test_tbl'>
>>> arcpy.CreateRasterDataset_management('in_memory','test_rd')
<Result 'in_memory\\test_rd'>
>>> 
>>> #using ArcPy Walk
>>> next(arcpy.da.Walk('in_memory'))[2]
[]
>>>
>>> #using ArcPy listing functions
>>> arcpy.env.workspace = 'in_memory'
>>> arcpy.ListFeatureClasses()
[u'test_fc']
>>> arcpy.ListTables()
[u'test_tbl']
>>> arcpy.ListRasters()
[u'test_rd']
>>> arcpy.ListDatasets()
[u'test_rd']
>>> 
>>> #using ArcPy Describe function
>>> [child.name for child in arcpy.Describe('in_memory').children]
[u'test_tbl', u'test_fc', u'test_rd']
>>>

 

Not supporting in-memory workspaces isn't much more than a stumble, but it is a stumble nonetheless.  After all, the documentation does say the first parameter is the "top-level workspace" and yet no mention is made that in-memory workspaces aren't supported.  Fortunately for users, there are at least two other ways to list in-memory data sources.
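
One way to paper over the stumble is to try Walk first and fall back to Describe when Walk reports nothing.  The following is only a sketch under that assumption: the helper name list_workspace_items and its arcpy_module parameter are hypothetical (the parameter exists so the function can be exercised without ArcGIS installed), and only calls shown in the session above are used.

```python
def list_workspace_items(workspace, arcpy_module=None):
    """List item names at the top level of `workspace`.

    Tries arcpy.da.Walk first; if it reports nothing (as it does for
    'in_memory' in the session above), falls back to Describe().children.
    """
    if arcpy_module is None:
        import arcpy as arcpy_module  # needs an ArcGIS Python environment

    # The default triple guards against a generator that yields nothing.
    _, _, names = next(arcpy_module.da.Walk(workspace), (workspace, [], []))
    if names:
        return list(names)
    return [child.name for child in arcpy_module.Describe(workspace).children]
```

Calling list_workspace_items('in_memory') in an ArcGIS session would be the typical use.  Note that the fallback is not a perfect substitute: Describe's children may include items that Walk would have reported in its second element (containers such as feature datasets) rather than its third.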

 

UPDATE 06/2017:

 

Since writing this blog post nearly 18 months ago (I know "time flies" but still, 18 months already?), I have come to discover the issue with ArcPy Walk and in-memory workspaces is more nuanced than I originally thought.  Let me demonstrate:

>>> arcpy.CreateFeatureclass_management('in_memory', 'test_fc')
<Result 'in_memory\\test_fc'>
>>> arcpy.CreateTable_management('in_memory', 'test_tbl')
<Result 'in_memory\\test_tbl'>
>>> arcpy.CreateRasterDataset_management('in_memory','test_rd')
<Result 'in_memory\\test_rd'>
>>> 
>>> # using ArcPy Walk with "GPInMemoryWorkspace" rather than common "in_memory"
>>> next(arcpy.da.Walk('GPInMemoryWorkspace'))[2]
[u'test_tbl', u'test_fc', u'test_rd']
>>>

 

So, it turns out that ArcPy Walk works just fine with in-memory workspaces, when it actually knows you are pointing it to an in-memory workspace.  The really frustrating part of this, and even a bit lamentable, is that this is simply about semantics and Esri still can't manage to fix it.  Functionally, ArcPy Walk already works with in-memory workspaces; the function just doesn't know that everyone else and every other tool refers to those spaces as "in_memory" instead of "GPInMemoryWorkspace".
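
Until the naming is reconciled, a small translation step can hide the difference.  This is a sketch, not an Esri-provided fix: the helper name walk_target is hypothetical, and it assumes that only the bare "in_memory" name needs translating and that matching it case-insensitively is safe.

```python
# Name that arcpy.da.Walk recognised in the session above.
WALKABLE_IN_MEMORY = "GPInMemoryWorkspace"

def walk_target(workspace):
    """Return the workspace string to hand to arcpy.da.Walk.

    Assumption: only the bare 'in_memory' name needs translating, and it is
    matched case-insensitively. Every other path is returned unchanged.
    """
    if workspace.strip().lower() == "in_memory":
        return WALKABLE_IN_MEMORY
    return workspace

print(walk_target("in_memory"))
print(walk_target(r"C:\data\project.gdb"))  # hypothetical path, passed through
```

In an ArcGIS session the call would then read next(arcpy.da.Walk(walk_target('in_memory')))[2], leaving the rest of a script free to keep using "in_memory" everywhere else.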

About the Author
I am currently a Geospatial Systems Engineer within the Geospatial Branch of the Forest Service's Chief Information Office (CIO). The Geospatial Branch of the CIO is responsible for managing the geospatial platform (ArcGIS Desktop, ArcGIS Enterprise, ArcGIS Online) for thousands of users across the Forest Service. My position is hosted on the Superior National Forest. The Superior NF comprises 3 million acres in northeastern MN and includes the million-acre Boundary Waters Canoe Area Wilderness (BWCAW).