GIS Life Blog - Page 2

JoshuaBixby
MVP Esteemed Contributor

I was on the fence whether to write a blog post or simply open a discussion in the Python place.  It has been a while since I contributed to my blog, so I am going with the former.

 

The ArcPy Data Access Walk function (arcpy.da.Walk) is a real workhorse function, and I rely on it quite a bit.  The ArcPy Walk function is an example where Esri got it right with their Python implementation.  I can't say what their design goal was, but it seems rather apparent the aim was to create a geospatially-aware version of Python's os.walk function.  The names are the same, many of the parameters are the same, the results are similar, etc.  I think emulating this long-standing, native Python functionality in ArcPy was a win because it didn't reinvent a perfectly good wheel, i.e., users familiar with os.walk can easily transition to using arcpy.da.Walk.

 

As much as I like the ArcPy Walk function, there are a couple of minor issues I have with it.  The first is a documentation issue, or should I say lack of documentation.  One of the Walk function's parameters is datatype.  The documentation has a table that lists all of the acceptable arguments for datatype, but there is no actual documentation of the data types themselves.  Looking over the data type names, it seems most of them are obvious, so maybe Esri decided they didn't need to document them.

 

As descriptive as a name might seem at first glance, lack of documentation usually leads to ambiguity and confusion.  For example, what is covered under "FeatureClass"?  Obviously one would assume a feature class in a geodatabase is covered but what about shape files?  Are shape files feature classes?  In Esri-land, the answer is usually "Yes" but not always.  The real riddle is the "Geo" datatype since it includes shape files but not feature classes in a geodatabase.  Feature classes aren't "geo?"  As important as documentation is for libraries/APIs, it isn't the main reason for writing today.

 

One of my favorite ArcPy Walk patterns is to call Python's built-in next() function once to return a list of shape files in a folder or feature classes in a geodatabase or feature classes in a feature dataset.

>>> workspace = r"path\to\workspace"  # placeholder: path to folder or geodatabase or feature dataset
>>>
>>> _, _, filenames = next(arcpy.da.Walk(workspace, datatype="FeatureClass"))
>>> filenames
[u'canadwshed_p.shp', u'plots.shp']
>>>
>>> #or looping over feature classes or shape files
>>> for file in next(arcpy.da.Walk(workspace, datatype="FeatureClass"))[2]:
...     print file
canadwshed_p.shp
plots.shp
>>>

 

In the first example, I create a throw-away Workspace Walker object that I don't have to bother keeping around or deconstructing.  In the second example, calling next() in the for loop allows me to cut out an extra loop when I am only interested in one level of geospatial data.
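
For readers who don't have ArcPy handy, the same one-level pattern can be tried with Python's own os.walk, which is the function arcpy.da.Walk appears to be modeled on.  What follows is only a sketch: the folder is a throw-away temporary directory and the file and folder names are made up for illustration.

```python
import os
import tempfile

# Build a throw-away folder with two files and one subfolder (hypothetical names).
top = tempfile.mkdtemp()
for name in ("plots.txt", "sheds.txt"):
    open(os.path.join(top, name), "w").close()
os.mkdir(os.path.join(top, "archive"))
open(os.path.join(top, "archive", "old.txt"), "w").close()

# next() pulls only the first (top-level) tuple from the generator,
# so the walk never descends into "archive".
_, dirnames, filenames = next(os.walk(top))
print(sorted(dirnames))
print(sorted(filenames))
```

The file inside the subfolder never shows up in filenames because the generator is only asked for its first item.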

 

As is the case with most things in life, there isn't just one way to list feature classes in a geodatabase or shape files in a folder.  In fact, the ArcPy Walk function is the new kid on the block, having been introduced in ArcGIS 10.1 SP1.  Prior to ArcPy Walk, a user could use one of the many ArcPy listing functions (ListDatasets, ListFeatureClasses, ListFiles, ListRasters, ListTables, and ListWorkspaces) or the ArcPy Describe function.  I tend to prefer ArcPy Walk over the others because of its ease of use and similarity to built-in Python functionality.

 

The main reason for this blog post is to share one area where the ArcPy Walk function stumbles, i.e., in-memory data sources.  Whereas the older methods of listing data sources work with in-memory workspaces, that is not the case with ArcPy Walk.

>>> arcpy.CreateFeatureclass_management('in_memory', 'test_fc')
<Result 'in_memory\\test_fc'>
>>> arcpy.CreateTable_management('in_memory', 'test_tbl')
<Result 'in_memory\\test_tbl'>
>>> arcpy.CreateRasterDataset_management('in_memory','test_rd')
<Result 'in_memory\\test_rd'>
>>> 
>>> #using ArcPy Walk
>>> next(arcpy.da.Walk('in_memory'))[2]
[]
>>>
>>> #using ArcPy listing functions
>>> arcpy.env.workspace = 'in_memory'
>>> arcpy.ListFeatureClasses()
[u'test_fc']
>>> arcpy.ListTables()
[u'test_tbl']
>>> arcpy.ListRasters()
[u'test_rd']
>>> arcpy.ListDatasets()
[u'test_rd']
>>> 
>>> #using ArcPy Describe function
>>> [child.name for child in arcpy.Describe('in_memory').children]
[u'test_tbl', u'test_fc', u'test_rd']
>>>

 

Not supporting in-memory workspaces isn't much more than a stumble, but it is a stumble nonetheless.  After all, the documentation does say the first parameter is the "top-level workspace" and yet no mention is made that in-memory workspaces aren't supported.  Fortunately for users, there are at least two other ways to list in-memory data sources.

 

UPDATE 06/2017:

 

Since writing this blog post nearly 18 months ago (I know "time flies" but still, 18 months already?), I have come to discover the issue with ArcPy Walk and in-memory workspaces is more nuanced than I originally thought.  Let me demonstrate:

>>> arcpy.CreateFeatureclass_management('in_memory', 'test_fc')
<Result 'in_memory\\test_fc'>
>>> arcpy.CreateTable_management('in_memory', 'test_tbl')
<Result 'in_memory\\test_tbl'>
>>> arcpy.CreateRasterDataset_management('in_memory','test_rd')
<Result 'in_memory\\test_rd'>
>>> 
>>> # using ArcPy Walk with "GPInMemoryWorkspace" rather than common "in_memory"
>>> next(arcpy.da.Walk('GPInMemoryWorkspace'))[2]
[u'test_tbl', u'test_fc', u'test_rd']
>>>

 

So, it turns out that ArcPy Walk works just fine with in-memory workspaces when it actually knows you are pointing it to an in-memory workspace.  The really frustrating part of this, and even a bit lamentable, is that this is simply about semantics and Esri still can't manage to fix it.  Functionally, ArcPy Walk already works with in-memory workspaces; the function just doesn't know that everyone else and every other tool refers to those spaces as "in_memory" instead of "GPInMemoryWorkspace".
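
Until the naming is reconciled, one possible workaround is a thin wrapper that swaps the name before walking.  This is a minimal sketch resting on the behavior shown in the session above; walk_workspace_name and list_top_level are hypothetical helper names, and the walker is passed in as an argument so the logic can be tried without ArcGIS installed.

```python
def walk_workspace_name(workspace):
    """Return the name to hand to a walker for the given workspace.

    Assumption taken from the session above: the walker only finds
    in-memory data under "GPInMemoryWorkspace", not "in_memory".
    """
    if workspace.strip().lower() == "in_memory":
        return "GPInMemoryWorkspace"
    return workspace


def list_top_level(workspace, walk):
    """List the items one level down, using any os.walk-style callable."""
    _, _, names = next(walk(walk_workspace_name(workspace)))
    return list(names)
```

In an ArcGIS Python session, the call would presumably look like list_top_level('in_memory', arcpy.da.Walk); any other workspace path is passed through unchanged.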

JoshuaBixby
MVP Esteemed Contributor

I think this blog post follows up nicely to Dan Patterson's recent blog post, ...That empty feeling...  The conversation about null and empty geometries in ArcGIS isn't new, but it does take some getting your head around.

When it comes to creating new feature classes, I am sure most people are quite familiar with the following screen from the New Feature Class wizard:

arccatalog_103_new_feature_class_shape_null.PNG

When creating a new feature class using ArcGIS Desktop, either through the GUI or ArcPy, the default settings allow for NULL values in the SHAPE field (see yellow highlight above).  Regardless of whether NULLs in the SHAPE field are a good or bad idea, they are supported and allowed by default, so it is good to understand what does/can happen when geometries aren't populated with the rest of a record in feature classes.

After troubleshooting several cases where NULL wasn't NULL, I decided to take a deeper look at what really happens when geometries aren't populated in feature classes.  It turns out, the answer depends both on the tool and type of geodatabase being used to insert, store, and retrieve records.  Presented here are the results for some of the more common tools and types of geodatabases.

Table 1 shows what actually gets populated in the SHAPE field when nothing, None, an empty geometry, and a non-empty geometry are inserted into a polygon feature class using three different methods.

TABLE 1:  SHAPE Field Values in Feature Class

Storage Type  Insert Method          NOTHING  NONE   EMPTY  POLYGON
------------  ---------------------  -------  -----  -----  -------
PGDB          ArcMap Editor          Null     N/A    N/A    Polygon
PGDB          arcpy.InsertCursor     Null     Error  Empty  Polygon
PGDB          arcpy.da.InsertCursor  Null     Null   Null   Polygon
FGDB          ArcMap Editor          Empty    N/A    N/A    Polygon
FGDB          arcpy.InsertCursor     Null     Error  Empty  Polygon
FGDB          arcpy.da.InsertCursor  Null     Null   Null   Polygon
SDE(SQL)      ArcMap Editor          Empty    N/A    N/A    Polygon
SDE(SQL)      arcpy.InsertCursor     Null     Error  Empty  Polygon
SDE(SQL)      arcpy.da.InsertCursor  Null     Null   Null   Polygon
SDE(ORA)      ArcMap Editor          Empty    N/A    N/A    Polygon
SDE(ORA)      arcpy.InsertCursor     Null     Error  Empty  Polygon
SDE(ORA)      arcpy.da.InsertCursor  Null     Null   Null   Polygon

NOTE:

  1. Storage Type:
    1. PGDB := personal geodatabase
    2. FGDB := file geodatabase
    3. SDE(SQL) := SQL Server enterprise geodatabase
    4. SDE(ORA) := Oracle enterprise geodatabase
  2. Insert Method:
    1. ArcMap Editor := edit session in ArcMap
    2. arcpy.InsertCursor := original/older ArcPy insert cursor
    3. arcpy.da.InsertCursor := ArcPy Data Access insert cursor
  3. Insert Geometry:
    1. NOTHING := no geometry is specified.
      1. In ArcMap edit session, table is populated with no geometry
      2. In ArcPy insert cursors, SHAPE field is not specified with cursor
    2. NONE := Python None object is passed to SHAPE field
    3. EMPTY := empty polygon is passed to SHAPE field
    4. POLYGON := non-empty polygon is created or passed to SHAPE field
  4. SHAPE Value:
    1. N/A := not applicable, i.e., not possible to insert or attempt to insert type of geometry or object
    2. Error := error is generated attempting to insert type of geometry or object
    3. Null := NULL value/marker in SHAPE field
    4. Empty := empty polygon or empty collection in SHAPE field
    5. Polygon := non-empty polygon in SHAPE field

Table 2 shows what gets returned by search cursors once the records from Table 1 have been inserted into a feature class.

TABLE 2:  Retrieved Geometry/Object From SHAPE Field

Storage Type  Retrieve Method        NULL  EMPTY  POLYGON
------------  ---------------------  ----  -----  -------
PGDB          arcpy.SearchCursor     None  Empty  Polygon
PGDB          arcpy.da.SearchCursor  None  None   Polygon
FGDB          arcpy.SearchCursor     None  Empty  Polygon
FGDB          arcpy.da.SearchCursor  None  None   Polygon
SDE(SQL)      arcpy.SearchCursor     None  Empty  Polygon
SDE(SQL)      arcpy.da.SearchCursor  None  None   Polygon
SDE(ORA)      arcpy.SearchCursor     None  Empty  Polygon
SDE(ORA)      arcpy.da.SearchCursor  None  None   Polygon

NOTE:

  1. Storage Type:
    1. PGDB := personal geodatabase
    2. FGDB := file geodatabase
    3. SDE(SQL) := SQL Server enterprise geodatabase
    4. SDE(ORA) := Oracle enterprise geodatabase
  2. Retrieve Method:
    1. arcpy.SearchCursor := original/older ArcPy search cursor
    2. arcpy.da.SearchCursor := ArcPy Data Access search cursor
  3. SHAPE Value:
    1. NULL := NULL value/marker in SHAPE field
    2. EMPTY := empty polygon or empty collection in SHAPE field
    3. POLYGON := non-empty polygon in SHAPE field
  4. Retrieve Geometry:
    1. None := Python None object
    2. Empty := empty polygon or empty collection
    3. Polygon := non-empty polygon

It is clear there are some differences, inconsistencies one might say, with how different tools insert nothing or empty geometries into a feature class that allows NULL values.  Even with the same tool, there are situations where nothing or empty geometries are handled differently between different types of geodatabases.  There are also differences with how different search cursors, and likely update cursors, retrieve empty geometries from feature classes.

I can't say what it all means, but there are a few items that stuck out for me.

  • Non-empty geometries are handled consistently across tools and types of geodatabases.
  • "Allow NULL Values" for the SHAPE field doesn't mean missing geometries will be NULL; it means missing geometries may be NULL, but they could also be empty.
  • The ArcPy Data Access search cursor returns None whether a SHAPE field is NULL or contains an empty geometry, so None doesn't necessarily mean NULL for geometries.
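
To make the last point concrete, the results in Table 2 can be written down as a small lookup and queried in reverse.  This is only a sketch built from the values reported above (which were the same for all four storage types); RETRIEVED and possible_stored_values are hypothetical names, and nothing here calls ArcPy.

```python
# What each search cursor returned for each stored SHAPE value (Table 2).
RETRIEVED = {
    "arcpy.SearchCursor": {"NULL": "None", "EMPTY": "Empty", "POLYGON": "Polygon"},
    "arcpy.da.SearchCursor": {"NULL": "None", "EMPTY": "None", "POLYGON": "Polygon"},
}


def possible_stored_values(method, retrieved):
    """Return the stored SHAPE values that could explain a retrieved value."""
    return sorted(
        stored for stored, got in RETRIEVED[method].items() if got == retrieved
    )


print(possible_stored_values("arcpy.SearchCursor", "None"))     # ['NULL']
print(possible_stored_values("arcpy.da.SearchCursor", "None"))  # ['EMPTY', 'NULL']
```

With the older cursor, a None can only have come from a NULL; with the Data Access cursor, a None leaves two candidates.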

JoshuaBixby
MVP Esteemed Contributor

This is the second in a two-part series on the risks to software users of poor documentation; specifically, the confusion and unexpected results that come from weakly documented spatial operators in GIS software.  The first part in the series looks at how inconsistent and incomplete documentation requires users to guess how spatial operators are implemented.  The second part in the series looks at the inconsistent results that arise from mixing different implementations of spatial operators.

One of the many empty geometry bugs I have submitted recently got put on the front burner this week when I noticed Esri updated the status to "Closed:  Will Not be Addressed."  What has ensued is a discussion that is still unfolding over expected behavior versus unexpected results.  Coincidentally, the issue can be framed neatly within the discussion and examples from the first post in this series (What's Within:  When (Esri != Clementini) = ?).

The first post in this series discussed how there is no singular definition of spatial relations for geometry libraries and geospatial applications, and how the Dimensionally Extended 9 Intersection Model (DE-9IM) became the prevailing 2D definition after inclusion in the OpenGIS Implementation Specification for Geographic information - Simple feature access - Part 1: Co....  The post focused on how incomplete documentation of spatial operators forces users to guess, and likely incorrectly at times, how software works instead of knowing how the software should work.  As unfortunate for users as incomplete documentation can be, mixing different definitions of spatial relations within the same spatial operator is much worse, and that appears to be what Esri has done.

Borrowing from the examples in the previous post, let's look at three features and the ArcPy Geometry.within() method.

>>> #Create square polygon
>>> polygon = arcpy.FromWKT('POLYGON((0 0, 3 0, 3 3, 0 3, 0 0))')
>>> #Create line that completely lies on polygon boundary
>>> line = arcpy.FromWKT('LINESTRING(1 0, 2 0)')
>>> #Create empty line
>>> line_empty = arcpy.FromWKT('LINESTRING EMPTY')
>>>
>>> #Test whether line is within polygon
>>> line.within(polygon)
False
>>> #Test whether line_empty is within polygon
>>> line_empty.within(polygon)
True

For comparative purposes, let's run the same two within tests (first with line, then with line_empty) using Esri’s ST_Geometry in Oracle:

SQL> --Test whether line is within polygon
SQL> SELECT sde.st_within(sde.st_geomfromtext('LINESTRING(1 0, 2 0)', 0),
                          sde.st_geomfromtext('POLYGON((0 0, 3 0, 3 3, 0 3, 0 0))', 0))
       FROM dual;
SDE.ST_WITHIN(SDE.ST_GEOMFROMTEXT('LINESTRING(10,20)',0),SDE.ST_GEOMFROMTEXT('PO'
--------------------------------------------------------------------------------
                                                                               0
SQL> --Test whether line_empty is within polygon
SQL> SELECT sde.st_within(sde.st_geomfromtext('LINESTRING EMPTY', 0),
                          sde.st_geomfromtext('POLYGON((0 0, 3 0, 3 3, 0 3, 0 0))', 0))
       FROM dual;
SDE.ST_WITHIN(SDE.ST_GEOMFROMTEXT('LINESTRINGEMPTY',0),SDE.ST_GEOMFROMTEXT('POLY'
--------------------------------------------------------------------------------
                                                                               0

Interesting, the results from using ST_Geometry differ from using ArcPy.  Knowing that ST_Geometry functions are OGC simple feature access and SQL compliant, which means they implement DE-9IM, the results from ST_Geometry in Oracle are expected because a line solely on the boundary of a polygon is not considered within the polygon, and an empty geometry cannot be within another geometry.

When looking at the ArcPy results, the first result (line.within(polygon) returning False) is correct only if the ArcPy Geometry Classes implement Clementini's definition, since we saw in the first post of this series that Esri's definition of Within is True in this situation.  Unfortunately, the ArcPy documentation doesn’t state one way or another which definition it is implementing.  The second result (line_empty.within(polygon) returning True) is correct only if the ArcPy Geometry Classes implement Esri's definition, because the Clementini definition does not allow for an empty geometry to be within another geometry.  Clear as mud, right?
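
The empty-geometry case can be reasoned through with DE-9IM matrices alone, no ArcPy required.  The sketch below checks a hand-written matrix for an empty line against the OGC Within pattern and against a looser "nothing of a lies outside b" pattern.  The looser pattern is purely hypothetical, chosen to show how a True could arise vacuously; it is not Esri's documented definition, and matches is a helper written for this illustration.

```python
def matches(matrix, pattern):
    """Match a 9-character DE-9IM matrix against a pattern.

    'T' matches any non-empty intersection (0, 1, or 2), 'F' matches an
    empty one, '*' matches anything, and a digit must match exactly.
    """
    def ok(m, p):
        return p == "*" or (p == "T" and m != "F") or m == p
    return len(matrix) == len(pattern) == 9 and all(
        ok(m, p) for m, p in zip(matrix, pattern)
    )


OGC_WITHIN = "T*F**F***"   # Within pattern from the simple feature access spec
NO_EXTERIOR = "**F**F***"  # hypothetical looser test: nothing of a outside b

# An empty line has no interior or boundary, so every intersection involving
# them is empty (F); only its exterior meets the polygon.
empty_line_vs_polygon = "FFFFFF212"

print(matches(empty_line_vs_polygon, OGC_WITHIN))   # False
print(matches(empty_line_vs_polygon, NO_EXTERIOR))  # True
```

The OGC pattern demands that the interiors actually intersect, which an empty geometry can never satisfy; a test that only looks for points outside the polygon finds none and passes by default.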

Esri's stance, to date, is that everything is working as designed.  What?!  If that is the case, Esri is implicitly acknowledging they are implementing different definitions of a spatial relation within the same spatial operator; it just depends what geometries you pass it!!  Between the closed source code, incomplete documentation, and seemingly arbitrary implementation, the ArcPy Geometry Classes are use-at-your-own-risk.  Caveat utilitor.

JoshuaBixby
MVP Esteemed Contributor

This is the first in a two-part series on the risks to software users of poor documentation; specifically, the confusion and unexpected results that come from weakly documented spatial operators in GIS software.  The first part in the series looks at how inconsistent and incomplete documentation requires users to guess how spatial operators are implemented.  The second part in the series looks at the inconsistent results that arise from mixing different implementations of spatial operators.

Nine months since my last blog post, I can't say that was the pace I was aiming for when GeoNet got stood up (at least I don't give more weight to GeoNet resolutions than I do New Year's ones).  I don't know if my drought is busted, but a raw nerve has been hit hard enough by Esri to make it rain, at least for the moment.  Of all the soapboxes that litter my closets, poor documentation and its consequences is one of the most tattered.  As much as Honest Abe says you can't trust everything you read on the internet, I do think software users should be able to trust a company's online help/documentation.

One cannot spend too much time in the world of spatial relations without coming across the Dimensionally Extended 9 Intersection Model (DE-9IM).  The DE-9IM was developed by Clementini and others in the mid-'90s as an evolution of the 4 Intersection Model (4IM) and 9 Intersection Model (9IM).  Although the DE-9IM isn't the only definition of spatial relationships, it became the prevailing 2D definition after inclusion in the OpenGIS Implementation Specification for Geographic information - Simple feature access - Part 1: Co....

References to and discussions of DE-9IM used to be found in various Esri documentation, but those references and documentation are becoming harder to find in current ArcGIS documentation.  For example, between the new 10.3 ArcGIS for Desktop, ArcGIS for Server, and ArcGIS for Developers sites, Clementini is only referenced on a handful of pages and DE-9IM is only referenced and discussed on one page:  Relational functions for ST_Geometry.  Whereas Python has a we're-all-grown-ups philosophy, Esri seems to be going the other direction and feeding us Pablum, without the vitamin fortification.

Although DE-9IM is the basis for 2D spatial predicates/relations in many geometry libraries and geospatial applications, there are two overlay types for the Select Layer By Location tool where Esri's default implementation differs from Clementini:  Contains, Within.  In both cases, the default or unqualified overlay type implies Esri's definition (blue underline) while the Clementini definition (red underline) is handled through qualifiers.

arcmap_103_select_layer_by_location_clementini.PNG

So how does the Esri definition of Contains and Within differ from the Clementini definition?  Forgoing mathematical notation and illustration matrices, the difference boils down to how boundaries of geometries are handled.  For example, geometry a that is entirely on the boundary of geometry b is considered within geometry b using Esri's definition but not within geometry b using Clementini's definition.  Let's create simple polygon and line features to demonstrate:

>>> polygon = arcpy.FromWKT('POLYGON((0 0, 3 0, 3 3, 0 3, 0 0))')
>>> line = arcpy.FromWKT('LINESTRING(1 0, 2 0)')
>>> arcpy.CopyFeatures_management(polygon, 'in_memory/polygon')
<Result 'in_memory\\polygon'>
>>> arcpy.CopyFeatures_management(line, 'in_memory/line')
<Result 'in_memory\\line'>

arcmap_simple_polygon_line_labeled.PNG

Executing the Select Layer By Location tool using both definitions of Within:

>>> arcpy.SelectLayerByLocation_management("line", "WITHIN", "polygon")
<Result 'line'>
>>> arcpy.GetCount_management("line")
<Result '1'>
>>> arcpy.SelectLayerByLocation_management("line", "WITHIN_CLEMENTINI", "polygon")
<Result 'line'>
>>> arcpy.GetCount_management("line")
<Result '0'>

So far, so good.  Beyond the fact that Esri's default definitions of Contains and Within differ from most other geometry libraries and geospatial applications, including the OGC simple feature standards, at least the results match the sparse documentation available online.
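
One way to see why the two Within overlay types disagree for this line is to write the relationship down as a DE-9IM matrix and test it against patterns.  The sketch below is pure Python and makes two assumptions: the matrix is worked out by hand for the line and polygon above, and the boundary-inclusive pattern is only an approximation of the behavior described here, not Esri's published definition.  The matches helper is written just for this illustration.

```python
def matches(matrix, pattern):
    """Match a 9-character DE-9IM matrix against a pattern.

    'T' matches any non-empty intersection (0, 1, or 2), 'F' matches an
    empty one, '*' matches anything, and a digit must match exactly.
    """
    def ok(m, p):
        return p == "*" or (p == "T" and m != "F") or m == p
    return len(matrix) == len(pattern) == 9 and all(
        ok(m, p) for m, p in zip(matrix, pattern)
    )


# Line (1 0, 2 0) against the square polygon, worked out by hand:
# the line's interior and endpoints all fall on the polygon's boundary.
line_vs_polygon = "F1FF0F212"

CLEMENTINI_WITHIN = "T*F**F***"  # OGC/Clementini: interiors must intersect
BOUNDARY_INCLUSIVE = "**F**F***" # assumed stand-in for the default WITHIN

print(matches(line_vs_polygon, CLEMENTINI_WITHIN))   # False
print(matches(line_vs_polygon, BOUNDARY_INCLUSIVE))  # True
```

The leading F in the matrix (no interior-interior intersection) is what fails the Clementini pattern, which lines up with the zero count returned for WITHIN_CLEMENTINI.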

At this point, it is really important to point out something that is easily overlooked.  Esri's ST_Geometry functions are compliant with the OGC simple feature access and SQL standards, which means ST_Within adheres to the Clementini definition and not the Esri definition.

SQL> SELECT sde.st_within(sde.st_geomfromtext('LINESTRING(1 0, 2 0)', 0),
  2                       sde.st_geomfromtext('POLYGON((0 0, 3 0, 3 3, 0 3, 0 0))', 0))
  3    FROM dual;
SDE.ST_WITHIN(SDE.ST_GEOMFROMTEXT('LINESTRING(10,20)',0),SDE.ST_GEOMFROMTEXT('PO
--------------------------------------------------------------------------------
                                                                               0

Up until this point, I would argue the Esri documentation pertaining to spatial relations has been weak because it relies heavily on inference.  The attentive user might notice there are multiple Within overlay types in the drop-down box for the Select Layer By Location tool, and the inquisitive user might go one step further to read about overlay types to understand the differences between them.  The really attentive and knowledgeable user might understand that a single line in the What is the ST_Geometry storage type? documentation stating ST_Geometry implements the SQL 3 specification means the ST_Within function adheres to Clementini's definition instead of Esri's from the Select Layer By Location tool.  In short, there is no documentation that explicitly acknowledges there are different definitions of certain spatial relations within various parts of Esri's own software.

Confused yet?  Just wait, the fun really starts when we dive into the ArcPy Geometry Classes because the documentation goes from weak to really weak.  The only references to OGC in the ArcPy Geometry Classes documentation are for the WKB and WKT properties, and there are no references to DE-9IM or Clementini.  Let's take a look at the documentation for the within method:

arcgis_103_geometry_class_within_documentation.PNG

So, the geometry is within another geometry if it is within that geometry.  Got it.  Oh wait, are they asking me whether a geometry is within another geometry?  Although none of the illustrations captures a situation that differentiates the Esri and Clementini definitions, the lack of any reference to OGC, DE-9IM, or Clementini makes one think it is Esri's definition being used.  Let's check:

>>> line.within(polygon)
False

Ouch, that smarts.  It is pretty clear there is a lack of consistency with how certain spatial operators are implemented in various parts of Esri's software, but the worst part is the documentation doesn't even point it out.  Users are left to infer, and likely incorrectly at times, how the software works instead of being informed about how it works.  Caveat utilitor.

JoshuaBixby
MVP Esteemed Contributor

Recently, a program specialist approached me with questions about the file modified dates in ArcCatalog.  I started by explaining the Modified column in ArcCatalog shows when the data or schema was last changed in a shapefile or feature class; well, a feature class in a file geodatabase at least.  The user wasn't buying it, so I set out to show him what I meant.

I can't recall if Modified has been an option in the Contents window all along or if it was added sometime along the way, but I do know it isn't turned on by default.  Since I don't use it much, I had to go enable it:

arccatalog_102_contents_tab_modified.PNG

I don't like demoing on real data for several reasons, including the fact the data can sometimes be the problem, so I whipped up an empty shapefile and feature class for testing.  As you can see, the shapefile was created and last modified the other day.  Let's use ArcPy and an InsertCursor to put a record into the shapefile and check the Modified date.

arccatalog_arcpy_insert_record_shapefile.PNG

After refreshing the Contents window, we can see the Modified date gets updated after the edit session is ended using the stopEditing command.  That seems pretty reasonable, i.e., the changes are committed after the editing is complete and the Modified column is updated to reflect the time the edit session ended.  Let's do the same thing with a feature class.

arccatalog_arcpy_insert_record_featureclass.PNG

That's odd.  After ending the edit session and refreshing the Contents window, ArcCatalog is still showing the feature class as being modified when it was created a few days ago.  Maybe it is my code or something is messed up with this ArcCatalog session.  I am going to close out of ArcCatalog and start over.

arccatalog_arcpy_insert_record_featureclass_2.PNG

This is even more odd.  The Modified column now shows 4 minutes before the edit session ended.  In fact, the Modified column shows a time that is prior to starting my edit session (the timestamps for starting the session aren't given above, but the edit session was started less than a minute before I ended it).

If the Modified column isn't showing the start or end of an edit session, what is it showing with feature classes?  The answer: when ArcCatalog or ArcMap is closed.  I have checked in ArcMap with manual edit sessions, and the behavior is the same.  Yes, the Modified column for feature classes doesn't show when data is updated, as it does with shapefiles, but when the application is closed.  In our organization with casual GIS users and disconnected Citrix sessions, the discrepancy can be days.
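
For anyone wanting a second opinion on timestamps, the operating system's own modification times can be read with a few lines of Python.  This sketch does not reproduce ArcCatalog's Modified column, and I am not claiming the column is derived this way; it only reports the newest modification time the file system records for a file or for the files under a folder.  The latest_modified helper and the demo file name are hypothetical.

```python
import os
import tempfile
from datetime import datetime


def latest_modified(path):
    """Return the newest file-system modification time at or under path."""
    if os.path.isfile(path):
        return datetime.fromtimestamp(os.path.getmtime(path))
    stamps = [
        os.path.getmtime(os.path.join(root, name))
        for root, _, names in os.walk(path)
        for name in names
    ]
    return datetime.fromtimestamp(max(stamps)) if stamps else None


# Demonstrate on a throw-away folder standing in for a file geodatabase.
demo = tempfile.mkdtemp(suffix=".gdb")
open(os.path.join(demo, "a00000001.gdbtable"), "w").close()
print(latest_modified(demo))
```

Pointing the helper at a shapefile's .shp or .dbf file, or at a file geodatabase folder, before and after an edit session gives something to compare against what the Contents window shows.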

JoshuaBixby
MVP Esteemed Contributor

Anyone who has dabbled in ArcPy is likely familiar with the ArcPy and Tool reference sections of the ArcGIS Help.  After all, those sections are where the functionality of ArcPy classes and ArcGIS Tools are documented, including descriptions, syntax, and examples.  As much as there is plenty of consistency between the look and feel of those two sections, there is an important inconsistency in the Syntax tables of those two sections.  Although the inconsistency isn't enough to trip up someone who regularly writes code, I regularly see it cause confusion among those who are just beginning to script with ArcPy and ArcGIS Tools.

Let's get a couple examples on deck.  I will start with a Syntax screenshot from the ListLayers function of the arcpy.mapping module, and

arcpy_102_mapping_list_layers_syntax.PNG

follow it up with a partial screenshot of the Syntax for the Dissolve tool in ArcGIS Desktop.

arcgis_102_dissolve_syntax.PNG

At first glance, it is easy to notice the consistency between the look and feel of the Syntax tables, e.g., both tables have the same formatting style and column headers.  There is value in consistency, especially in documentation, but consistency in style doesn't always equate to consistency in content, and this is where I see beginner scripters stumble when reading through Esri documentation.  The Syntax tables in the  ArcPy and Tool reference sections of the ArcGIS Help share the same column headings, but the content of the Data Type column differs between sections.

I posted the Syntax screenshot from the ListLayers function first because 'Data Type' in the ArcPy section is consistent with what a vast majority of people think when discussing programming/scripting and data types.  For example, the data type for the map_document_or_layer parameter is listed as Object, and the explanation column states it needs to be a variable with a reference to a MapDocument  or Layer object.  The wildcard parameter is listed as being a String, and the data_frame parameter is listed as being an arcpy.mapping DataFrame object.

It is interesting to note the data_frame parameter has a specific data type given while the map_document_or_layer parameter has a generic Object data type given.  My guess is that since the former parameter accepts a single object type while the latter accepts two different object types, someone made a judgment call to go with the more generic Object data type instead of listing all of the applicable object types in the Data Type column.  Fair enough.

As mentioned above, the formatting style for Syntax is identical between the ArcPy and Tool reference sections of the ArcGIS Help, right down to the column headings.  Whereas 'Data Type' in the ArcPy section is consistent with general programming usage, 'Data Type' in the Tool reference section means something a bit different, a bit muddled in my opinion.  For users just starting out with Python and ArcPy scripting, looking at the Dissolve Syntax table might lead them to believe the in_features parameter accepts a Feature Layer object and the out_feature_class parameter accepts a Feature Class object.  Unfortunately, they would be wrong, or wrong enough to be confused.  Let's see if some sample code gives any clarification.

arcgis_102_dissolve_sample_1.PNG

That's interesting, every parameter in the sample code is a string or list of strings.  If the out_feature_class parameter has a data type of Feature Class, why would the sample code be passing it a string?   Are we missing something?  Maybe the Understanding tool syntax help page has some answers.  Looking at Data Type:

arcgis_102_understanding_tool_syntax_syntax.PNG

The first paragraph makes sense, i.e., there are simple data types like strings and integers and more complex data types like arcpy objects.  The second paragraph is where things get interesting:  "Tool parameters are usually defined using simple text strings."  Huh, so parameters have data types but all data types are 'usually defined using simple text strings.'  A string is a string but so is a Feature Class.  Interesting.  If one follows the data type link visible in the screenshot above, a bit more explanation can be found in the Data types for geoprocessing tool parameters page.

arcgis_102_data_types_for_geoprocessing_tool_parameters.PNG

As best I can tell, the Syntax tables in the Tool reference section give a Data Type, like in the ArcPy section, but it isn't really a data type the way most people would think of a data type when programming Python.  Just like a picture of a table is different than the table itself, a string representation of an object isn't the same as the object, and their data types aren't one and the same either.  What comes to my mind is the difference in databases between data types and data domains.  A column containing an 'M' or 'F' for gender still has a data type of string, even if the string represents the gender of an individual.
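
The distinction is easy to check in plain Python.  The short sketch below uses a made-up path of the kind tool sample code passes for a "Feature Class" parameter, alongside the gender example; neither value comes from Esri's documentation.

```python
# A hypothetical path, written the way tool sample code passes a "Feature Class".
out_feature_class = "C:/output/output.gdb/taxlots_dissolved"

# To Python, the value is a plain string no matter what it represents.
print(type(out_feature_class).__name__)  # str

# The same holds for the gender example: the domain is {'M', 'F'},
# but the data type of each value is still str.
gender = "F"
print(isinstance(gender, str), gender in {"M", "F"})  # True True
```

What the string refers to is a matter of interpretation by the tool that receives it; the data type Python sees is str either way.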

I don't see an issue with using string representations of objects; after all, there is a lot more overhead with passing objects or object references than strings.  But don't overload the meaning of a commonly understood term just so column headings can be the same between two different sections in the manual.  Consistency has value but it shouldn't come at the expense of correctness.

JoshuaBixby
MVP Esteemed Contributor

This is the fourth in a four-part series on the challenges of naming new features in software applications; particularly, the consequences when naming falls short.  The first part in the series looks at a case when the name of a new feature clearly and succinctly describes the behavior of that feature.  The second part in the series looks at that same case when newer yet functionality alters the original behavior of that new functionality.  The third part in the series looks at how the documentation has changed to address this altered functionality.  And finally, the fourth part in the series discusses what it all means to end users and developers.

When someone has been using a software application for a long time, say ArcGIS Desktop for 10 or more years, it isn't completely uncommon for a user to get set in his/her ways, maybe even a bit complacent.  Not only does this happen with the use of software, it can also happen with reading software documentation.  After all, if you have been using the software for more than 10 years, of course you know what the documentation says and exactly how features work, right?

It is around this time that RTM, STW, or maybe GIYF comments can start showing up in response to one's questions in forums, listservs, etc....  (I know, forums and listservs are so Web 1.0, but they are still workhorses for many GIS practitioners).  But what if you have read the manual, searched the web, and given Google or other search engines the old college try?  Well, sometimes the real answer is WABM, and I think that is what we have here when it comes to in-memory workspaces and background processing in ArcGIS.

In looking over the first three parts in this series, I can't help but think of the latest Errol Morris documentary, or at least the title of it:  The Unknown Known.  In many ways, I feel like the in-memory workspace and its documentation represent an unknown known.  Giving Esri the benefit of the doubt and assuming there is at least one developer or group of developers that truly understands how in-memory workspaces are supposed to work, we basically have a situation where the documentation has completely failed to communicate the information.  The in-memory workspace information is known within the cloistered walls of Redlands but it is unknown to people actually using and developing with the software.  From the end user perspective, it is an unknown known, or maybe an unknown unknown for some.

The unknown knowns don't just end with in-memory workspaces.  Anyone who has worked with Esri software, and especially with Esri Support, knows only a fraction of the bugs submitted get publicly published in ArcGIS Resources.  For example, there are 4 open bugs relating to in-memory workspace linked to my organization's customer number and yet none of them is findable in ArcGIS Resources.  It is one thing for Esri Development to have their own bug tracking system and that information not be publicly published, but not publishing known bugs from the Esri Support bug tracking system creates lots of unknown knowns, i.e., Esri knows there is an issue with the software but that isn't being shared with users.

So what does this all mean or what is the importance?  Wasted time, reduced productivity, lack of confidence in the software, and more....  The cost of poorly documented information is borne by the end user, and unfortunately that includes me.  When the choice of GIS software is a personal one, the end user has the choice to explore and possibly choose to use different GIS software; but when the choice of GIS software is made for someone by an organization, the end user just gets to eat the lost time, productivity, and frustration of working with software that either isn't documented well or doesn't work correctly.  Unknown knowns undermine the potential of software and can turn new functionality into little more than marketing hype.

JoshuaBixby
MVP Esteemed Contributor

This is the third in a four-part series on the challenges of naming new features in software applications; particularly, the consequences when naming falls short.  The first part in the series looks at a case when the name of a new feature clearly and succinctly describes the behavior of that feature.  The second part in the series looks at that same case when newer yet functionality alters the original behavior of that new functionality.  The third part in the series looks at how the documentation has changed to address this altered functionality.  And finally, the fourth part in the series discusses what it all means to end users and developers.

The first part in this series (What's in a Name:  When in_memory = In-memory) looked at the introduction of the in-memory workspace and ginned up a few basic examples to check it out.  Basically, it worked and the new in_memory feature meant in-memory.  The second part in this series (What's in a Name:  When in_memory != In-memory) looks at those same examples and sees how they turn out after the introduction of ArcPy and Background Processing.  Honestly, it is hard to say how those examples turned out.  The polite way to say it might be "mixed results."  Although there were cases where in_memory looked to be in-memory, there were also cases where in_memory looked to be on-disk.  Even when in_memory seemed to be in-memory, there were some odd behaviors with some of the tools/functions.

To get a better idea of what might be going on, I need to look at the supporting documentation a bit, and the online manual is as good a place as any to start.  Since the behaviors we saw in the second part of this series start with ArcGIS 10.0 and persist through ArcGIS 10.2.2, I will just jump to the ArcGIS 10.2.2 manual with the assumption the latest and greatest regarding in-memory workspaces and background processing should be documented there.  Visiting ArcGIS Resources gets me in one click to the Help for the latest version of ArcGIS.  A bit of poking around leads me to find the persistent URL for the ArcGIS 10.2/10.2.1/10.2.2 Help.  Searching on 'in_memory' gets a link to ArcGIS Help 10.2 - Using in-memory workspace, which seems like a good place to start.  The page is too long and has too many things to say for screenshots, but I will paste a few important excerpts below.

  • ArcGIS provides an in-memory workspace where output feature classes and tables can be written. Writing geoprocessing output to the in-memory workspace is an alternative to writing output to a location on disk or a network location. Writing data to the in-memory workspace is often significantly faster than writing to other formats such as a shapefile or geodatabase feature class. However, data written to the in-memory workspace is temporary and will be deleted when the application is closed.

    To write to the in-memory workspace, use the path in_memory, as illustrated below.
  • When data is written to the in-memory workspace, the computer's physical memory (RAM) is consumed.

  • The Delete tool can be used to delete data in the in-memory workspace. Individual tables or feature classes can be deleted, or the entire workspace can be deleted to clear all the workspace contents.
  • A table, feature class, or a raster written to the in-memory workspace will have the source location of GPInMemoryWorkspace.
  • You can use the in_memory workspace in Python as well,

The manual clearly states that in-memory workspaces are just that, in your computer's physical memory, and that you access the workspace using in_memory.  It also states the in_memory path is supported in tools and Python.  Additionally, it states that in-memory workspaces have a source location of GPInMemoryWorkspace.  Finally, it states the Delete tool can be used to remove individual tables or feature classes from the in-memory workspace.
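The Help excerpt promises an illustration that doesn't survive in my paste, so here is a minimal sketch of what writing to the in-memory workspace from Python might look like.  The helper names (in_memory_path, scratch_copy) and the default dataset name are my own inventions for illustration; the arcpy functions are the standard CopyFeatures, GetCount, and Delete tools.

```python
IN_MEMORY = "in_memory"  # workspace keyword quoted in the Help excerpt


def in_memory_path(name):
    """Return the path of a dataset inside the in-memory workspace."""
    return IN_MEMORY + "/" + name


def scratch_copy(in_features, name="tmpFeatures"):
    """Copy features to in_memory, count them, then clean up.

    Hypothetical helper: arcpy is imported here so that in_memory_path
    can be used (and tested) on a machine without ArcGIS installed.
    """
    import arcpy

    out_fc = in_memory_path(name)
    arcpy.CopyFeatures_management(in_features, out_fc)
    try:
        # Intermediate processing against out_fc would go here.
        return int(arcpy.GetCount_management(out_fc).getOutput(0))
    finally:
        # The Help says Delete removes individual in-memory datasets.
        arcpy.Delete_management(out_fc)
```

Whether the data written this way actually ends up in RAM is exactly what the rest of this post questions, so treat the sketch as what the manual describes rather than what ArcGIS Desktop 10.x necessarily does.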

Everything covered in the ArcGIS Help 10.2 - Using in-memory workspace page makes sense and agrees with itself, until you actually try to apply it in ArcGIS Desktop 10.x!  I think the Help is correct when it states in-memory workspaces have source locations of GPInMemoryWorkspace and that those locations are stored in the computer's physical RAM.  Beyond that, I am not so sure because we saw examples where in_memory can lead to on-disk, not always in-memory.  We also saw a case where the Delete tool failed to delete an in-memory table, ostensibly because it couldn't see it in the first place to delete it.  Even stranger, the Delete tool also successfully deleted nothing when the in_memory table was created on-disk instead of in-memory.

The examples in the second part of this series give the impression that Background Processing affects how the in_memory path works.  Surprisingly, Background Processing isn't even mentioned once on the Help page for in-memory workspaces.  Maybe the effects of Background Processing on the in_memory path are documented in the Help pages for Background Processing.  Searching on 'background processing' in the main search bar brings up ArcGIS Help 10.2 - Foreground and background processing, which seems like a good place to go next.  Similar to the in-memory workspace help, this help page is too long and has too much to say for screenshots.  Looking at a couple excerpts:

  • The Background processing panel is where you control whether a tool executes in foreground or background mode.
    If Enable is checked, tools execute in the background, and you can continue working with ArcMap (or other ArcGIS applications, such as ArcGlobe) while the tool executes.
  • Background processing can be thought of as another ArcMap session running on your computer but without the ArcMap window open.

There is lots more information on the page than what I provide above, but none of it has to do with in-memory workspaces.  In fact, 'in-memory' and 'in_memory' aren't even referenced once throughout all of the documentation.  The ArcGIS Help 10.2 - Background Geoprocessing (64-bit) is the same, i.e., neither of the terms is mentioned once.  Given the second part in this series clearly shows Background processing affects the functionality of using in_memory with Python code and ArcGIS tools, it does seem odd that neither of the two main pages on Background processing even mentions the term.

If the main or introductory help pages for in-memory workspaces and background processing don't address what we are seeing, maybe the information is buried in a help page on a related topic.  The Managing intermediate (scratch) data in shared model and tools page says "you can also write intermediate data to the in-memory workspace," but it makes no reference to background processing at all.  The A quick tour of managing intermediate data page is the same, i.e., it speaks to using in-memory workspaces but doesn't mention anything about Background Processing.  Searching on Background processing instead of in-memory or in_memory yields similar results:  pages that speak to one and not the other.  Interestingly, the Guidelines for arcpy.mapping (arcpy.mapping) page has a statement:

  • To use the CURRENT keyword within a script tool, background processing must be disabled. Background processing runs all scripts as though they were being run as stand-alone scripts outside an ArcGIS application, and for this reason, CURRENT will not work with background processing enabled.

Although this doesn't directly mention in-memory workspaces, it does hint that Background Processing may or does alter how certain code works in ArcGIS Desktop.  Tenuous, I know, but there really isn't much else that I can find in the manual.
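To make the CURRENT limitation concrete, here is a hedged sketch of a script that tries CURRENT first and falls back to a saved map document.  The function name, the fallback_mxd argument, and the injectable map_document parameter are hypothetical, and I am assuming the failure surfaces as a RuntimeError when no application session is available.

```python
def open_map_document(fallback_mxd, map_document=None):
    """Open the CURRENT map document, or fallback_mxd if that fails.

    map_document is injectable purely for illustration and testing;
    by default it is arcpy.mapping.MapDocument.
    """
    if map_document is None:
        import arcpy
        map_document = arcpy.mapping.MapDocument
    try:
        return map_document("CURRENT")
    except RuntimeError:
        # Assumption: with background processing enabled, or in a
        # stand-alone script, CURRENT cannot be resolved and arcpy
        # raises RuntimeError, so open the document by path instead.
        return map_document(fallback_mxd)
```

The point isn't the fallback itself so much as the fact that the same line of code can behave differently depending on a checkbox in the Geoprocessing Options.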

Maybe the documentation is complete and there is just a bug that is driving all of the discrepancies we saw in the second part in this series.  Unfortunately, searching the published bugs for 'in_memory' and 'in-memory' doesn't yield much, 4 hits, and definitely nothing to explain what we have seen.

Let's head to the forums to see if someone has posed this question before.  Interestingly enough, someone has asked basically the same question, and more than 2 years ago:  It appears "in_memory" is not really in memory.  There are/were basically two responses in the forum thread, and neither of them appears to be from Esri staff directly.

geonet_in-memory-forum_responses.PNG

The first response is a bit incomplete because it doesn't really say whether the statement applies to foreground or background processing, or both.  Since the original poster didn't say whether or not Background Processing was enabled, I am going to assume that default settings are being used, which means background processing.  I did a quick check using the Mosaic to New Raster tool with Background Processing turned on and turned off.  With Background Processing turned off, the in_memory raster consumed roughly 400 MB of RAM.  With Background Processing turned on, the in_memory raster consumed about 120 MB.  There may be some memory mapping occurring when Background Processing is enabled, but it surely isn't loading everything to RAM and just keeping a reference on disk.

The second response makes sense, but it isn't completely accurate because we can find tools where in_memory still means in-memory even when Background Processing is enabled.  CreateFeatureclass might work the way the reply states, but CopyFeatures surely doesn't.  So, how do we know which tools work which way?

Not only did in-memory workspaces change at ArcGIS 10.0, it doesn't seem Esri's online documentation really addresses any of the changes in behavior.  It is time to take a step back and think about what all of this means to end users and developers trying to use the software.

JoshuaBixby
MVP Esteemed Contributor

This is the second in a four-part series on the challenges of naming new features in software applications; particularly, the consequences when naming falls short.  The first part in the series looks at a case when the name of a new feature clearly and succinctly describes the behavior of that feature.  The second part in the series looks at that same case when newer yet functionality alters the original behavior of that new functionality.  The third part in the series looks at how the documentation has changed to address this altered functionality.  And finally, the fourth part in the series discusses what it all means to end users and developers.

When I first started beta testing ArcGIS 9.4, it didn't take long for me to see this was going to be a big release for Esri.  It turns out, it was big enough to get promoted from a minor release to a major one during beta, and we all ended up with ArcGIS 10.0.  The What's New in ArcGIS 10 covers lots of ground; there is just about something for everyone in there.  No matter how narrow or limited your use of ArcGIS Desktop may be, one change you couldn't miss was the user interface, which had remained quite constant through the ArcGIS 8.x and 9.x days.

I was interested in lots of the changes with ArcGIS 10.0, so many I shouldn't even bother starting to list them here.  Although lots of changes got my attention, the changes to geoprocessing really stood out:  background processing was introduced, the Python window replaced the Command Line window, ArcPy took Python support to the next level, and more.  Combining all of these new features with one of my favorite existing features, the in-memory workspace, I was actually a bit excited to kick the tires and see just how great this next ride might be.

Unlike ArcGIS 9.2 where I had to use the Wayback Data Center, ArcGIS 10.0 is still in production around parts of my agency, which makes it easy to take a step back in time and still generate new screenshots.

arcmap_10_sp5_build_number.PNG

For the sake of consistency and simplicity, I will just re-use the examples from the first post in this series (What's in a Name:  When in_memory = In-memory) to get acquainted with the Python window in ArcGIS 10.0.  Let's take a look at the results of creating a table in the in-memory workspace:

arcmap_10_toc_inmemory_table.PNG

Success, or not?  The command appears to have completed successfully, but tmpTable doesn't appear to be in the GPInMemoryWorkspace.    I am going to run that command again.

arcmap_10_toc_inmemory_table_2.PNG

Huh.  The command completed successfully; but again, the tmpTable doesn't appear to be in the GPInMemoryWorkspace.  In fact, now I have two tmpTables, and each one seems to have its own cryptic geodatabase in my Temp folder.  Unlike in ArcGIS 9.2 where the command failed because tmpTable already existed, ArcGIS 10.0 does you a favor, if you can call it that, by just creating another one in another cryptic geodatabase.

I don't know what is going on here.  I better just delete these tables and clean up this mess.

arcmap_10_toc_delete_table.PNG

Wait, I can't delete the tmpTable using the same syntax that worked in ArcGIS 9.2?  I guess if the tables aren't really being created in-memory, then it makes sense the Delete_management function won't find them there.  The autocomplete in the Python window wants to delete "tmpTable," without the reference to "in_memory."  I will give that a try:

arcmap_10_toc_delete_table_2.PNG

Well, at least that worked, but I don't know which tmpTable the autocomplete was talking about.  Fortunately, running the command again did clean up the other tmpTable.

Creating feature classes in-memory behaves the same way.  Also, the corresponding tools in Toolbox for creating tables and feature classes demonstrate the same behavior.  There is definitely enough consistency here to not just be a bug in a specific tool/function.  Who knows, maybe in_memory means on-disk in ArcGIS 10.0.
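For anyone who would rather check from code than from the Source tab, here is a rough sketch of asking ArcPy where a newly created table actually landed.  The helper names are hypothetical, and I am assuming the catalogPath reported by Describe is drive-rooted (or UNC) for on-disk data and starts with in_memory otherwise; I haven't confirmed that for every version.

```python
import ntpath  # ArcGIS Desktop is Windows-only, so use Windows path rules


def looks_on_disk(catalog_path):
    """Guess whether a catalog path points at disk rather than in_memory."""
    # Assumption: on-disk data reports an absolute Windows path, e.g. a
    # geodatabase in the Temp folder, while in-memory data does not.
    return ntpath.isabs(catalog_path)


def report_table_location(name="tmpTable"):
    """Create a table via the in_memory path and report where it ended up."""
    import arcpy  # imported here so looks_on_disk works without ArcGIS

    result = arcpy.CreateTable_management("in_memory", name)
    created = result.getOutput(0)
    catalog_path = arcpy.Describe(created).catalogPath
    return catalog_path, looks_on_disk(catalog_path)
```

Running report_table_location() with Background Processing enabled and again with it disabled would be one way to compare the two behaviors described in this post.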

The first part in this series had an example that actually moved some data into an in-memory workspace.  It can't hurt to repeat that here before coming to any conclusions.  First, load those U.S. State boundaries again.

arcmap_10_toc_inmemory_copyfeatures.PNG

Well, there we are again, a copy of features loaded into an in-memory workspace.  What?  GPInMemoryWorkspace?  I can't say whether I expected this result or not.  So, does in_memory mean in-memory or on-disk?  Obviously something changed between ArcGIS 9.2 and ArcGIS 10.0, but what?

The short answer, Background Processing.

arcmap_10_geoprocessing_options.PNG

Not only was Background Processing introduced in ArcGIS 10.0, it was turned on by default.  I can't recall the reason today, but at some point years ago I had a need to disable Background Processing.  At that point, I realized disabling, or not enabling, Background Processing almost reverts in_memory back to how it behaved in ArcGIS 9.2 and 9.3/9.3.1.

Interestingly enough, all of the examples above turn out very similar in ArcGIS 10.2.2.  I would argue the situation in ArcGIS 10.2.2 is slightly worse than back in ArcGIS 10.0.  For example, running the CreateTable_management function twice and then attempting to delete tmpTable using a fully specified in_memory path gives us:

arcmap_1022_toc_delete_table.PNG

In ArcGIS 10.0, the Delete_management function failed because tmpTable didn't actually exist in-memory, which seems logical.  In ArcGIS 10.2.2, the Delete_management function succeeds, but at deleting nothing!  Granted, it did return a warning that tmpTable doesn't exist in-memory, but then it continues on in deleting nothing and returning a successful result.  I can't speak for others, but if I call a function to delete an object and that object doesn't exist, I usually expect an error to be returned.
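If you want the stricter behavior I describe, one option is to wrap the delete in a check of your own.  This is only a sketch:  strict_delete and its injectable exists/delete parameters are hypothetical, and arcpy.Exists may well share the Delete tool's view of what is and isn't in the workspace.

```python
def strict_delete(path, exists=None, delete=None):
    """Delete path, raising if it is missing beforehand or survives afterwards.

    exists and delete default to arcpy.Exists and arcpy.Delete_management;
    they are parameters only so the logic can be exercised without ArcGIS.
    """
    if exists is None or delete is None:
        import arcpy
        exists = exists or arcpy.Exists
        delete = delete or arcpy.Delete_management

    if not exists(path):
        # Treat "nothing to delete" as an error instead of a warning.
        raise LookupError("nothing to delete at %r" % path)
    delete(path)
    if exists(path):
        # A "successful" delete that leaves the data behind is a failure.
        raise RuntimeError("%r still exists after delete" % path)
```

A call such as strict_delete("in_memory/tmpTable") would then fail loudly in the cases where the tool itself only warns.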

Better yet, see what happens in ArcGIS 10.2.2 when you disable Background Processing, create a table in-memory, and try to use a fully specified in_memory path to delete it:

arcmap_1022_toc_delete_table_2.PNG

You can ostensibly successfully delete the table three times and yet it still exists!  And, this is after it has warned you it doesn't exist when it clearly does, and in-memory.

It is obvious that things changed at ArcGIS 10.0 with the in-memory workspace, particularly with the use of 'in_memory.'  I don't know what all changed, but there is a connection with Background processing.  Furthermore, the changes have persisted throughout the ArcGIS 10.x product series.  I think it is time for me to RT(?)M and see what the documentation has to say about all of these changes.

JoshuaBixby
MVP Esteemed Contributor

This is the first in a four-part series on the challenges of naming new features in software applications; particularly, the consequences when naming falls short.  The first part in the series looks at a case when the name of a new feature clearly and succinctly describes the behavior of that feature.  The second part in the series looks at that same case when newer yet functionality alters the original behavior of that new functionality.  The third part in the series looks at how the documentation has changed to address this altered functionality.  And finally, the fourth part in the series discusses what it all means to end users and developers.

When deciding what to call a new feature in a software application, relatively short and relatively descriptive usually win out.  It makes sense, really; who wants to bust out the Help or a super-decoder ring just to get an idea of what a feature might or might not do?  There are risks, however, with trying to be too short or too descriptive.  The former often leads to important qualifiers or fine print being left out, and putting the former and latter together typically lulls users into a false sense of understanding, i.e., assuming what the feature does instead of knowing.  If the act of naming a new feature doesn't pose enough of a challenge, staying true to the name over time poses an even bigger challenge.

So why bring up the challenge of naming new features and staying true to those names over time?  Well, because the challenge of staying true to a name has proven too much for at least one feature in ArcGIS, and the handling of the situation has become a failure in and of itself, in my opinion.

Back around the time Borat was touring the country learning about American culture, Esri released ArcGIS 9.2 (ArcGIS for Desktop Product Life Cycle Support Status).  It's too bad he didn't swing by the Institute when passing through the Orange Empire, that would have been worth the ticket price alone.  One of the new features introduced in ArcGIS 9.2 was the "in-memory workspace for writing temporary feature classes and tables," which could "greatly improve the performance of models, especially when writing intermediate (scratch) data" (What's New in ArcGIS 9.2).  Needless to say, I was interested.

Although I don't have screenshots from that time, fortunately my agency's Wayback Data Center still has ArcGIS 9.2 installed, build 1324 no less!  Let's roll the clock back and see the in-memory workspace at its beginnings.

arcmap_92_sp0_build_number.PNG

After launching ArcMap, I was momentarily thrown by the Command Line.  The Python window didn't replace the Command Line until ArcGIS 9.4, aka ArcGIS 10.0 (What's New in ArcGIS 9.4 - no link, don't think I can post a copy of the PDF either).  After taking a few minutes to reacquaint myself with the Command Line, it was time to get down to business.  Since this post is about the naming of features and not their performance, we won't need many examples to see whether the new in_memory workspace is really in-memory.

One of the simplest examples I can think of is to create a new table in-memory:

arcmap_92_command_line_create_table_success.PNG

So, let's take a look at the Source tab in the Table of Contents:

arcmap_92_toc_inmemory_table.PNG

There it is, a new table in the GPInMemoryWorkspace.  What about creating the same table again:

arcmap_92_command_line_create_table_fail.PNG

So far, so good.  We expect an error given that the table already exists.  Let's take a look at the Table of Contents after I try deleting the in-memory table:

arcmap_92_toc_delete_table.PNG

Still going well.  The Delete command works and the in-memory table is gone.

Although I won't clutter up the post with more screenshots, I will say creating in-memory feature classes turned out the same way tables did above.  Also, creating in-memory feature classes and tables using ArcToolbox yielded the same results as with the Command Line.

Looking for an example that actually involves some data, I loaded a feature class containing the U.S. State boundaries into ArcMap.  A simple Copy Features command using in_memory should do the trick if in-memory workspaces are working as advertised.

arcmap_92_toc_inmemory_copyfeatures.PNG

Well, there we are, a copy of the features loaded into an in-memory workspace.

The basic examples above are far from a definitive test, but they do show that starting with ArcGIS 9.2 users have the ability to store intermediate data in-memory while working in ArcMap.  Overall, I would have to say the marketroids were right on this one.  The in_memory workspace really is in-memory, at least within the scope of its design.

When it comes to the challenge of naming a new feature, I think Esri can claim success with 'in_memory.'  The name is short, descriptive, and most importantly, accurate.  The question or challenge now becomes whether 'in_memory' can remain true to its original functionality as even newer features are introduced with subsequent versions of ArcGIS Desktop.
