Is there a bug with sorting in_memory tables using arcpy.Sort_management?

03-12-2018 05:45 PM
SavenRoybal1
New Contributor II

Hi all,

I'm using ArcGIS Desktop 10.5 (Standard), and I've been running into an issue with a simple script that exports a domain to an in_memory table, then writes that domain out as a second in_memory table sorted on the code field. The problem is that the record order of the "sorted" table comes back seemingly random whenever I feed arcpy.Sort_management an in_memory input table.

For instance, this returns the records in a seemingly random order:

import arcpy

def notWorking( inputWorkSpace, domain):
     rawDomain = arcpy.DomainToTable_management(inputWorkSpace, domain, "in_memory/domain", "code", "description")
     sortedDomain = arcpy.Sort_management( rawDomain, "in_memory/sorted", [["code","ASCENDING"]] )

     with arcpy.da.SearchCursor( sortedDomain, "*" ) as cursor:
          for row in cursor:
               print("{} {}".format(row[0], row[1]))

"""
Output on domain originally defined as:

C: Charlie
E: Echo
A: Alpha
D: Delta
B: Bravo


>>>
1 B
2 C
3 A
4 E
5 D
"""

Meanwhile, if I store Sort_management's input in a gdb instead, it works just fine:

import arcpy

def works( inputWorkSpace, domain, temporaryWorkSpace):
     rawDomain = arcpy.DomainToTable_management(inputWorkSpace, domain, temporaryWorkSpace + "\\domain", "code", "description")
     sortedDomain = arcpy.Sort_management( rawDomain, "in_memory/sorted", [["code","ASCENDING"]] )

     with arcpy.da.SearchCursor( sortedDomain, "*" ) as cursor:
          for row in cursor:
               print("{} {}".format(row[0], row[1]))


"""
Output on domain originally defined as:

C: Charlie
E: Echo
A: Alpha
D: Delta
B: Bravo


>>>
1 A
2 B
3 C
4 D
5 E
"""

I've attached a sample data set with a domain called "testDomain", but I've experienced this with every source (gdb/sde db) and domain I've tried.

While I have a work-around to get what I need done, I was wondering if there was a bug going on or if I'm missing something basic. Thanks in advance for any insight.

6 Replies
DanPatterson_Retired
MVP Esteemed Contributor

hmmm sample.gdb is totally empty in 10.6

SavenRoybal1
New Contributor II

Hi Dan, 

Are you not seeing a feature class called "dummyDataset"?

DanPatterson_Retired
MVP Esteemed Contributor

Saven, when I loaded it last night, the gdb was empty... I don't know if it was a 10.6 thing or if it was actually empty.

JoshuaBixby
MVP Esteemed Contributor

An ArcPy Data Access Search Cursor represents a result set returned by querying the underlying data store. Whether the data store is SQL or NoSQL, result set ordering is typically not guaranteed unless an ORDER BY clause or operator is specified when querying it. This is documented in some way, shape, or form by all the data store vendors:

  • SELECT - ORDER BY Clause (Transact-SQL)
    • Sorts data returned by a query in SQL Server. Use this clause to:

      • Order the result set of a query by the specified column list and, optionally, limit the rows returned to a specified range. The order in which rows are returned in a result set are not guaranteed unless an ORDER BY clause is specified.

  • Database SQL Language Reference - ORDER BY Clause (Oracle)
    • Use the ORDER BY clause to order rows returned by the statement. Without an order_by_clause, no guarantee exists that the same query executed more than once will retrieve rows in the same order.
  • cursor.sort() — MongoDB Manual 3.6 
    • Result Ordering

      Unless you specify the sort() method or use the $near operator, MongoDB does not guarantee the order of query results.
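
The same rule can be tried out locally. The sketch below uses SQLite through Python's standard sqlite3 module purely as a stand-in data store (SQLite is not one of the vendors quoted above, and the table and function names are made up for illustration). Only the query that carries ORDER BY is entitled to a particular row order.

```python
import sqlite3

def fetch_codes(order_by=False):
    """Load the example domain into a throwaway SQLite table and query it.

    SQLite is only a stand-in data store; table and column names are
    hypothetical and mirror the domain from the original question.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE domain (code TEXT, description TEXT)")
        conn.executemany(
            "INSERT INTO domain VALUES (?, ?)",
            [("C", "Charlie"), ("E", "Echo"), ("A", "Alpha"),
             ("D", "Delta"), ("B", "Bravo")],
        )
        sql = "SELECT code, description FROM domain"
        if order_by:
            # Only this form of the query promises a row order.
            sql += " ORDER BY code ASC"
        return conn.execute(sql).fetchall()
    finally:
        conn.close()

for code, description in fetch_codes(order_by=True):
    print("{} {}".format(code, description))
```

Without ORDER BY the rows may well come back in insertion order, but that is an implementation detail rather than a promise.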

Unfortunately, Esri's documentation of file geodatabase behavior is still weak in areas, even after all these years.  Since it is common for data in file geodatabases to be stored sorted by ObjectID, the results of an unsorted query tend to look ordered by ObjectID.  Is ordering by ObjectID guaranteed without an ORDER BY clause?  I always err on the side of "no" if I am in a situation where the ordering of the result set truly matters.
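
When the order of the printed rows is what matters, one option that does not depend on the data store at all is to sort the rows in Python after reading them. This is a minimal sketch: rows_sorted_by is a hypothetical helper, and sample_rows is a plain list standing in for the rows an arcpy.da.SearchCursor on ["code", "description"] would yield.

```python
from operator import itemgetter

def rows_sorted_by(rows, index=0, descending=False):
    """Return the rows as a list sorted on the value at position `index`.

    `rows` can be any iterable of tuples, e.g. an open search cursor.
    """
    return sorted(rows, key=itemgetter(index), reverse=descending)

# Stand-in for cursor output; with arcpy the rows would come from
# arcpy.da.SearchCursor(table, ["code", "description"]).
sample_rows = [("C", "Charlie"), ("E", "Echo"), ("A", "Alpha"),
               ("D", "Delta"), ("B", "Bravo")]

for code, description in rows_sorted_by(sample_rows):
    print("{} {}".format(code, description))
```

This reads every row into memory first, which is fine for a domain table but worth reconsidering for very large tables.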

The ArcGIS Sort tool is really about ordering the storage of records, not guaranteeing that a result set from the data will be ordered. One could ask, "if result set ordering isn't guaranteed using the Sort tool, why bother sorting the storage of records?" It is a fair question, and the answer lies in the efficiency of managing and querying data. The real strength of the ArcGIS Sort tool is its ability to sort data sets spatially, which can greatly improve the performance of spatial operations against data sets, as described in How Sort (management) works (Help | ArcGIS Desktop):

The power of the Sort tool lies in its ability to sort the features spatially. Once sorted spatially, efficiency of spatial or geometric operations is enhanced.

SavenRoybal1
New Contributor II

Hi Joshua,

Thanks for the info.

However, the sort tool is supposed to guarantee a sort on the designated field(s).

My question is whether there is a bug related to using in_memory datasets with the Sort tool.

JoshuaBixby
MVP Esteemed Contributor

Looking at your code, how do you know the data isn't sorted correctly in-memory and the search cursor is just returning the results in a different order?
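
One way to tell the two cases apart is to read the ObjectID together with the code and compare them independently of the order in which the cursor hands rows back. The sketch below rests on an assumption that is not confirmed anywhere in this thread: that the Sort tool assigns ObjectIDs to the output sequentially in the order it writes the rows. stored_order_matches is a hypothetical helper, and the pairs are the (ObjectID, code) values printed in the original post; with arcpy they would come from arcpy.da.SearchCursor(table, ["OID@", "code"]).

```python
from operator import itemgetter

def stored_order_matches(pairs):
    """Check whether codes ascend when rows are arranged by ObjectID.

    `pairs` is an iterable of (object_id, code) tuples in any order.
    Assumption: ObjectIDs reflect the order in which rows were written.
    """
    codes_by_oid = [code for _, code in sorted(pairs, key=itemgetter(0))]
    return codes_by_oid == sorted(codes_by_oid)

# (ObjectID, code) pairs as printed in the original post.
in_memory_input = [(1, "B"), (2, "C"), (3, "A"), (4, "E"), (5, "D")]
gdb_input = [(1, "A"), (2, "B"), (3, "C"), (4, "D"), (5, "E")]

print(stored_order_matches(in_memory_input))
print(stored_order_matches(gdb_input))
```

If the check fails even after the rows are arranged by ObjectID, the retrieval order of the cursor alone would not explain the result, provided the ObjectID assumption holds.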