Ordering fields within arcpy cursor to iterate groups and delete extras not working properly

MollyMoore · ‎06-07-2017

I am attempting to order based on two fields within an arcpy cursor so that I can then eventually delete the rows I do not want. Below is an example of the table I am working with:

Within each grouping of "El_Season" that is the same (in this case OIDs 1&2 and OIDs 3&4) I would like to sort based on "COUNT_OccProb" and by "prob_code" in descending order within a cursor so that once sorted, I can delete the extras within groups that have lower values. Currently, the code does not sort successfully and is just printing OIDs 2 and 4 as the extras where it should be printing OIDs 1 and 3 if working as I would like. This is the code that I am working with:

case_fields = ["El_Season"]
max_field = ["COUNT_OccProb", "prob_code"]
sql_orderby = "ORDER BY {}, {} DESC".format(case_fields, ",".join(max_field))

with arcpy.da.UpdateCursor(statistics, "*", sql_clause=(None, sql_orderby)) as cursor:
    case_func = itemgetter(*(cursor.fields.index(fld) for fld in case_fields))
    for key, group in groupby(cursor, case_func):
          next(group)
          for extra in group:
            print extra # currently have it printing extra instead of deleting for testing purposes

JoshuaBixby · ‎06-08-2017

From Using in-memory workspace—Help | ArcGIS Desktop (see orange text):

The following considerations must be made in deciding to write output to the in-memory workspace:
Caution:
Data written to the in-memory workspace is temporary and will be deleted when the application is closed.
Tables, feature classes, and rasters can be written to the in-memory workspace.
The in-memory workspace does not support extended geodatabase elements such as subtypes, domains, representations, topologies, geometric networks, and network datasets.
Feature datasets or folders cannot be created in the in-memory workspace.

From SearchCursor—Help | ArcGIS Desktop :

An SQL postfix clause is positioned in the second position and will be appended to the SELECT statement, following the where clause. The SQL postfix clause is most commonly used for clauses such as ORDER BY.
Note:
DISTINCT, ORDER BY, and ALL are only supported when working with databases. They are not supported by other data sources (such as dBASE or INFO tables).
TOP is only supported by SQL Server and MS Access databases.

Although it isn't explicitly stated, SQL postfix clauses aren't really supported with in-memory workspaces. Although tables, feature classes, and other data types can be stored in-memory, the in-memory workspace is not a fully functional DBMS workspace.

View solution in original post

JoshuaBixby · ‎06-07-2017

Just a side note while I look over this. Your use of OID[1,2] for groups is very confusing. In Python, square brackets are used for looking up items by an index value. On the surface, OID[1,2] looks like you are trying to slice a list named OID and return values 1 and 2. I think it would be more clear to go with something more like plain English and less like Python code, e.g., OIDs 1&2 and OIDs 3&4

JoshuaBixby · ‎06-07-2017

Barring any issues with the cursor code, you weren't constructing the ORDER BY clause correctly. Try the following and see if it gives you expected results:

case_fields = ["El_Season"]
max_field = ["COUNT_OccProb", "prob_code"]
sql_orderby = "ORDER BY " + ", ".join("{} DESC".format(field) for field in case_fields + max_field)


with arcpy.da.UpdateCursor(statistics, "*", sql_clause=(None, sql_orderby)) as cursor:
    case_func = itemgetter(*(cursor.fields.index(fld) for fld in case_fields))
    for key, group in groupby(cursor, case_func):
          next(group)
          for extra in group:
            print extra # currently have it printing extra instead of deleting for testing purposes‍‍‍‍‍‍‍‍‍‍‍

MollyMoore · ‎06-07-2017

This is still giving the same result and returning the last record per group according to OID instead of desired sorted fields.

JoshuaBixby · ‎06-07-2017

Before involving groupby or any other itertools, maybe just print out some statements with a search cursor using the same parameters to see if the sorting from the DBMS engine is what you expect. If so, then you can dive into the Python code.

What storage format are you working with? Shape files, file geodatabase, enterprise geodatabase? Can you upload a small sample/subet of your data?

MollyMoore · ‎06-07-2017

I am working with a table within a file geodatabase. The field types by which I am trying to sort are long and short. I investigated with just the search cursor and printed out the ordered rows and could not get it to order properly.

Here is a link to a zipped folder with the geodatabase that contains the table: https://app.box.com/s/qhgshw0wdld4v9b49rp4m7ub7mkk0sqb

Here is a picture of the table:

The outcome that I would like to achieve is that for each grouping of identical El_Season entries, I would like to delete that which has a lower COUNT_OccProb value. If the COUNT_OccProb values are the same, as is the case for OIDs 3 and 4, I would like to delete the record with the lower prob_code value.

At this point, I cannot even get the cursor to sort by one of the fields and I am not sure why.

JoshuaBixby · ‎06-07-2017

Using your code and data, I get:

>>> case_fields = ["El_Season"]
>>> max_field = ["COUNT_OccProb", "prob_code"]
>>> sql_orderby = "ORDER BY " + ", ".join("{} DESC".format(field) for field in case_fields + max_field)
>>> 
>>> with arcpy.da.SearchCursor(r'D:\tmp\Data.gdb\statistics',
...                            "*",
...                            sql_clause = (None, sql_orderby)) as cur:
...     for row in cur:
...         print(row)
...         
(14, u'IILEPK4060_y', 106, 1)
(15, u'IILEPK4060_y', 1, 2)
(12, u'ARADE01010_y', 77, 1)
(13, u'ARADE01010_y', 32, 2)
(10, u'AMAFF08100_y', 73, 1)
(11, u'AMAFF08100_y', 21, 2)
(9, u'AFCHA05030_y', 198, 2)
(8, u'ABPBX45040_b', 1327, 2)
(6, u'ABPBX03240_b', 562, 1)
(7, u'ABPBX03240_b', 96, 2)
(5, u'ABPBJ19010_b', 1327, 2)
(4, u'ABNKD06020_b', 692, 2)
(3, u'ABNKD06020_b', 692, 1)
(1, u'AAAAD01010_y', 89, 1)
(2, u'AAAAD01010_y', 30, 2)
>>> ‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍

What results do you get? The sorting in the printout above looks correct to me. There is only one case, OIDs 3 &4, where "EL_Season" and "COUNT_OccProb" are the same, and in this case the higher "prob_code" is first.

MollyMoore · ‎06-08-2017

Interesting. I got the same list as you when I ran the code on the same table.

Originally, I was creating the table from a selection and some joins and storing it in the in_memory workspace. When I do this, it does not properly order the records with the same code. However, if I export it anywhere else, it seems to perform correctly. I can make a copy within the in_memory workspace, make a table view, or export to another defined file geodatabase and it works fine.

So, I have isolated the issue, but do you happen to know why this might be happening?

JoshuaBixby · ‎06-08-2017

From Using in-memory workspace—Help | ArcGIS Desktop (see orange text):

The following considerations must be made in deciding to write output to the in-memory workspace:
Caution:
Data written to the in-memory workspace is temporary and will be deleted when the application is closed.
Tables, feature classes, and rasters can be written to the in-memory workspace.
The in-memory workspace does not support extended geodatabase elements such as subtypes, domains, representations, topologies, geometric networks, and network datasets.
Feature datasets or folders cannot be created in the in-memory workspace.

From SearchCursor—Help | ArcGIS Desktop :

An SQL postfix clause is positioned in the second position and will be appended to the SELECT statement, following the where clause. The SQL postfix clause is most commonly used for clauses such as ORDER BY.
Note:
DISTINCT, ORDER BY, and ALL are only supported when working with databases. They are not supported by other data sources (such as dBASE or INFO tables).
TOP is only supported by SQL Server and MS Access databases.

Although it isn't explicitly stated, SQL postfix clauses aren't really supported with in-memory workspaces. Although tables, feature classes, and other data types can be stored in-memory, the in-memory workspace is not a fully functional DBMS workspace.

MollyMoore · ‎06-08-2017

Thank you for working through this and for your help. I knew that there were some limitations of the in_memory workspace, but was not aware that the SQL postfix clauses were one of them. This is helpful.