Sort by shape token in Arcpy cursors?

10-02-2018 08:30 AM
New Contributor III

I have a script that uses the sql orderby parameter to sort my arcpy cursors by a user defined field or fields and deletes duplicates, but keeps the duplicate with the highest value. I'd like to be able to sort by geometry to identify/group features that are spatial duplicates and delete those that are duplicates (but keeping the duplicate with the highest value in another chosen field which is why I can't just use the Delete Identical tool).

I have tried several ways of doing this, but as I understand, shape tokens such as SHAPE@, SHAPE@WKT, etc. cannot be used in the sql clause. Is there another way to do this or another workaround? I also tried to add a field and fill with the SHAPE@WKT, but the update cursor did not fill the field.

Here is the code I've got that works for other fields, other than the geometry field:

import arcpy
from itertools import groupby
from operator import itemgetter

input_layer = arcpy.GetParameterAsText(0)
case_fields = arcpy.GetParameterAsText(1)
max_field = arcpy.GetParameterAsText(2)
case_fields = case_fields.split(";")
case_fields = [str(x) for x in case_fields]

sql_orderby = "ORDER BY {}, {} DESC".format(",".join(case_fields), max_field)

with arcpy.da.UpdateCursor(input_layer, "*", sql_clause=(None, sql_orderby)) as cursor:
    case_func = itemgetter(*(cursor.fields.index(fld) for fld in case_fields))
    for key, group in groupby(cursor, case_func):
        for extra in group:
0 Kudos
2 Replies
MVP Esteemed Contributor

Stepping back from any code, what does it mean to "sort by geometry?"

Have you looked at using Find Identical—Data Management toolbox | ArcGIS Desktop as part of a 2-step process to find spatial duplicates and then remove all but the one you want to keep?

You mention adding a field and filling it with SHAPE@WKT but nothing got filled, something seems off there.  Adding a geometry WKT representation to a text field is pretty straightforward, what code did you use?  Unless your geometries are quite basic, the WKT representation can get big quick, which is a separate issue.

If you are working with file geodatabases, you cannot use any geometry properties in a SQL ORDER BY clause.  If you are using an enterprise geodatabase, that opens the door to other options.

Esri Regular Contributor

If your data are points or a centroid of the features is a good proxy for shape, this workflow might help:

Add a field, text(12) named 'Geohash'.

Calculate values into the field with a little Python.

with arcpy.da.UpdateCursor('Ontario511_Layer_Sort',['shape@','Geohash']) as cursor:
    for row in cursor:
        shp = row[0].projectAs(arcpy.SpatialReference(4326))
        gh = shp.getGeohash(12)

This gives you an attribute to work with, identical geohashes are identical points.