I have a script that uses the sql orderby parameter to sort my arcpy cursors by a user defined field or fields and deletes duplicates, but keeps the duplicate with the highest value. I'd like to be able to sort by geometry to identify/group features that are spatial duplicates and delete those that are duplicates (but keeping the duplicate with the highest value in another chosen field which is why I can't just use the Delete Identical tool).
I have tried several ways of doing this, but as I understand, shape tokens such as SHAPE@, SHAPE@WKT, etc. cannot be used in the sql clause. Is there another way to do this or another workaround? I also tried to add a field and fill with the SHAPE@WKT, but the update cursor did not fill the field.
Here is the code I've got that works for other fields, other than the geometry field:
import arcpy
from itertools import groupby
from operator import itemgetter
input_layer = arcpy.GetParameterAsText(0)
case_fields = arcpy.GetParameterAsText(1)
max_field = arcpy.GetParameterAsText(2)
case_fields = case_fields.split(";")
case_fields = [str(x) for x in case_fields]
sql_orderby = "ORDER BY {}, {} DESC".format(",".join(case_fields), max_field)
with arcpy.da.UpdateCursor(input_layer, "*", sql_clause=(None, sql_orderby)) as cursor:
case_func = itemgetter(*(cursor.fields.index(fld) for fld in case_fields))
for key, group in groupby(cursor, case_func):
next(group)
for extra in group:
cursor.deleteRow()
Stepping back from any code, what does it mean to "sort by geometry?"
Have you looked at using Find Identical—Data Management toolbox | ArcGIS Desktop as part of a 2-step process to find spatial duplicates and then remove all but the one you want to keep?
You mention adding a field and filling it with SHAPE@WKT but nothing got filled, something seems off there. Adding a geometry WKT representation to a text field is pretty straightforward, what code did you use? Unless your geometries are quite basic, the WKT representation can get big quick, which is a separate issue.
If you are working with file geodatabases, you cannot use any geometry properties in a SQL ORDER BY clause. If you are using an enterprise geodatabase, that opens the door to other options.
If your data are points or a centroid of the features is a good proxy for shape, this workflow might help:
Add a field, text(12) named 'Geohash'.
Calculate values into the field with a little Python.
with arcpy.da.UpdateCursor('Ontario511_Layer_Sort',['shape@','Geohash']) as cursor:
for row in cursor:
shp = row[0].projectAs(arcpy.SpatialReference(4326))
gh = shp.getGeohash(12)
cursor.updateRow([row[0],gh])
This gives you an attribute to work with, identical geohashes are identical points.