Why sql_clause is not respected in da.SearchCursor?

AdamBorczyk · ‎11-19-2017

I have a shapefile where I have polylines and their fields are:

- length

- area into which they fall in

Now I need 3 longest lines from each area. I have 3 areas and 50 polylines, so I should get 3*3=9 records as a result. What I try to do is:

import arcpy

myFile = somePath
sql = ('TOP 3', 'ORDER BY "length" DESC GROUP BY "area_id" ')

cursor = arcpy.da.SearchCursor(myFile , "*", None, None, False, sql)

with cursor:
    for r in cursor:
        print r‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍

I get all of 50 records here. The same happens when I set `sql` to just `('TOP 1', None) or anything else.

Is the syntax wrong here?

RandyBurton · ‎11-20-2017

Since you don't know the specific glac_id, it might work to read the data from a SearchCursor into an array or dictionary and process the results from there. Untested code:

sourceFC = r"C:\Users\adam\Desktop\testarcpy\cont_int.shp"  
  
sourceFieldsList = ["glac_id", "length"]

# Use list comprehension to build a dictionary from a da SearchCursor  
valueDict = {r[0]:(r[1:]) for r in arcpy.da.SearchCursor(sourceFC, sourceFieldsList)}  

for k, v in valueDict.items():
    print k, max(v)‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍

View solution in original post

RandyBurton · ‎11-19-2017

Are you using a file geodatabase? Or an SQL server? Some commands do not work with a file geodatabases.

TOP is only supported by SQL Server and MS Access databases.

See: SearchCursor

AdamBorczyk · ‎11-19-2017

Thanks. I am iterating over a simple shapefile, and if I stop using TOP and change clause to eg. `sql = (None, 'ORDER BY "length" ')`, I still get every record.

RandyBurton · ‎11-19-2017

Perhaps something like (to get only first row):

OID = arcpy.da.SearchCursor(myFile, ["OBJECTID","length"],
                              where_clause = "area_id = 'some_code'",
                              sql_clause = (None, 'ORDER BY length DESC')
                              ).next()[0] # use [0] to return just OBJECTID, remove for tuple‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍

JoshuaBixby · ‎11-20-2017

As rvburton‌ points out, TOP is not supported with shape files. The expression, ORDER BY "length", won't limit the number of records returned, it just sorts them. If TOP isn't supported and ORDER BY only sorts records, it is expected the cursor will return all of the records. That said, you can use Python to grab the first three records, representing the longest three lengths by area, or are you wanting/hoping for an SQL only solution?

AdamBorczyk · ‎11-20-2017

Thanks Joshua Bixby and Randy Burton, I know how TOP and ORDER BY should work, but the point here is that none of example queries in `sql_clause` work, I couldn't get any ORDER BY or GROUP BY to work, or anything else. This is the code I tried right now:

import arcpy
arcpy.env.overwriteOutput=True

testVector = r"C:\Users\adam\Desktop\testarcpy\cont_int.shp"
fields = ["glac_id", "length"]
sql = (None, 'ORDER BY length DESC')

# One method (get ordered list)
with arcpy.da.SearchCursor(testVector, fields, None, None, False, sql) as cursor:
     for r in cursor:
        print r

# Other method (just give me longest one)
OID = arcpy.da.SearchCursor(testVector, fields, sql_clause = sql).next()
print OID‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍

With above I always get unsorted values from the table. Even if 'length' was a string (which is not, it's a float), it is not ordered in any way, I get rows in the same order as they are in attribute table in ArcMap. There are no duplicate columns in the file obviously, no Nulls, nothing special really. I also tried ordering by SHAPE@LENGTH with no luck (same result again). Path to file is correct (I've just triple-checked it), column names are fine.

I also tried with a test WHERE clause: 'length > 100.0' with no luck again. Very same results.

I hoped this was a simple syntax error I just couldn't spot, but it looks like a bigger issue now.

RandyBurton · ‎11-20-2017

In your original question, you were wanting to "group by" area_id. My thoughts were to use the 3 area_id values to feed the where_clause in 3 SearchCursors. You would use only the first row of each search.

AdamBorczyk · ‎11-20-2017

This is correct and it would be the next step I did, but I cannot even get to this point, because ArcPy doesn't honour any ORDER BY clause. What more, trying to put `'glac_id = "G012039E78878N"'` as `WHERE` clause doesn't modify results at all either.

RandyBurton · ‎11-20-2017

Try this for the where clause (double quotes around entire where clause, single quotes only around the id code):

where_clause = "glac_id = 'G012039E78878N'",‍‍‍

AdamBorczyk · ‎11-20-2017

This one works! Partly though. I can get correct result if I specify where_clause for a given area, but not a general highest result. When I don't know exact area IDs, this method will be hard to use. Without specifying WHERE, the sql_clause does not give any effect.