I have a shapefile containing polylines, and their fields are:
- length
- the ID of the area they fall in
Now I need the 3 longest lines from each area. I have 3 areas and 50 polylines, so I should get 3*3=9 records as a result. What I tried is:
import arcpy

myFile = somePath
sql = ('TOP 3', 'ORDER BY "length" DESC GROUP BY "area_id"')
cursor = arcpy.da.SearchCursor(myFile, "*", None, None, False, sql)
with cursor:
    for r in cursor:
        print r
I get all 50 records here. The same happens when I set `sql` to just `('TOP 1', None)` or anything else.
Is the syntax wrong here?
Since you don't know the specific glac_id, it might work to read the data from a SearchCursor into a dictionary and process the results from there. Untested code:
import arcpy

sourceFC = r"C:\Users\adam\Desktop\testarcpy\cont_int.shp"
sourceFieldsList = ["glac_id", "length"]

# Build a dictionary of length lists keyed by glac_id from a da SearchCursor
# (a plain dict comprehension would overwrite earlier rows sharing a glac_id)
valueDict = {}
for r in arcpy.da.SearchCursor(sourceFC, sourceFieldsList):
    valueDict.setdefault(r[0], []).append(r[1])

for k, v in valueDict.items():
    print k, max(v)
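To extend this from the single maximum to the top 3 per area, `heapq.nlargest` can be applied to each per-key list. A minimal sketch; the glac_id values and lengths below are made up to stand in for SearchCursor output:

```python
import heapq

# Hypothetical (glac_id, length) rows standing in for SearchCursor output
rows = [
    ("A", 10.0), ("A", 42.5), ("A", 7.1), ("A", 30.0),
    ("B", 5.5), ("B", 99.9), ("B", 12.0),
]

# Group lengths by area ID
value_dict = {}
for glac_id, length in rows:
    value_dict.setdefault(glac_id, []).append(length)

# Take the 3 largest lengths per area
top3 = {k: heapq.nlargest(3, v) for k, v in value_dict.items()}
print(top3)  # {'A': [42.5, 30.0, 10.0], 'B': [99.9, 12.0, 5.5]}
```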
Try this one:
import arcpy

arcpy.env.overwriteOutput = True

testVector = r"C:\Users\adam\Desktop\testarcpy\cont_int.shp"
result = arcpy.da.SearchCursor(testVector, ["glac_id", "length"],
                               where_clause="glac_id = 'G012039E78878N'",
                               sql_clause=(None, 'ORDER BY length DESC')
                               ).next()
print result[0]  # glac_id
print result[1]  # length
Thank you for your code. I was too fast, though. It appears that both your code and mine correctly use the "where" clause but totally ignore the "sql_clause". With the code you supplied, when I iterate over "result" (removing .next() beforehand), I get all records belonging to my area ID, but unsorted. Maybe sql_clause needs some more parameters which aren't documented that well?
From the doc:
"DISTINCT, ORDER BY, and ALL are only supported when working with databases. They are not supported by other data sources (such as dBASE or INFO tables)."
Since shapefiles use dBASE tables to store attributes, ORDER BY won't work on shapefiles.
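Since the ORDER BY has to happen client-side for shapefiles, one workaround is to fetch the rows and sort them in Python. A minimal sketch, with hypothetical (glac_id, length) tuples standing in for cursor rows:

```python
# Hypothetical rows, as a SearchCursor over ["glac_id", "length"] would yield them
rows = [("G1", 12.5), ("G1", 30.2), ("G2", 7.0), ("G1", 18.9)]

# Emulate: WHERE glac_id = 'G1' ORDER BY length DESC
g1_sorted = sorted((r for r in rows if r[0] == "G1"),
                   key=lambda r: r[1], reverse=True)
print(g1_sorted)  # [('G1', 30.2), ('G1', 18.9), ('G1', 12.5)]
```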
Not sure what you want to do with the results, but I think this gets close to the desired output.
import arcpy
import pandas

fc = r'C:\somefeatureclass'
fields = ['area_id', 'length']

nparr = arcpy.da.FeatureClassToNumPyArray(fc, fields)
df = pandas.DataFrame(nparr, columns=fields)
dfgrouped = df.groupby('area_id').head(3).reset_index(drop=True)
print dfgrouped.values
I think a sort is required before the groupby/head step. Interesting code, James. Thanks for sharing.
# insert before the groupby line
df = df.sort_values(['area_id', 'length'], ascending=[True, False])
Depending upon what the use of the result is, it can be converted to a dictionary with one more line:
dfgrouped = dfgrouped.sort_values(['area_id', 'length'], ascending=[True, False])
result_dict = dfgrouped.set_index('area_id').T.to_dict()
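Putting the thread's pandas pieces together, a minimal self-contained sketch; the in-memory frame below is hypothetical and stands in for the FeatureClassToNumPyArray output, and sort_values is the current name of the older sort method:

```python
import pandas

# Hypothetical rows standing in for the shapefile's attribute table
df = pandas.DataFrame(
    {"area_id": ["a1", "a1", "a1", "a1", "a2", "a2"],
     "length":  [10.0, 40.0, 25.0, 5.0, 90.0, 70.0]}
)

# Sort so head(3) picks the longest lines within each area
df = df.sort_values(["area_id", "length"], ascending=[True, False])
top3 = df.groupby("area_id").head(3).reset_index(drop=True)
print(top3)
```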
Thank you everyone for your help. I missed the part of the docs saying that ORDER BY won't work with shapefiles. Fortunately a where clause works, so I can get my output in a slightly longer way: I just have to iterate over the entire cursor and find the max (or top 3) from the results.
I still don't see how you are getting the top 3. I'd like to know too so that I can choose between my solution above and a strictly arcpy approach.
This is my code, I hope it helps you, James Crandall:
import arcpy
from collections import defaultdict

vector = 'yourShapefile.shp'
# glac_id is my area ID, FID is the very first column in the ArcMap
# attribute table, and SHAPE@LENGTH computes each shape's length on the fly
fields = ["glac_id", "FID", "SHAPE@LENGTH"]

selections = defaultdict(dict)
top3 = defaultdict(list)

with arcpy.da.SearchCursor(vector, fields) as cursor:
    for row in cursor:
        selections[row[0]][row[1]] = row[2]

for i in selections.keys():
    # Take the 3 last items (the 3 biggest) of the sorted list of FIDs.
    # Reversing the sort and taking the first 3 would work as well.
    top3[i] = sorted(selections[i], key=selections[i].get)[-3:]
Then if I print top3, it gives me:
defaultdict(<type 'list'>, {u'G012161E78870N': [41, 46, 47], u'G011895E78886N': [5, 3, 18], u'G012039E78878N': [44, 42, 17]})
which are the FIDs of the 3 longest lines per area. Now all you have to do is extract the lines by ID from the original file (or do whatever you want).
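To extract those lines, one option is to build a where clause from the FID lists and open another cursor. A sketch; the top3 dict below is hard-coded to match the shape of the output above, and the commented-out cursor call is illustrative only:

```python
# Hard-coded example matching the shape of the top3 output above
top3 = {"G012161E78870N": [41, 46, 47],
        "G011895E78886N": [5, 3, 18]}

# Flatten all winning FIDs into one SQL where clause
all_fids = sorted(fid for fids in top3.values() for fid in fids)
where = '"FID" IN ({})'.format(", ".join(str(f) for f in all_fids))
print(where)  # "FID" IN (3, 5, 18, 41, 46, 47)

# The clause could then feed, e.g.:
# arcpy.da.SearchCursor('yourShapefile.shp', ["FID", "SHAPE@"], where_clause=where)
```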