Why sql_clause is not respected in da.SearchCursor?

2684
21
Jump to solution
11-19-2017 11:45 AM
AdamBorczyk
New Contributor III

I have a shapefile where I have polylines and their fields are:

- length

- area into which they fall in

Now I need 3 longest lines from each area. I have 3 areas and 50 polylines, so I should get 3*3=9 records as a result. What I try to do is:

import arcpy

myFile = somePath
sql = ('TOP 3', 'ORDER BY "length" DESC GROUP BY "area_id" ')

cursor = arcpy.da.SearchCursor(myFile , "*", None, None, False, sql)

with cursor:
    for r in cursor:
        print r‍‍‍‍‍‍‍

I get all of 50 records here. The same happens when I set `sql` to just `('TOP 1', None) or anything else.

Is the syntax wrong here?

0 Kudos
21 Replies
RandyBurton
MVP Alum

Since you don't know the specific glac_id, it might work to read the data from a SearchCursor into an array or dictionary and process the results from there.  Untested code:

sourceFC = r"C:\Users\adam\Desktop\testarcpy\cont_int.shp"  
  
sourceFieldsList = ["glac_id", "length"]

# Use list comprehension to build a dictionary from a da SearchCursor  
valueDict = {r[0]:(r[1:]) for r in arcpy.da.SearchCursor(sourceFC, sourceFieldsList)}  

for k, v in valueDict.items():
    print k, max(v)‍‍‍‍‍‍‍‍‍
RandyBurton
MVP Alum

Try this one:

import arcpy
arcpy.env.overwriteOutput=True

testVector = r"C:\Users\adam\Desktop\testarcpy\cont_int.shp"

result = arcpy.da.SearchCursor(testVector, ["glac_id","length"],
                               where_clause = "glac_id = 'G012039E78878N'",
                               sql_clause = (None, 'ORDER BY length DESC')
                               ).next()

print result[0] # glac_id
print result[1] # length ‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍
AdamBorczyk
New Contributor III

Thank you for your code. I was too fast though. It appears that your and my code correctly use"where" clause, but totally omit "sql_clause". With the code you supplied, when I iterate over "result" (removing .next() beforehand) I get all records belonging to my area ID, but unsorted. Maybe sql_clause needs some more parameters which aren't documented that well?

0 Kudos
JamesMacKay3
Occasional Contributor

From the doc:

"DISTINCT, ORDER BY, and ALL are only supported when working with databases. They are not supported by other data sources (such as dBASE or INFO tables)."

Since shapefiles use dBASE tables to store attributes, ORDER BY won't work on shapefiles.

JamesCrandall
MVP Frequent Contributor

Not sure what you want to do with the results, but I think this gets close to the desired output.

import pandas
import numpy as np

fc = r'C:\somefeatureclass'
fields = ['area_id','length']
nparr = arcpy.da.FeatureClassToNumPyArray(fc, fields)
df = pandas.DataFrame(nparr,columns=fields)
dfgrouped = df.groupby('area_id').head(3).reset_index(drop=True)

print dfgrouped.values
RandyBurton
MVP Alum

I think a sort is required between line 7 and 8.  Interesting code, James.  Thanks for sharing.

# line 7
df = df.sort(['area_id','length'],ascending=[True,False])
# line 8‍‍‍‍‍‍‍‍‍
JamesCrandall
MVP Frequent Contributor

Depending upon what the use of the result is, it can be converted to a dictionary with one more line:

dfgrouped = dfgrouped.sort(['area_id,'length'],ascending=[True,False])
dict = dfgrouped.set_index(area_id').T.to_dict()
AdamBorczyk
New Contributor III

Thank you everyone for your help. I missed the part of docs saying that ORDER BY won't work with shapefiles. Fortunately a where clause works, so I can get my output in a little bit longer way. I just have to iterate over entire cursor and find max (or top 3) from the results.

0 Kudos
JamesCrandall
MVP Frequent Contributor

I still don't see how you are getting the top 3.  I'd like to know too so that I can choose between my solution above and a strictly arcpy approach.

0 Kudos
AdamBorczyk
New Contributor III

This is my code, I hope it helps you James Crandall

    

    import arcpy
    from collections import defaultdict

    vector = 'yourShapefile.shp'

    # glac_id is my area ID, FID is the very first column in ArcMap attribute table and I calculate length of the shape on the fly
    fields = ["glac_id", "FID", "SHAPE@LENGTH"]
    selections = defaultdict(dict)
    top3 = defaultdict(list)

    with arcpy.da.SearchCursor(vector, fields) as cursor:
        for row in cursor:
            selections[row[0]][row[1]] = row[2]

    for i in selections.keys():
        # I take 3 last items (3 biggest) or sorted list of values. I could also reverse it and get 3 first ones.
        top3[i] = sorted(selections[i], key=selections[i].get)[-3:]

The if I print top3, it will give me:

defaultdict(<type 'list'>, {u'G012161E78870N': [41, 46, 47], u'G011895E78886N': [5, 3, 18], u'G012039E78878N': [44, 42, 17]})

which are FIDs of 3 longest lines per area. Now all you have to do is extract lines by ID from original file (or do whatever you want).