I have a shapefile containing polylines, and their fields are:
- length
- the ID of the area they fall in
Now I need the 3 longest lines from each area. I have 3 areas and 50 polylines, so I should get 3*3=9 records as a result. What I tried is:
import arcpy

myFile = somePath
sql = ('TOP 3', 'ORDER BY "length" DESC GROUP BY "area_id"')
cursor = arcpy.da.SearchCursor(myFile, "*", None, None, False, sql)
with cursor:
    for r in cursor:
        print r
I get all 50 records here. The same happens when I set `sql` to just `('TOP 1', None)` or anything else.
Is the syntax wrong here?
Since you don't know the specific glac_id, it might work to read the data from a SearchCursor into a dictionary and process the results from there. Untested code:
import arcpy

sourceFC = r"C:\Users\adam\Desktop\testarcpy\cont_int.shp"
sourceFieldsList = ["glac_id", "length"]

# Build a dictionary of length lists keyed by glac_id from a da SearchCursor
# (a plain dict comprehension would overwrite earlier rows sharing a glac_id)
valueDict = {}
for r in arcpy.da.SearchCursor(sourceFC, sourceFieldsList):
    valueDict.setdefault(r[0], []).append(r[1])

for k, v in valueDict.items():
    print k, max(v)
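To extend this from the single maximum to the top 3 per area, `heapq.nlargest` can be applied to each per-key list. A minimal sketch; the glac_id values and lengths below are made up to stand in for SearchCursor output:

```python
import heapq

# Hypothetical (glac_id, length) rows standing in for SearchCursor output
rows = [
    ("A", 10.0), ("A", 42.5), ("A", 7.1), ("A", 30.0),
    ("B", 5.5), ("B", 99.9), ("B", 12.0),
]

# Group lengths by area ID
value_dict = {}
for glac_id, length in rows:
    value_dict.setdefault(glac_id, []).append(length)

# Take the 3 largest lengths per area
top3 = {k: heapq.nlargest(3, v) for k, v in value_dict.items()}
print(top3)  # {'A': [42.5, 30.0, 10.0], 'B': [99.9, 12.0, 5.5]}
```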
Try this one:
import arcpy

arcpy.env.overwriteOutput = True

testVector = r"C:\Users\adam\Desktop\testarcpy\cont_int.shp"
result = arcpy.da.SearchCursor(testVector, ["glac_id", "length"],
                               where_clause="glac_id = 'G012039E78878N'",
                               sql_clause=(None, 'ORDER BY length DESC')
                               ).next()
print result[0]  # glac_id
print result[1]  # length
Thank you for your code. I was too fast, though. It appears that both your code and mine correctly use the "where" clause but totally ignore the "sql_clause". With the code you supplied, when I iterate over "result" (removing .next() beforehand), I get all records belonging to my area ID, but unsorted. Maybe sql_clause needs some more parameters which aren't documented that well?
From the doc:
"DISTINCT, ORDER BY, and ALL are only supported when working with databases. They are not supported by other data sources (such as dBASE or INFO tables)."
Since shapefiles use dBASE tables to store attributes, ORDER BY won't work on shapefiles.
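Since the ORDER BY has to happen client-side for shapefiles, one workaround is to fetch the rows and sort them in Python. A minimal sketch, with hypothetical (glac_id, length) tuples standing in for cursor rows:

```python
# Hypothetical rows, as a SearchCursor over ["glac_id", "length"] would yield them
rows = [("G1", 12.5), ("G1", 30.2), ("G2", 7.0), ("G1", 18.9)]

# Emulate: WHERE glac_id = 'G1' ORDER BY length DESC
g1_sorted = sorted((r for r in rows if r[0] == "G1"),
                   key=lambda r: r[1], reverse=True)
print(g1_sorted)  # [('G1', 30.2), ('G1', 18.9), ('G1', 12.5)]
```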
Not sure what you want to do with the results, but I think this gets close to the desired output.
import arcpy
import pandas

fc = r'C:\somefeatureclass'
fields = ['area_id', 'length']

nparr = arcpy.da.FeatureClassToNumPyArray(fc, fields)
df = pandas.DataFrame(nparr, columns=fields)
dfgrouped = df.groupby('area_id').head(3).reset_index(drop=True)
print dfgrouped.values
I think a sort is required before the groupby/head step. Interesting code, James. Thanks for sharing.
# insert before the groupby line
df = df.sort_values(['area_id', 'length'], ascending=[True, False])
Depending upon what the use of the result is, it can be converted to a dictionary with one more line:
dfgrouped = dfgrouped.sort_values(['area_id', 'length'], ascending=[True, False])
result_dict = dfgrouped.set_index('area_id').T.to_dict()
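Putting the thread's pandas pieces together, a minimal self-contained sketch; the in-memory frame below is hypothetical and stands in for the FeatureClassToNumPyArray output, and sort_values is the current name of the older sort method:

```python
import pandas

# Hypothetical rows standing in for the shapefile's attribute table
df = pandas.DataFrame(
    {"area_id": ["a1", "a1", "a1", "a1", "a2", "a2"],
     "length":  [10.0, 40.0, 25.0, 5.0, 90.0, 70.0]}
)

# Sort so head(3) picks the longest lines within each area
df = df.sort_values(["area_id", "length"], ascending=[True, False])
top3 = df.groupby("area_id").head(3).reset_index(drop=True)
print(top3)
```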
Thank you everyone for your help. I missed the part of the docs saying that ORDER BY won't work with shapefiles. Fortunately a where clause works, so I can get my output in a slightly longer way: I just have to iterate over the entire cursor and find the max (or top 3) from the results.
I still don't see how you are getting the top 3. I'd like to know too so that I can choose between my solution above and a strictly arcpy approach.
This is my code, I hope it helps you, James Crandall:
import arcpy
from collections import defaultdict

vector = 'yourShapefile.shp'
# glac_id is my area ID, FID is the very first column in the ArcMap
# attribute table, and SHAPE@LENGTH computes each shape's length on the fly
fields = ["glac_id", "FID", "SHAPE@LENGTH"]

selections = defaultdict(dict)
top3 = defaultdict(list)

with arcpy.da.SearchCursor(vector, fields) as cursor:
    for row in cursor:
        selections[row[0]][row[1]] = row[2]

for i in selections.keys():
    # Take the 3 last items (the 3 biggest) of the sorted list of FIDs.
    # Reversing the sort and taking the first 3 would work as well.
    top3[i] = sorted(selections[i], key=selections[i].get)[-3:]
Then if I print top3, it gives me:
defaultdict(<type 'list'>, {u'G012161E78870N': [41, 46, 47], u'G011895E78886N': [5, 3, 18], u'G012039E78878N': [44, 42, 17]})
which are the FIDs of the 3 longest lines per area. Now all you have to do is extract the lines by ID from the original file (or do whatever you want).
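To extract those lines, one option is to build a where clause from the FID lists and open another cursor. A sketch; the top3 dict below is hard-coded to match the shape of the output above, and the commented-out cursor call is illustrative only:

```python
# Hard-coded example matching the shape of the top3 output above
top3 = {"G012161E78870N": [41, 46, 47],
        "G011895E78886N": [5, 3, 18]}

# Flatten all winning FIDs into one SQL where clause
all_fids = sorted(fid for fids in top3.values() for fid in fids)
where = '"FID" IN ({})'.format(", ".join(str(f) for f in all_fids))
print(where)  # "FID" IN (3, 5, 18, 41, 46, 47)

# The clause could then feed, e.g.:
# arcpy.da.SearchCursor('yourShapefile.shp', ["FID", "SHAPE@"], where_clause=where)
```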