using join by in arcpy.da.searchcursor() - what is the right syntax?

3735
18
Jump to solution
04-08-2014 06:22 AM
yonatanrubinstein
New Contributor III
Hi, I'm trying to run a code that uses a "join by statement" inside a searchcursor (it was done to imitate the join by inside the iterate features tool in the modelbuilder). I've tried this code (as a trial before implementing it on the real script), but I can't seem to find the right syntax or any documentation.
import arcpy  fc = "D:\\Projects\\proj1\\SumsOfRoutes.gdb\\all_routes_1stBatch" field = "r_Identity" field2 = "Shape_Length" ##sql = "(None, 'GROUP BY \"r_Identity\"')" sql = "(None,'Group BY r_Identity')"   fields=["*"] cursor = arcpy.da.SearchCursor(fc,fields,sql_clause=sql)   # print r_Identity and Shape_Length of each feature in feature class for row in cursor:     print(str(int(row.getValue(field))) + ":  " + str(row.getValue(field2)) + " meters")


Does anybody know how it should be written?


EDIT: for the script above, I get the following error:

TypeError: 'sql_clause' should be sequense of 2 strings(prefix, postfix)
Thanks.
Tags (2)
0 Kudos
1 Solution

Accepted Solutions
JamesCrandall
MVP Frequent Contributor
Update:

This must be a bug or something because I don't see how this can be correct.  According to Shaun's reply in this thread: http://forums.arcgis.com/threads/77876-Group-By-functionality-of-da.cursors-sql_clause the suggestion is to include the "ObjectID" field in the group by clause to overcome the error you are experiencing.  Here's my working code:

 fc = r'H:\Documents\ArcGIS\Default.gdb\RainGauges' fields = ('Data_Type')  sql=[None, "GROUP BY Data_Type, ObjectID"] cursor = arcpy.da.SearchCursor(fc, fields,sql_clause=sql)  for row in cursor:     print row[0] 


HOWEVER, this completely defeats the ability to actually group rows because the ObjectID is a unique value!!  lol...  Maybe I am simply misunderstanding here, but that is exactly what is happening in my test data.  That is, it doesn't matter if real/actual groups exist in my select fields, they will NEVER group correctly because I must include the ObjectID in the group by clause (it is unique values).

I am happy to be corrected on this one!

View solution in original post

0 Kudos
18 Replies
JoshuaChisholm
Occasional Contributor III
Hello Yonatan,

I think sql_clause needs to be passed as a tuple. Try removing the quotes:
sql = (None,'GROUP BY r_Identity')


On a side note, you could probably increase the efficiency of your script by only pulling the fields you need for the da.SearchCursor. Like this:
fields=[field,field2]


Let me know how it goes!
0 Kudos
JamesCrandall
MVP Frequent Contributor
Here's my guess...

Change:

sql = (None,'GROUP BY r_Identity')


To:

sql = [None,"GROUP BY r_Identity"]
0 Kudos
yonatanrubinstein
New Contributor III
Hi Joshua,

Thanks for the quick response.

1. As far as I Know, the sql_clause is a string. As evidence, when I do as you suggested and write
sql = (None,'GROUP BY r_Identity')
I get the following error:
TypeError: 'where_clause' is not a string
, so that is why I have the quotation marks. Any Ideas?

2. I know listing only 2 fields will  increase the efficiency of the cursor, but will it enable me to use other field in the layer, later on?
0 Kudos
JamesCrandall
MVP Frequent Contributor
I just noticed you have this:

fields=["*"]


This is not allowable in a GROUP BY statement.  That is, you could not write it as:

SELECT *
FROM table1
GROUP BY table1.field1

You must specify the fields you wish to include in the group by too!!!  Like this:

SELECT table1.field1, table1.field2
FROM table1
GROUP BY table1.field1, table1.field2
0 Kudos
JamesCrandall
MVP Frequent Contributor
Update:

This must be a bug or something because I don't see how this can be correct.  According to Shaun's reply in this thread: http://forums.arcgis.com/threads/77876-Group-By-functionality-of-da.cursors-sql_clause the suggestion is to include the "ObjectID" field in the group by clause to overcome the error you are experiencing.  Here's my working code:

 fc = r'H:\Documents\ArcGIS\Default.gdb\RainGauges' fields = ('Data_Type')  sql=[None, "GROUP BY Data_Type, ObjectID"] cursor = arcpy.da.SearchCursor(fc, fields,sql_clause=sql)  for row in cursor:     print row[0] 


HOWEVER, this completely defeats the ability to actually group rows because the ObjectID is a unique value!!  lol...  Maybe I am simply misunderstanding here, but that is exactly what is happening in my test data.  That is, it doesn't matter if real/actual groups exist in my select fields, they will NEVER group correctly because I must include the ObjectID in the group by clause (it is unique values).

I am happy to be corrected on this one!
0 Kudos
yonatanrubinstein
New Contributor III
OK, James,

I tried to rewrite the script by your 2 suggestions:

import arcpy

fc = "D:\\Projects\\proj1\\SumsOfRoutes.gdb\\all_routes_1stBatch"
field1 = "r_Identity"
field2 = "Shape_Length"

sql = [None,'Group BY r_Identity']


fields=[field1,field2]
cursor = arcpy.da.SearchCursor(fc,fields,sql_clause=sql)


# print r_Identity and Shape_Length of each feature in feature class
for row in cursor:
    print str(row) 


I get the following error:
RuntimeError: An invalid SQL stament was used. [SELECT r_Identity, Shape_Length, OBJECTID FROM all_routes_1stBatch GROUP BY r_Identity] 


Any Ideas as to what's wrong?

and the question I asked Joshua:  I know listing only 2 fields will increase the efficiency of the cursor, but will it enable me to use other fields in the layer later on?
0 Kudos
JamesCrandall
MVP Frequent Contributor
OK, James,

I tried to rewrite the script by your 2 suggestions:

import arcpy

fc = "D:\\Projects\\proj1\\SumsOfRoutes.gdb\\all_routes_1stBatch"
field1 = "r_Identity"
field2 = "Shape_Length"

sql = [None,'Group BY r_Identity']


fields=[field1,field2]
cursor = arcpy.da.SearchCursor(fc,fields,sql_clause=sql)


# print r_Identity and Shape_Length of each feature in feature class
for row in cursor:
    print str(row) 


I get the following error:
RuntimeError: An invalid SQL stament was used. [SELECT r_Identity, Shape_Length, OBJECTID FROM all_routes_1stBatch GROUP BY r_Identity] 


Any Ideas as to what's wrong?

and the question I asked Joshua:  I know listing only 2 fields will increase the efficiency of the cursor, but will it enable me to use other fields in the layer later on?


You cannot include Shape_Length (see bold above) in your select because it is not included in your GROUP BY portion.  Remove it and see if it at least completes without error.
0 Kudos
yonatanrubinstein
New Contributor III
Update:

This must be a bug or something because I don't see how this can be correct.  According to Shaun's reply in this thread: http://forums.arcgis.com/threads/77876-Group-By-functionality-of-da.cursors-sql_clause the suggestion is to include the "ObjectID" field in the group by clause to overcome the error you are experiencing.  Here's my working code:


fc = r'H:\Documents\ArcGIS\Default.gdb\RainGauges'
fields = ('Data_Type') 
sql=[None, "GROUP BY Data_Type, ObjectID"]
cursor = arcpy.da.SearchCursor(fc, fields,sql_clause=sql)

for row in cursor:
    print row[0]



HOWEVER, this completely defeats the ability to actually group rows because the ObjectID is a unique value!!  lol...  Maybe I am simply misunderstanding here, but that is exactly what is happening in my test data.  That is, it doesn't matter if real/actual groups exist in my select fields, they will NEVER group correctly because I must include the ObjectID in the group by clause (it is unique values).

I am happy to be corrected on this one!


I get the same runtimeerror as before... 😞
0 Kudos
JamesCrandall
MVP Frequent Contributor
I get the same runtimeerror as before... 😞


Is "r_Identity" the actual field name?

Make sure the spellng is correct. I get the same error in my working example if it is incorrectly spelled.
0 Kudos