using join by in arcpy.da.searchcursor() - what is the right syntax?

yonatanrubinstein · ‎04-08-2014

Hi, I'm trying to run a code that uses a "join by statement" inside a searchcursor (it was done to imitate the join by inside the iterate features tool in the modelbuilder). I've tried this code (as a trial before implementing it on the real script), but I can't seem to find the right syntax or any documentation.

import arcpy  fc = "D:\\Projects\\proj1\\SumsOfRoutes.gdb\\all_routes_1stBatch" field = "r_Identity" field2 = "Shape_Length" ##sql = "(None, 'GROUP BY \"r_Identity\"')" sql = "(None,'Group BY r_Identity')"   fields=["*"] cursor = arcpy.da.SearchCursor(fc,fields,sql_clause=sql)   # print r_Identity and Shape_Length of each feature in feature class for row in cursor:     print(str(int(row.getValue(field))) + ":  " + str(row.getValue(field2)) + " meters")

Does anybody know how it should be written?

EDIT: for the script above, I get the following error:

TypeError: 'sql_clause' should be sequense of 2 strings(prefix, postfix)
Thanks.

JamesCrandall · ‎04-08-2014

Update:

This must be a bug or something because I don't see how this can be correct. According to Shaun's reply in this thread: http://forums.arcgis.com/threads/77876-Group-By-functionality-of-da.cursors-sql_clause the suggestion is to include the "ObjectID" field in the group by clause to overcome the error you are experiencing. Here's my working code:

 fc = r'H:\Documents\ArcGIS\Default.gdb\RainGauges' fields = ('Data_Type')  sql=[None, "GROUP BY Data_Type, ObjectID"] cursor = arcpy.da.SearchCursor(fc, fields,sql_clause=sql)  for row in cursor:     print row[0]

HOWEVER, this completely defeats the ability to actually group rows because the ObjectID is a unique value!! lol... Maybe I am simply misunderstanding here, but that is exactly what is happening in my test data. That is, it doesn't matter if real/actual groups exist in my select fields, they will NEVER group correctly because I must include the ObjectID in the group by clause (it is unique values).

I am happy to be corrected on this one!

View solution in original post

JoshuaChisholm · ‎04-08-2014

Hello Yonatan,

I think sql_clause needs to be passed as a tuple. Try removing the quotes:

sql = (None,'GROUP BY r_Identity')

On a side note, you could probably increase the efficiency of your script by only pulling the fields you need for the da.SearchCursor. Like this:

fields=[field,field2]

Let me know how it goes!

JamesCrandall · ‎04-08-2014

Here's my guess...

Change:

sql = (None,'GROUP BY r_Identity')

To:

sql = [None,"GROUP BY r_Identity"]

yonatanrubinstein · ‎04-08-2014

Hi Joshua,

Thanks for the quick response.

1. As far as I Know, the sql_clause is a string. As evidence, when I do as you suggested and write

sql = (None,'GROUP BY r_Identity')

I get the following error:

TypeError: 'where_clause' is not a string

, so that is why I have the quotation marks. Any Ideas?

2. I know listing only 2 fields will increase the efficiency of the cursor, but will it enable me to use other field in the layer, later on?

JamesCrandall · ‎04-08-2014

I just noticed you have this:

fields=["*"]

This is not allowable in a GROUP BY statement. That is, you could not write it as:

SELECT *
FROM table1
GROUP BY table1.field1

You must specify the fields you wish to include in the group by too!!! Like this:

SELECT table1.field1, table1.field2
FROM table1
GROUP BY table1.field1, table1.field2

JamesCrandall · ‎04-08-2014

Update:

This must be a bug or something because I don't see how this can be correct. According to Shaun's reply in this thread: http://forums.arcgis.com/threads/77876-Group-By-functionality-of-da.cursors-sql_clause the suggestion is to include the "ObjectID" field in the group by clause to overcome the error you are experiencing. Here's my working code:

 fc = r'H:\Documents\ArcGIS\Default.gdb\RainGauges' fields = ('Data_Type')  sql=[None, "GROUP BY Data_Type, ObjectID"] cursor = arcpy.da.SearchCursor(fc, fields,sql_clause=sql)  for row in cursor:     print row[0]

HOWEVER, this completely defeats the ability to actually group rows because the ObjectID is a unique value!! lol... Maybe I am simply misunderstanding here, but that is exactly what is happening in my test data. That is, it doesn't matter if real/actual groups exist in my select fields, they will NEVER group correctly because I must include the ObjectID in the group by clause (it is unique values).

I am happy to be corrected on this one!

yonatanrubinstein · ‎04-08-2014

OK, James,

I tried to rewrite the script by your 2 suggestions:

import arcpy

fc = "D:\\Projects\\proj1\\SumsOfRoutes.gdb\\all_routes_1stBatch"
field1 = "r_Identity"
field2 = "Shape_Length"

sql = [None,'Group BY r_Identity']


fields=[field1,field2]
cursor = arcpy.da.SearchCursor(fc,fields,sql_clause=sql)


# print r_Identity and Shape_Length of each feature in feature class
for row in cursor:
    print str(row)

I get the following error:

RuntimeError: An invalid SQL stament was used. [SELECT r_Identity, Shape_Length, OBJECTID FROM all_routes_1stBatch GROUP BY r_Identity]

Any Ideas as to what's wrong?

and the question I asked Joshua: I know listing only 2 fields will increase the efficiency of the cursor, but will it enable me to use other fields in the layer later on?

JamesCrandall · ‎04-08-2014

OK, James,

I tried to rewrite the script by your 2 suggestions:
import arcpy

fc = "D:\\Projects\\proj1\\SumsOfRoutes.gdb\\all_routes_1stBatch"
field1 = "r_Identity"
field2 = "Shape_Length"

sql = [None,'Group BY r_Identity']


fields=[field1,field2]
cursor = arcpy.da.SearchCursor(fc,fields,sql_clause=sql)


# print r_Identity and Shape_Length of each feature in feature class
for row in cursor:
    print str(row) 
I get the following error:
RuntimeError: An invalid SQL stament was used. [SELECT r_Identity, Shape_Length, OBJECTID FROM all_routes_1stBatch GROUP BY r_Identity] 
Any Ideas as to what's wrong?

and the question I asked Joshua: I know listing only 2 fields will increase the efficiency of the cursor, but will it enable me to use other fields in the layer later on?

You cannot include Shape_Length (see bold above) in your select because it is not included in your GROUP BY portion. Remove it and see if it at least completes without error.

yonatanrubinstein · ‎04-08-2014

Update:

This must be a bug or something because I don't see how this can be correct. According to Shaun's reply in this thread: http://forums.arcgis.com/threads/77876-Group-By-functionality-of-da.cursors-sql_clause the suggestion is to include the "ObjectID" field in the group by clause to overcome the error you are experiencing. Here's my working code:
fc = r'H:\Documents\ArcGIS\Default.gdb\RainGauges'
fields = ('Data_Type') 
sql=[None, "GROUP BY Data_Type, ObjectID"]
cursor = arcpy.da.SearchCursor(fc, fields,sql_clause=sql)

for row in cursor:
    print row[0]
HOWEVER, this completely defeats the ability to actually group rows because the ObjectID is a unique value!! lol... Maybe I am simply misunderstanding here, but that is exactly what is happening in my test data. That is, it doesn't matter if real/actual groups exist in my select fields, they will NEVER group correctly because I must include the ObjectID in the group by clause (it is unique values).

I am happy to be corrected on this one!

I get the same runtimeerror as before... 😞

JamesCrandall · ‎04-08-2014

I get the same runtimeerror as before... 😞

Is "r_Identity" the actual field name?

Make sure the spellng is correct. I get the same error in my working example if it is incorrectly spelled.