arcpy.da.SearchCursor - sqlprefix

ManikMohandas · ‎09-08-2014

Does the sql prefix - DISTINCT work well on SDE tables.

http://resources.arcgis.com/en/help/main/10.1/index.html#//018w00000011000000

Syntax SearchCursor (in_table, field_names, {where_clause}, {spatial_reference}, {explode_to_points}, {sql_clause})

An SQL prefix clause is positioned in the first position and will be inserted between the SELECT keyword and the SELECT COLUMN LIST. The SQL prefix clause is most commonly used for clauses such as DISTINCT or ALL.

My following example is resulting in all rows, not distinct rows. rs = arcpy.da.SearchCursor(sTableName, (sSelectField1, sSelectField2), sSqlExp, None, None, ("DISTINCT", sSqlPostfix))

What elese could I do to get truely distinct records only.
I store the values in a list and are they any function to get unique values in a list. Example [1,2,2,3,3,3,5,6,7] to [1,2,3,5,6,7]. Also set(xyz) won't work because I'm using a list, whereas set(xyz) works on tuples.

XanderBakker · ‎09-09-2014

Some small example on creating a unique list of values in a field:

# create a unique list of the values in a field in a table or featureclass
import arcpy
FC_or_TBL = r"D:\Xander\Genesis\Tablas LayerRef\bk_DLLO_931.gdb\AG_LAYERREF"
fld_name1 = "COLUMNA"
unique_list = list(set(r[0] for r in arcpy.da.SearchCursor(FC_or_TBL, (fld_name1))))
unique_list.sort()
print "\n".join(unique_list)
print "\n#####\n"

On line 5 a list is created using list comprehensions, which is converted to a set (unique values) and parsed back to a list

# using a combination of 2 columns
import arcpy
FC_or_TBL = r"D:\Xander\Genesis\Tablas LayerRef\bk_DLLO_931.gdb\AG_LAYERREF"
fld_name1 = "COLUMNA"
fld_name2 = "ID_LAYERREF"
unique_list = list(set("{0},{1}".format(r[0], r[1]) for r in arcpy.da.SearchCursor(FC_or_TBL, (fld_name1, fld_name2))))
print "\n".join(unique_list)
print "\n#####\n"

On line 6 the list is created from the combination of 2 fields

# including a where clause
import arcpy
FC_or_TBL = r"D:\Xander\Genesis\Tablas LayerRef\bk_DLLO_931.gdb\AG_LAYERREF"
fld_name1 = "COLUMNA"
fld_name2 = "ID_LAYERREF"
fld_search = "COLUMNA"
search_value = "OBJECTID"
where = "{0} = '{1}'".format(arcpy.AddFieldDelimiters(FC_or_TBL, fld_search), search_value)
unique_list = list(set("{0}_{1}".format(r[0], r[1]) for r in arcpy.da.SearchCursor(FC_or_TBL, (fld_name1, fld_name2, fld_search), where)))
print "\n".join(unique_list)
print "\n#####\n"

In this case an additional where clause is used to limited the results. As James already showed, Pandas is a powerful library. If you have it, use it. If you don't and you don't want to install it, the above might be an alternative.

Kind regards,

Xander

View solution in original post

JamesCrandall · ‎09-09-2014

I doubt you can perform GROUP BY clause in any arcpy SearchCursor (I can't remember where I saw this, but I think this is the case). However, you can:

1. Depending upon the Database type, you can issue your SQL in the python code. You will need the appropriate library (cx_Oracle, pyodbc, pysmssql, etc).

2. These typically put results into a "cursor" (not an arcpy cursor). So you will then have to parse this into your own list/array.

3. Take the list and either use it as it or perform an additional conversion to NumPy.

4. Or you can utilize pandas library to convert from your list and perform just about any grouping you'd want.


import cx_Oracle


import numpy


import pandas

con = cx_Oracle.connect('theuser', 'thepass', 'your DB alias on your TNSNAMES.ORA file ')
cur = con.cursor()
if cur.execute('select fld1, fld2 from WHERE fld1 = 'blah' GROUP BY fld1, fld2'):
    print "finally, it works!!!"
else:
    print "facepalm"


 


datArray = []  


cxRows = cursor.fetchall()


for cxRow in cxRows:


   datArray.append(cxRow)


con.close()


 


#convert the array to a pandas DataFrame


dbDF = DataFrame(datArray, columns=['fld1', 'fld2'])


 


#to get unique values from any pandas datframe


uniquevals = numpy.unique(dbDF['fld1'])

ManikMohandas · ‎09-09-2014

I get error for these, because I dont have those modules:

import cx_Oracle - Is there an equivalent for SQL Server, our SDE runs of it.
import pandas - Is it an open source freeware? Just to be sure.

JamesCrandall · ‎09-09-2014

Looks like pyodbc will connect to SQL Server sql server - MSSQL in python 2.7 - Stack Overflow

Pandas home Python Data Analysis Library — pandas: Python Data Analysis Library

I am just a developer and don't have experience getting these installed, so you will have to figure that out.

XanderBakker · ‎09-09-2014

Some small example on creating a unique list of values in a field:

# create a unique list of the values in a field in a table or featureclass
import arcpy
FC_or_TBL = r"D:\Xander\Genesis\Tablas LayerRef\bk_DLLO_931.gdb\AG_LAYERREF"
fld_name1 = "COLUMNA"
unique_list = list(set(r[0] for r in arcpy.da.SearchCursor(FC_or_TBL, (fld_name1))))
unique_list.sort()
print "\n".join(unique_list)
print "\n#####\n"

On line 5 a list is created using list comprehensions, which is converted to a set (unique values) and parsed back to a list

# using a combination of 2 columns
import arcpy
FC_or_TBL = r"D:\Xander\Genesis\Tablas LayerRef\bk_DLLO_931.gdb\AG_LAYERREF"
fld_name1 = "COLUMNA"
fld_name2 = "ID_LAYERREF"
unique_list = list(set("{0},{1}".format(r[0], r[1]) for r in arcpy.da.SearchCursor(FC_or_TBL, (fld_name1, fld_name2))))
print "\n".join(unique_list)
print "\n#####\n"

On line 6 the list is created from the combination of 2 fields

# including a where clause
import arcpy
FC_or_TBL = r"D:\Xander\Genesis\Tablas LayerRef\bk_DLLO_931.gdb\AG_LAYERREF"
fld_name1 = "COLUMNA"
fld_name2 = "ID_LAYERREF"
fld_search = "COLUMNA"
search_value = "OBJECTID"
where = "{0} = '{1}'".format(arcpy.AddFieldDelimiters(FC_or_TBL, fld_search), search_value)
unique_list = list(set("{0}_{1}".format(r[0], r[1]) for r in arcpy.da.SearchCursor(FC_or_TBL, (fld_name1, fld_name2, fld_search), where)))
print "\n".join(unique_list)
print "\n#####\n"

In this case an additional where clause is used to limited the results. As James already showed, Pandas is a powerful library. If you have it, use it. If you don't and you don't want to install it, the above might be an alternative.

Kind regards,

Xander

JamesCrandall · ‎09-09-2014

Thanks for the example Xander --- it's good to see straight-forward solutions. While it's frustrating to be unable to issue more complex SQL on arcpy/GIS side of things, I'd even prefer to completely remove the SQL out of the code I posted and move that to StoredProcedures/PL-SQL packages on the database instead.

As as an application and database devloper, it just makes sense to only return the rows needed rather than return everything and filter on the app side.

curtvprice · ‎09-11-2014

As as an application and database devloper, it just makes sense to only return the rows needed rather than return everything and filter on the app side.

Isn't the main way this is supported with ArcSDE databases using the Make Query Table tool? I think that's the only way to access fully native SQL queries off the geodatabase.

arcpy.da.SearchCursor is a huge improvement. I've heard pandas is really really fast (especially for joins) so I'm hoping Esri is considering including it in their default python distro.

JamesCrandall · ‎09-11-2014

I can't seem to locate the exact thread I participated in but it was about attempting to setup a Make Query Table with multiple fields and include a GROUP BY statement in the SQL. It wouldn't work from what I can remember, or it would execute but it doesn't do the grouping as expected.

Perhaps I am misunderstanding the QueryTable!

JamesCrandall · ‎09-12-2014

Curtis,

I am using Pandas quite extensively and I can confirm it offers a lot of value for table/list operations! However, in the context of this thread, it just doesn't make sense because the idea of issuing SQL against the database is to return only those rose that meet the query. That is, I wouldn't want to return 1 million rows just to process them with Pandas functions.

RobertBurke · ‎09-09-2014

Here is a simple sql clause that I use to get an ordered unique list of values from the Speed field in the SpeedLimits table. Distinct only returns unique values, Order by with ASC for ascending, sorts the rows. Note my Speed field is a test type and not numeric.

uniqueList = []

with arcpy.da.SearchCursor("SpeedLimits", ["Speed"], sql_clause=("DISTINCT", 'ORDER BY Speed ASC') ) as rows:

for row in rows:

uniqueList.append(row[0])

stringFilter.list = uniqueList

Rob Burke