arcpy.da.SearchCursor - sqlprefix

9807
24
Jump to solution
09-08-2014 02:20 PM
ManikMohandas
New Contributor II

Does the sql prefix - DISTINCT work well on SDE tables.

http://resources.arcgis.com/en/help/main/10.1/index.html#//018w00000011000000

Syntax SearchCursor (in_table, field_names, {where_clause}, {spatial_reference}, {explode_to_points}, {sql_clause})

An SQL prefix clause is positioned in the first position and will be inserted between the SELECT keyword and the SELECT COLUMN LIST. The SQL prefix clause is most commonly used for clauses such as DISTINCT or ALL.

My following example is resulting in all rows, not distinct rows.  rs = arcpy.da.SearchCursor(sTableName, (sSelectField1, sSelectField2), sSqlExp, None, None, ("DISTINCT", sSqlPostfix))

  1. What elese could I do to get truely distinct records only.
  2. I store the values in a list and are they any function to get unique values in a list. Example [1,2,2,3,3,3,5,6,7] to [1,2,3,5,6,7]. Also set(xyz) won't work because I'm using a list, whereas set(xyz) works on tuples.
Tags (1)
0 Kudos
1 Solution

Accepted Solutions
XanderBakker
Esri Esteemed Contributor

Some small example on creating a unique list of values in a field:

# create a unique list of the values in a field in a table or featureclass

import arcpy

FC_or_TBL = r"D:\Xander\Genesis\Tablas LayerRef\bk_DLLO_931.gdb\AG_LAYERREF"

fld_name1 = "COLUMNA"

unique_list = list(set(r[0] for r in arcpy.da.SearchCursor(FC_or_TBL, (fld_name1))))

unique_list.sort()

print "\n".join(unique_list)

print "\n#####\n"

On line 5 a list is created using list comprehensions, which is converted to a set (unique values) and parsed back to a list

# using a combination of 2 columns

import arcpy

FC_or_TBL = r"D:\Xander\Genesis\Tablas LayerRef\bk_DLLO_931.gdb\AG_LAYERREF"

fld_name1 = "COLUMNA"

fld_name2 = "ID_LAYERREF"

unique_list = list(set("{0},{1}".format(r[0], r[1]) for r in arcpy.da.SearchCursor(FC_or_TBL, (fld_name1, fld_name2))))

print "\n".join(unique_list)

print "\n#####\n"

On line 6 the list is created from the combination of 2 fields

# including a where clause

import arcpy

FC_or_TBL = r"D:\Xander\Genesis\Tablas LayerRef\bk_DLLO_931.gdb\AG_LAYERREF"

fld_name1 = "COLUMNA"

fld_name2 = "ID_LAYERREF"

fld_search = "COLUMNA"

search_value = "OBJECTID"

where = "{0} = '{1}'".format(arcpy.AddFieldDelimiters(FC_or_TBL, fld_search), search_value)

unique_list = list(set("{0}_{1}".format(r[0], r[1]) for r in arcpy.da.SearchCursor(FC_or_TBL, (fld_name1, fld_name2, fld_search), where)))

print "\n".join(unique_list)

print "\n#####\n"

In this case an additional where clause is used to limited the results. As James already showed, Pandas is a powerful library. If you have it, use it. If you don't and you don't want to install it, the above might be an alternative.

Kind regards,

Xander

View solution in original post

24 Replies
JamesCrandall
MVP Frequent Contributor

I doubt you can perform GROUP BY clause in any arcpy SearchCursor (I can't remember where I saw this, but I think this is the case).  However, you can:

1. Depending upon the Database type, you can issue your SQL in the python code.  You will need the appropriate library (cx_Oracle, pyodbc, pysmssql, etc).

2. These typically put results into a "cursor" (not an arcpy cursor).  So you will then have to parse this into your own list/array.

3. Take the list and either use it as it or perform an additional conversion to NumPy.

4. Or you can utilize pandas library to convert from your list and perform just about any grouping you'd want.


import cx_Oracle


import numpy


import pandas

con = cx_Oracle.connect('theuser', 'thepass', 'your DB alias on your TNSNAMES.ORA file ')
cur = con.cursor()
if cur.execute('select fld1, fld2 from WHERE fld1 = 'blah' GROUP BY fld1, fld2'):
    print "finally, it works!!!"
else:
    print "facepalm"


 


datArray = []  


cxRows = cursor.fetchall()


for cxRow in cxRows:


   datArray.append(cxRow)


con.close()


 


#convert the array to a pandas DataFrame


dbDF = DataFrame(datArray, columns=['fld1', 'fld2'])


 


#to get unique values from any pandas datframe


uniquevals = numpy.unique(dbDF['fld1'])

0 Kudos
ManikMohandas
New Contributor II

I get error for these, because I dont have those modules:

  • import cx_Oracle   - Is there an equivalent for SQL Server, our SDE runs of it.
  • import pandas  - Is it an open source freeware? Just to be sure.
0 Kudos
JamesCrandall
MVP Frequent Contributor

Looks like pyodbc will connect to SQL Server sql server - MSSQL in python 2.7 - Stack Overflow

Pandas home Python Data Analysis Library — pandas: Python Data Analysis Library

I am just a developer and don't have experience getting these installed, so you will have to figure that out.

0 Kudos
XanderBakker
Esri Esteemed Contributor

Some small example on creating a unique list of values in a field:

# create a unique list of the values in a field in a table or featureclass

import arcpy

FC_or_TBL = r"D:\Xander\Genesis\Tablas LayerRef\bk_DLLO_931.gdb\AG_LAYERREF"

fld_name1 = "COLUMNA"

unique_list = list(set(r[0] for r in arcpy.da.SearchCursor(FC_or_TBL, (fld_name1))))

unique_list.sort()

print "\n".join(unique_list)

print "\n#####\n"

On line 5 a list is created using list comprehensions, which is converted to a set (unique values) and parsed back to a list

# using a combination of 2 columns

import arcpy

FC_or_TBL = r"D:\Xander\Genesis\Tablas LayerRef\bk_DLLO_931.gdb\AG_LAYERREF"

fld_name1 = "COLUMNA"

fld_name2 = "ID_LAYERREF"

unique_list = list(set("{0},{1}".format(r[0], r[1]) for r in arcpy.da.SearchCursor(FC_or_TBL, (fld_name1, fld_name2))))

print "\n".join(unique_list)

print "\n#####\n"

On line 6 the list is created from the combination of 2 fields

# including a where clause

import arcpy

FC_or_TBL = r"D:\Xander\Genesis\Tablas LayerRef\bk_DLLO_931.gdb\AG_LAYERREF"

fld_name1 = "COLUMNA"

fld_name2 = "ID_LAYERREF"

fld_search = "COLUMNA"

search_value = "OBJECTID"

where = "{0} = '{1}'".format(arcpy.AddFieldDelimiters(FC_or_TBL, fld_search), search_value)

unique_list = list(set("{0}_{1}".format(r[0], r[1]) for r in arcpy.da.SearchCursor(FC_or_TBL, (fld_name1, fld_name2, fld_search), where)))

print "\n".join(unique_list)

print "\n#####\n"

In this case an additional where clause is used to limited the results. As James already showed, Pandas is a powerful library. If you have it, use it. If you don't and you don't want to install it, the above might be an alternative.

Kind regards,

Xander

JamesCrandall
MVP Frequent Contributor

Thanks for the example Xander --- it's good to see straight-forward solutions.  While it's frustrating to be unable to issue more complex SQL on arcpy/GIS side of things, I'd even prefer to completely remove the SQL out of the code I posted and move that to StoredProcedures/PL-SQL packages on the database instead.

As as an application and database devloper, it just makes sense to only return the rows needed rather than return everything and filter on the app side.

0 Kudos
curtvprice
MVP Esteemed Contributor
As as an application and database devloper, it just makes sense to only return the rows needed rather than return everything and filter on the app side.

Isn't the main way this is supported with ArcSDE databases using the Make Query Table tool? I think that's the only way to access fully native SQL queries off the geodatabase.

arcpy.da.SearchCursor is a huge improvement. I've heard pandas is really really fast (especially for joins) so I'm hoping Esri is considering including it in their default python distro.

0 Kudos
JamesCrandall
MVP Frequent Contributor

I can't seem to locate the exact thread I participated in but it was about attempting to setup a Make Query Table with multiple fields and include a GROUP BY statement in the SQL.  It wouldn't work from what I can remember, or it would execute but it doesn't do the grouping as expected.

Perhaps I am misunderstanding the QueryTable!

0 Kudos
JamesCrandall
MVP Frequent Contributor

Curtis,

I am using Pandas quite extensively and I can confirm it offers a lot of value for table/list operations!  However, in the context of this thread, it just doesn't make sense because the idea of issuing SQL against the database is to return only those rose that meet the query.  That is, I wouldn't want to return 1 million rows just to process them with Pandas functions.

RobertBurke
Esri Contributor

Here is a simple sql clause that I use to get an ordered unique list of values from the Speed field in the SpeedLimits table. Distinct only returns unique values, Order by with ASC for ascending, sorts the rows.  Note my Speed field is a test type and not numeric.

uniqueList = []

with arcpy.da.SearchCursor("SpeedLimits", ["Speed"], sql_clause=("DISTINCT", 'ORDER BY Speed ASC') ) as rows:

     for row in rows:

          uniqueList.append(row[0])

stringFilter.list = uniqueList

0 Kudos