The situation:
I have a table in an Oracle database that I need to query and then spatialize. I need to select the most recently updated record for each group. I have a last update column to base the max date on. Then I have name and sand type columns that should make up the groups. So after some looking, I think this is the SQL that I need:
SELECT NAME, SAND, MAX(LAST_UPDATE) AS "DATE" FROM MY_TABLE
GROUP BY NAME, SAND
I have run it in Toad and it produced the desired result.
After the query I will need to do a join to a table that has the lat and long data, then make an XY event layer, and then export the thing to a feature class in our SDE.
The question:
What is the best way to do this query in Python in such a way that I can pass the output to arcpy for the join, etc.?
Thanks for the help,
Forest
SQL through ESRI (i.e., definition queries, attribute selections, etc.) isn't "real" ANSI/ISO-standard SQL the way your statement run in Toad is. It's more like just the WHERE clause portion, I would think. So your GROUP BY would likely fail.
To answer your question, what you can do is install the cx_Oracle Python library and execute your real SQL statements against your non-spatial table(s), get a cursor, and go from there. To join the results to your spatial data, you could convert the cursor output to a NumPy array and then call arcpy.da.NumPyArrayToTable to get it back into the ESRI world. Unfortunately, that won't pull over any date fields, so you'd have to find a workaround for that issue.
Well that's how we are integrating non-spatial RDBMS tables anyway.
Thanks for the info, James.
Why would the date not make it? Is that because NumPy does not support that data type or something?
From the NumPyArrayToTable documentation: "Other field types not listed above, including date, raster, and BLOB fields are not supported."
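One possible workaround (a sketch, not from the thread): cast the date to text inside the SQL so it arrives as a plain string that a NumPy array can carry. The TO_CHAR format, connection string, dtypes, and output path below are all illustrative assumptions built around the query from the original post.

import cx_Oracle
import numpy as np
import arcpy

# Connection string is a placeholder, as in the snippets below.
cnxn = cx_Oracle.connect(u'UserNameHere/PasswordHere@DatabaseNameHere')
cursor = cnxn.cursor()

# TO_CHAR turns the Oracle DATE into a string, sidestepping the
# unsupported date type in arcpy.da.NumPyArrayToTable.
cursor.execute(
    "SELECT NAME, SAND, TO_CHAR(MAX(LAST_UPDATE), 'YYYY-MM-DD') AS LAST_UPDATE "
    "FROM MY_TABLE GROUP BY NAME, SAND")
rows = cursor.fetchall()
cursor.close()
cnxn.close()

if rows:
    # fetchall() yields a list of tuples, which np.array accepts
    # directly when given a structured dtype.
    arr = np.array(rows, dtype=[('NAME', '|S50'),
                                ('SAND', '|S50'),
                                ('LAST_UPDATE', '|S10')])
    # Output path is hypothetical; the table must not already exist.
    arcpy.da.NumPyArrayToTable(arr, r"H:\Default.gdb\latest_by_group")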
We use cx_Oracle as well. Here's a sample snippet that I use.
import cx_Oracle

try:
    oracle_db = u'UserNameHere/PasswordHere@DatabaseNameHere'
    sql = "SELECT * FROM TABLE_NAME"
    cnxn = cx_Oracle.connect(oracle_db)
    sqlcursor = cnxn.cursor()
    sqlcursor.execute(sql)
    sqlresult = sqlcursor.fetchall()
    # Validate and process result
except Exception as err:
    print(err)
finally:
    sqlresult = None
    if 'sqlcursor' in locals():
        sqlcursor.close()
        del sqlcursor
    if 'cnxn' in locals():
        cnxn.close()
        del cnxn
The result table can be returned in different ways, so you should validate it before you try to process it.
[]
[(val,)]
[(val, val), (val, val)]
The third scenario is the most common, and what you should expect with the SQL query you posted. The number of tuples is the number of rows. The number of items in each tuple is the number of fields.
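For instance, a minimal validation along those lines (a sketch; variable names follow the snippet above, it would sit where the "# Validate and process result" comment appears, and the three unpacked fields assume the query from the original post):

if not sqlresult:
    # Scenario 1: empty list, no rows returned
    print("Query returned no rows")
else:
    num_rows = len(sqlresult)        # number of tuples = number of rows
    num_fields = len(sqlresult[0])   # items per tuple = number of fields
    print("{0} rows x {1} fields".format(num_rows, num_fields))
    # Scenario 3: iterate the row tuples
    for name, sand, last_update in sqlresult:
        pass  # process each record here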
James and Blake, thanks for the help. I am able to get the data out of Oracle with cx_Oracle without any problems. I did have a bit of a hard time getting the SQL output into a NumPy array, though. James, I would be interested in seeing a code snippet of your workflow.
I had to pull from an existing implementation and reformat the text a bit, so I'm not sure this is entirely correct, but it should give you the general idea of moving the data around. We use the pandas library for a lot of our number crunching, which is why you see the list going into a pandas DataFrame. From there it goes into a NumPy array with the dtypes specified.
After that, it's pretty easy to get it to a gdb table:
import cx_Oracle
import numpy as np
import arcpy
from pandas import DataFrame

dsn = cx_Oracle.makedsn(<connection params here>)
oradb = cx_Oracle.connect("unm", "pwd", dsn)
cursor = oradb.cursor()
sqlQry = "SELECT col1, col2, col3 from blah"
cursor.execute(sqlQry)

datArray = []
cxRows = cursor.fetchall()
for cxRow in cxRows:
    datArray.append(cxRow)

# close the conn to ora
cursor.close()
oradb.close()
del cxRows, cursor

it = len(datArray)
if it > 0:
    # convert to pandas data frame
    DF = DataFrame(datArray, columns=['col1', 'col2', 'col3'])
    # ...running a bunch of other data processing on this df

    ## convert final pandas DataFrame result into a numpy array
    nmpyar = np.array(DF, np.dtype([('col1', '|S25'), ('col2', '|S25'), ('col3', '<f8')]))

    ## now convert the numpy array to the gdb table in the default gdb
    arcpy.da.NumPyArrayToTable(nmpyar, r"H:\Default.gdb\numpytab")
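From there, finishing the workflow described in the original post (the join to the lat/long table, the XY event layer, and the export to SDE) might look roughly like this. The key field, coordinate field names, spatial reference, and all paths are assumptions for illustration, not from the thread:

import arcpy

# Paths and field names below are hypothetical.
gdb_table = r"H:\Default.gdb\numpytab"    # output of NumPyArrayToTable above
coord_table = r"H:\Default.gdb\coords"    # table holding LAT/LON per NAME

# Join the coordinate fields onto the query results (common key assumed: NAME).
arcpy.JoinField_management(gdb_table, "NAME", coord_table, "NAME", ["LAT", "LON"])

# Build an XY event layer from the joined table (WGS84 assumed).
sr = arcpy.SpatialReference(4326)
arcpy.MakeXYEventLayer_management(gdb_table, "LON", "LAT", "xy_events", sr)

# Export the event layer to a feature class in the SDE geodatabase.
arcpy.FeatureClassToFeatureClass_conversion(
    "xy_events", r"Database Connections\prod.sde", "LATEST_BY_GROUP")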
Thanks for posting that.
Have you read the Executing SQL using an ArcSDE connection documentation? I have used the ArcSDESQLExecute object in the past to accomplish tasks similar to what you are trying here.
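For reference, usage looks roughly like this (a sketch assuming an existing .sde connection file; the path and query are placeholders):

import arcpy

# ArcSDESQLExecute runs a SQL statement directly against the database
# behind an existing connection file.
egdb_conn = arcpy.ArcSDESQLExecute(r"Database Connections\prod.sde")

sql = ("SELECT NAME, SAND, MAX(LAST_UPDATE) "
       "FROM MY_TABLE GROUP BY NAME, SAND")

# For a SELECT, execute() returns a list of lists (one per row),
# or True/False for statements that return no result set.
result = egdb_conn.execute(sql)
if isinstance(result, list):
    for row in result:
        print(row)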
I took the OP to mean that this was a non-spatial Oracle database/table(s) that needed to be joined back to features. There would be no .sde file to use for making a connection, as the database would not be a registered SDE database.
But I'm with you in that staying within the ESRI stack when possible is best, since otherwise you have to manage all of these other libraries. But if arcpy is insufficient, something has to be done.