The situation:
I have a table in an Oracle database that I need to query and then spatialize. I need to select the most recently updated record for each group. I have a last update column to base the max date on. Then I have name and sand type columns that should make up the groups. So after some looking, I think this is the SQL that I need:
SELECT NAME, SAND, MAX(LAST_UPDATE) AS "DATE" FROM MY_TABLE
GROUP BY NAME, SAND
I have run it in Toad and it produced the desired result.
After the query I will need to do a join to a table that has the lat and long data, then make an XY event layer, and then export the thing to a feature class in our SDE.
The question:
What is the best way to do this query in Python in such a way that I can pass the output to arcpy for the join, etc.?
Thanks for the help,
Forest
SQL through ESRI (i.e., definition queries, attribute selections, etc.) isn't "real" ANSI/ISO-standard SQL the way your statement run in Toad is. It's more like just the WHERE clause portion, I would think. So your GROUP BY would likely fail.
To answer your question, what you can do is install the cx_Oracle Python library and execute your real SQL statements against your non-spatial table(s), get a cursor, and go from there. To join the results to your spatial data, you could convert the cursor output to a NumPy array and then call arcpy.da.NumPyArrayToTable to get it back into the ESRI world. Unfortunately, that won't pull over any date fields, so you'd have to find a workaround for that issue.
Well that's how we are integrating non-spatial RDBMS tables anyway.
Thanks for the info, James.
Why would the date not make it? Is that because NumPy does not support that data type or something?
From the NumPyArrayToTable documentation: "Other field types not listed above, including date, raster, and BLOB fields are not supported."
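One possible workaround (a sketch, not from the thread): cast the date to text inside the SQL so it arrives as a plain string that a NumPy array can carry. The TO_CHAR format, connection string, dtypes, and output path below are all illustrative assumptions built around the query from the original post.

import cx_Oracle
import numpy as np
import arcpy

# Connection string is a placeholder, as in the snippets below.
cnxn = cx_Oracle.connect(u'UserNameHere/PasswordHere@DatabaseNameHere')
cursor = cnxn.cursor()

# TO_CHAR turns the Oracle DATE into a string, sidestepping the
# unsupported date type in arcpy.da.NumPyArrayToTable.
cursor.execute(
    "SELECT NAME, SAND, TO_CHAR(MAX(LAST_UPDATE), 'YYYY-MM-DD') AS LAST_UPDATE "
    "FROM MY_TABLE GROUP BY NAME, SAND")
rows = cursor.fetchall()
cursor.close()
cnxn.close()

if rows:
    # fetchall() yields a list of tuples, which np.array accepts
    # directly when given a structured dtype.
    arr = np.array(rows, dtype=[('NAME', '|S50'),
                                ('SAND', '|S50'),
                                ('LAST_UPDATE', '|S10')])
    # Output path is hypothetical; the table must not already exist.
    arcpy.da.NumPyArrayToTable(arr, r"H:\Default.gdb\latest_by_group")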
We use cx_Oracle as well. Here's a sample snippet that I use.
import cx_Oracle

try:
    oracle_db = u'UserNameHere/PasswordHere@DatabaseNameHere'
    sql = "SELECT * FROM TABLE_NAME"
    cnxn = cx_Oracle.connect(oracle_db)
    sqlcursor = cnxn.cursor()
    sqlcursor.execute(sql)
    sqlresult = sqlcursor.fetchall()
    # Validate and process result
except Exception as err:
    print(err)
finally:
    sqlresult = None
    if 'sqlcursor' in locals():
        sqlcursor.close()
        del sqlcursor
    if 'cnxn' in locals():
        cnxn.close()
        del cnxn
The result table can be returned in different ways, so you should validate it before you try to process it.
[]
[(val,)]
[(val, val), (val, val)]
The third scenario is the most common, and what you should expect with the SQL query you posted. The number of tuples is the number of rows. The number of items in each tuple is the number of fields.
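For instance, a minimal validation along those lines (a sketch; variable names follow the snippet above, it would sit where the "# Validate and process result" comment appears, and the three unpacked fields assume the query from the original post):

if not sqlresult:
    # Scenario 1: empty list, no rows returned
    print("Query returned no rows")
else:
    num_rows = len(sqlresult)        # number of tuples = number of rows
    num_fields = len(sqlresult[0])   # items per tuple = number of fields
    print("{0} rows x {1} fields".format(num_rows, num_fields))
    # Scenario 3: iterate the row tuples
    for name, sand, last_update in sqlresult:
        pass  # process each record here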
James and Blake, thanks for the help. I am able to get the data out of Oracle with cx_Oracle without any problems. I did have a bit of a hard time getting the SQL output into a NumPy array, though. James, I would be interested in seeing a code snippet of your workflow.
I had to pull from an existing implementation and reformat the text a bit, so I'm not sure this is entirely correct, but it should give you the general idea of moving the data around. We use the pandas library for a lot of our number crunching, which is why you see the list going into a pandas DataFrame. From there it goes into a NumPy array with the dtypes specified.
After that, it's pretty easy to get it to a gdb table:
import cx_Oracle
import numpy as np
import arcpy
from pandas import DataFrame

dsn = cx_Oracle.makedsn(<connection params here>)
oradb = cx_Oracle.connect("unm", "pwd", dsn)
cursor = oradb.cursor()
sqlQry = "SELECT col1, col2, col3 from blah"
cursor.execute(sqlQry)

datArray = []
cxRows = cursor.fetchall()
for cxRow in cxRows:
    datArray.append(cxRow)

# close the conn to ora
cursor.close()
oradb.close()
del cxRows, cursor

it = len(datArray)
if it > 0:
    # convert to pandas data frame
    DF = DataFrame(datArray, columns=['col1', 'col2', 'col3'])
    # ...running a bunch of other data processing on this df

    ## convert final pandas DataFrame result into a numpy array
    nmpyar = np.array(DF, np.dtype([('col1', '|S25'), ('col2', '|S25'), ('col3', '<f8')]))

    ## now convert the numpy array to the gdb table in the default gdb
    arcpy.da.NumPyArrayToTable(nmpyar, r"H:\Default.gdb\numpytab")
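From there, finishing the workflow described in the original post (the join to the lat/long table, the XY event layer, and the export to SDE) might look roughly like this. The key field, coordinate field names, spatial reference, and all paths are assumptions for illustration, not from the thread:

import arcpy

# Paths and field names below are hypothetical.
gdb_table = r"H:\Default.gdb\numpytab"    # output of NumPyArrayToTable above
coord_table = r"H:\Default.gdb\coords"    # table holding LAT/LON per NAME

# Join the coordinate fields onto the query results (common key assumed: NAME).
arcpy.JoinField_management(gdb_table, "NAME", coord_table, "NAME", ["LAT", "LON"])

# Build an XY event layer from the joined table (WGS84 assumed).
sr = arcpy.SpatialReference(4326)
arcpy.MakeXYEventLayer_management(gdb_table, "LON", "LAT", "xy_events", sr)

# Export the event layer to a feature class in the SDE geodatabase.
arcpy.FeatureClassToFeatureClass_conversion(
    "xy_events", r"Database Connections\prod.sde", "LATEST_BY_GROUP")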
Thanks for posting that.
Have you read the Executing SQL using an ArcSDE connection documentation? I have used the ArcSDESQLExecute object in the past to accomplish tasks similar to what you are trying here.
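For reference, usage looks roughly like this (a sketch assuming an existing .sde connection file; the path and query are placeholders):

import arcpy

# ArcSDESQLExecute runs a SQL statement directly against the database
# behind an existing connection file.
egdb_conn = arcpy.ArcSDESQLExecute(r"Database Connections\prod.sde")

sql = ("SELECT NAME, SAND, MAX(LAST_UPDATE) "
       "FROM MY_TABLE GROUP BY NAME, SAND")

# For a SELECT, execute() returns a list of lists (one per row),
# or True/False for statements that return no result set.
result = egdb_conn.execute(sql)
if isinstance(result, list):
    for row in result:
        print(row)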
I took the OP to mean that this was a non-spatial Oracle database/table(s) that needed to be joined back to features. There would be no .sde file to use for making a connection, as the database would not be a registered SDE database.
But I'm with you in that staying within the ESRI stack when possible is best, since otherwise you have to manage all of these other libraries. But if arcpy is insufficient, something has to be done.