arcpy script using sqlalchemy and pandas only runs once

HendrikBernert1 · ‎08-08-2016

I am struggling with the following issue:

I created a arcpy script in which I connect to a database via sqlalchemy and collect some data into a pandas data frame. When I execute the tool within ArcGIS 10.3., it works perfectly fine. However, when I try to run it a second time within the same ArcGIS-session, it does not proceed, does not react to cancelling and seems to do nothing at all.

Here is what I basically do:

from sqlalchemy import create_engine
import pandas

# set up sqlalchemy engine:
engine = create_engine('mssql+pyodbc://[USER]:[PASSWORD]@[SERVER]/[DATABASE]?driver=SQL+Server')
# design query
query = "some SELECT query here"
# read data to pandas data frame
df = pandas.read_sql_query(query, engine)

As I said, for the first run, everything is fine. When I execute it again, nothing happens, not even an error message. Another strange thing: it looks like it works fine when I execute the code from my IDE's python console, I can run it as often as I like. Seems like the problem is either in the engine setup or within ArcGIS.

Any help would be great, thank you!

EDIT: When I modify the script to force an error, fix this error and run it again, I get the following error message:

...
File "C:\Python27\lib\site-packages\sqlalchemy\sql\selectable.py", line 2469, in __init__
for c in columns:
AttributeError: 'tuple' object has no attribute 'lowercase'

JoshuaBixby · ‎08-08-2016

"When I execute the tool within ArcGIS 10.3..." It seems like your running the code from within an Toolbox? If so, a regular Toolbox or Python Toolbox? What if you run the code from the interactive Python window, what happens?

HendrikBernert1 · ‎08-08-2016

Indeed, I run the script as Toolbox with some parameters to be set by the user.

When I execute the basic steps in the interactive Python console, it works fine, as it does within my IDE. Is there any difference between the script from a toolbox?

DanPatterson_Retired · ‎08-08-2016

what parameters are set and what parameters are not set within the tool?

Also the lowercase issue... lowercase is not a python method... lower is

>>> a = "UPPER"

>>> a.lower()

'upper'

RahelKreutz · ‎08-08-2016

The parameters set are the server and database from which the data shall be read, paths and some text for the map title as well as the exporting format. Nothing special there.

I honestly have no idea what the 'lowercase' issue means, it occurs somewhere whithin the pandas module when executing the SELECT-query. To me, it seems like the pandas module opens the connection, defines a cursor object and then goes through the columns provided by the SELECT-query.

For the first time, everything is perfectly fine. When I run it a second time, it seems like the cursor-object is not overwritten, but appended to the first one, which causes the object to be a tuple. I don't know I'm right here, and the worst part: I have no clue how to get rid of that problem.

DanPatterson_Retired · ‎08-08-2016

And you have added a simple 'del' to the scripts to delete the objects prior to completion? I would assume that would be automatic on script completion, but I use numpy rather than pandas, so I can't comment on how it handles objects during script completion... some of it is a black box, but you would have to go through the github code to see what is going on with any particular function/module pandas/pandas at master · pydata/pandas · GitHub

HendrikBernert1 · ‎08-08-2016

Yes, I delete all objects that matter in that case, specifically the data frame after exporting the data and the engine after the data is retrieved. From my understanding, pandas as well as sqlalchemy are taking care of committing, rollback, close and everything else, but obviously, I missed something at some point.

But since you mention numpy: what I basically do is, I select data into the pandas data frame and do everything else there, which means I group the data based on a columnn specified by the user and export the resulting table as a csv-file. Can you do the same thing with numpy? I specifically worry about the grouping...

JoshuaBixby · ‎08-09-2016

What are your script properties? For example, with the General tab, are "Store relative path names..." and "Always run in foreground" checked? On the Source tab, is "Run Python script in process" checked?

In ArcMap's Geoprocessing Options dialog, is "Background Processing" enabled? Have you installed 64-bit Background Geoprocessing?

I ginned up a little script modeled after yours. Since I don't know what your code does once it generates the Panda dataframe, I just dump some of the dataframe content as a message using arcpy.AddMessage() . No matter how often I run it, I get the results I expect dumped as a message.

Without knowing a bit more of the script settings and what is being done with the Panda dataframe, I am all out of ideas.

HendrikBernert1 · ‎08-09-2016

Yes, those options are checked within the toolbox settings as well as in the Geoprocessing Options. With simple print-debugging (print some message after each step), I came to the conclusion that the error occurs within the aforementioned function

df = pandas.read_sql_query(query, engine)

Deleting the pandas data frame as well as the engine which connects to the database don't do the job, so maybe you're right and the error occurs within the db. But if so, why does it work from the console as often as I like?

Maybe one more piece of information (since this is an internal company project, unfortunately I can't be too specific here): I run the database connection and querying from a function which I will attach here. As you can see there, I'm also trying to check whether the engine already exists, but the result is still the same. I'm out of ideas...

DanPatterson_Retired · ‎08-08-2016

by group, do you basically mean that you basically do an argsort, then export to csv or are you doing a query, then exporting to a csv? Yes you can. It would help if you just posted a snip of the attributes table and what you are querying for... maybe even whip in a pandas method or two so I can see what you are doing and why you are using the sqlalchemy. I have written extensively on using numpy in my blog and elsewhere so it would be easier to work from example data to narrow down my search.