arcpy script using sqlalchemy and pandas only runs once

4830
25
08-08-2016 06:27 AM
HendrikBernert1
New Contributor II

I am struggling with the following issue:

I created a arcpy script in which I connect to a database via sqlalchemy and collect some data into a pandas data frame. When I execute the tool within ArcGIS 10.3., it works perfectly fine. However, when I try to run it a second time within the same ArcGIS-session, it does not proceed, does not react to cancelling and seems to do nothing at all.

Here is what I basically do:

from sqlalchemy import create_engine
import pandas

# set up sqlalchemy engine:
engine = create_engine('mssql+pyodbc://[USER]:[PASSWORD]@[SERVER]/[DATABASE]?driver=SQL+Server')
# design query
query = "some SELECT query here"
# read data to pandas data frame
df = pandas.read_sql_query(query, engine)

As I said, for the first run, everything is fine. When I execute it again, nothing happens, not even an error message. Another strange thing: it looks like it works fine when I execute the code from my IDE's python console, I can run it as often as I like. Seems like the problem is either in the engine setup or within ArcGIS.

Any help would be great, thank you!

EDIT: When I modify the script to force an error, fix this error and run it again, I get the following error message:

...
File "C:\Python27\lib\site-packages\sqlalchemy\sql\selectable.py", line 2469, in __init__
for c in columns:
AttributeError: 'tuple' object has no attribute 'lowercase'
0 Kudos
25 Replies
HendrikBernert1
New Contributor II

I think we're getting a little bit off-track here... using numpy instead of pandas is another topic, on which I will check out your blog to find out whether I switch to numpy or not. But for my initial issue, the problem occurs within the function

df = pandas.read_sql_query(query, engine)

where my engine setup looks like

from sqlalchemy import create_engine
engine = create_engine('mssql+pyodbc://[USER]:[PASSWORD]@[SERVER]/[DATABASE]?driver=SQL+Server')

So the basic problem is: why does the tool work fine at the first run, but does nothing on the second run? And why does pandas create a tuple-cursor-object (this is what I think is happening)?

@Dan: I will have a look at your blog and see if  numpy is sufficient for my purposes. I'm basically doing a simple SELECT-query (not SELECT INTO), so I don't think the problem is within the query itself, but in the way pandas deals with connections and cursors

0 Kudos
JamesCrandall
MVP Frequent Contributor

So the basic problem is: why does the tool work fine at the first run, but does nothing on the second run?

Have you tried closing your connections to the sql db?

0 Kudos
RahelKreutz
New Contributor

I'm doing this to get rid of the connection (I'm using an engine, not the connection itself):

engine.dispose()
del engine

From my understanding, this should do the job (an engine has no particuclar method .close()).

0 Kudos
JamesCrandall
MVP Frequent Contributor

I think you are correct.  I did see reference to the notion that it's actually a pool of connections and you could issue a .close() to each/all.  But also read that your .dispose() should be handling the disconnect.  As a test I'd run a simple SELECT statement to see if you are getting the same strange result of only working on the first pass.  If it works as expected, then perhaps the issue is at the db-level.

0 Kudos
JamesCrandall
MVP Frequent Contributor

Also, are you deleting the pandas DataFrame?

0 Kudos
HendrikBernert1
New Contributor II

Yes, I do delete everything I assign (at least I hope I did not oversee something). I'll create a simpler SELECT testcase and get back to you as soon as I can

0 Kudos
DanPatterson_Retired
MVP Emeritus

and there are no lock files being created by arcmap then? (confirm through windows explorer) If arcmap is open are you using a separate IDE or arcmap's?  If you are always using a separate python ide and the only difference is whether arcmap is open, then the culprit is arcmap and tracking that down needs viewing what is going on with the operating system.

0 Kudos
HendrikBernert1
New Contributor II

As far as I can see, there are no lock files created. I'm using an external IDE, but it does not matter whether I have it open or not.

Little step forward: now it runs twice and crashes when executed a third time... extremely strange

0 Kudos
HendrikBernert1
New Contributor II

Ok, I created a small test-case with only the very basic function and executed it in twi different ways:

1. create a Toolbox (without any parameters set) and run it from there

2. run each single line in the ArcGIS Python Console

Results:

1. The same problem occurs as described at the top of this thread. Only difference: the script runs twice and not once, but the third run simply hangs up.

2. perfectly fine, I can do whatever I want as often as I want.

So the problem seems to come from the Toolbox and is not within the query or the database.

Here is my Toolbox-setup:

- Store relative path names

- Always run in foreground

- Run Python script in process

- Parameters: all given as string with one exception (folder), all read with arcpy.GetParameterAsText()

So nothing unusual here...

0 Kudos
JoshuaBixby
MVP Esteemed Contributor

I was able to reproduce the issue with your attached code.  I figured out why I didn't generate the errors yesterday.  Oddly enough, if I run the code once in the Interactive Python window in ArcMap, then I can run the script tool numerous times without errors.  If I run the script tool out of process, it runs fine also.  If I run it as a tool in a Python toolbox, instead of a script in a regular toolbox, it runs fine.  I tried cutting out sqlalchemy and pandas and using pyodbc only, same results.

Although I can't explain what or why, everything points to a problem with in-process, regular Toolboxes.  It might be worth trying running out of process or converting to a Python Toolbox.