Need help to optimize my code

12-10-2012 08:35 AM
SteveCole
Frequent Contributor
I have a Python script which runs fine but takes far too long, so I'm hoping some of the more experienced Python coders can offer suggestions about where I can be more efficient. The script will hopefully run as a Windows scheduled task at a given interval to continually update a table in SDE, which will be consumed by a web map.

As I said, the table is in SDE but the source data is a table in an Access database. Because of a daily database maintenance period, the Access database cannot be touched unless a presence/absence check on two files passes (hence the conditional statement about the files). If it's okay to proceed, here's what happens:


  1. Delete all records from the SDE table

  2. Connect to the Access database

  3. Step through the records in an Access Query and transfer that information into the SDE table

It took roughly 20 minutes to transfer 2200 records. That seems WAY too slow, so any suggestions would be greatly appreciated! Here's the code:

import sys
import os
import pyodbc
import arcpy
from datetime import datetime
from arcpy import env

file01 = r"\\Path\to\File01"  # This file must exist
file02 = r"\\Path\to\File02"  # This file must NOT exist
expression = '1=1'  # SQL shorthand which selects all records
theTable = "tblGageData"

# The tables within DIADvisor must not be accessed during its daily database maintenance.
# OneRain recommends checking for the existence and non-existence of two specific files.
# If both conditions are true, it is safe to proceed with accessing the data within the
# dvLive Access database.
if os.path.exists(file01) and not os.path.exists(file02):
    print "Processing start time: " + str(datetime.now())

    env.workspace = r"Database Connections\SdeConnection.sde"
    try:
        # Set some local variables
        tempTableView = "gageTableView"

        # Execute MakeTableView
        arcpy.MakeTableView_management(theTable, tempTableView)

        # Execute SelectLayerByAttribute to select all records
        arcpy.SelectLayerByAttribute_management(tempTableView, "NEW_SELECTION", expression)

        # Execute GetCount and, if some records have been selected, execute
        # DeleteRows to delete the selected records.
        if int(arcpy.GetCount_management(tempTableView).getOutput(0)) > 0:
            arcpy.DeleteRows_management(tempTableView)

        # Now connect to the DIADvisor Access database and import the most recent data.
        # This requires the pyodbc module for Python.
        cstring = r'DRIVER={Microsoft Access Driver (*.mdb)};DBQ=Path\to\Access\Database\Database.mdb;Provider=MSDASQL;'
        conn = pyodbc.connect(cstring)
        cursor = conn.cursor()

        sqlString = "SELECT * FROM LAST3DAYS"
        counter = 0
        # Loop through the results returned via the ODBC connection
        for cRow in cursor.execute(sqlString):
            curSensorId = cRow.SENSOR_ID
            curEpoch = cRow.EPOCH
            curData = cRow.DATA
            counter += 1
            # Insert a new row into the SDE table with the current DIADvisor record's information
            curSde = arcpy.InsertCursor(r"Database Connections\SdeConnection.sde\tblGageData")
            row = curSde.newRow()
            row.SENSOR_ID = curSensorId
            row.EPOCH = curEpoch
            row.DATA = curData
            curSde.insertRow(row)

        # Close the ODBC connection and perform some variable cleanup
        cursor.close()
        conn.close()
        del row
        del conn
        del curSde
        del cRow
        del cursor
        print "Number of record(s) in the DIADvisor database: " + str(counter)
        print "Processing end time: " + str(datetime.now())
    except Exception as e:
        # If an error occurred, print the line number and error message
        import traceback, sys
        cursor.close()
        conn.close()
        tb = sys.exc_info()[2]
        print "Line %i" % tb.tb_lineno
        print str(e)
else:
    sys.exit()
12 Replies
by Anonymous User
Not applicable
Why not just use the TableToTable_conversion tool?  You could simply overwrite that table in the new SDE location.  That would be much more efficient than using an insert cursor to write those rows.

http://help.arcgis.com/en/arcgisdesktop/10.0/help/index.html#//001200000027000000
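
A minimal sketch of that approach, assuming the Access table is already visible to ArcGIS (for example through an OLE DB connection file); the paths and names below are illustrative placeholders, not from Steve's setup:

import arcpy

# Hypothetical connection file and table names -- substitute your own.
sourceTable = r"Database Connections\AccessSource.odc\LAST3DAYS"  # Access source exposed to ArcGIS
outPath = r"Database Connections\SdeConnection.sde"               # destination SDE workspace
outName = "tblGageData"                                           # existing table to overwrite

arcpy.env.overwriteOutput = True  # allow the existing table to be replaced
arcpy.TableToTable_conversion(sourceTable, outPath, outName)

Note that overwriting means the table is dropped and recreated on each run, which matters if anything is reading it at that moment.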
T__WayneWhitley
Frequent Contributor (Accepted Solution)
Looks to me like the bottleneck is establishing the insert cursor inside the for loop: each arcpy.InsertCursor call reopens the table, so doing it once per record is unnecessary and is undoubtedly slowing things down.  Simply move it outside the loop...

...as in this corrected excerpt of your code:
# Establish an insert cursor, then start looping...
# Loop through the results returned via the ODBC connection
curSde = arcpy.InsertCursor(r"Database Connections\SdeConnection.sde\tblGageData")
for cRow in cursor.execute(sqlString):
    curSensorId = cRow.SENSOR_ID
    curEpoch = cRow.EPOCH
    curData = cRow.DATA
    counter += 1
    # Insert a new row into the SDE table with the current DIADvisor record's information
    row = curSde.newRow()
    row.SENSOR_ID = curSensorId
    row.EPOCH = curEpoch
    row.DATA = curData
    curSde.insertRow(row)
SteveCole
Frequent Contributor
BINGO!

Moving the InsertCursor outside of the loop did the trick. After that change, the same process took 23 seconds. *MUCH* better!

Since I'm new to Python, I didn't know about Caleb's suggestion of TableToTable. I am interested in that since it would ideally be even cleaner than what I currently have. The Help docs don't have an example of using an ODBC connection as the source. Can anyone provide one? I'm not sure how to create a table view on the ODBC connection.
T__WayneWhitley
Frequent Contributor
Maybe; I'm not sure. Are you okay with overwriting the entire table?  Either way, you could test two versions of the code: 23 seconds sounds good, but here's a way to possibly speed things up even more by accessing Access this way (it may not be significantly different, but it's worth a look), and you may not need the special module to access your table:

Formatting a table in Microsoft Access for use in ArcGIS
Geodata » Data types » Tables » Creating and editing tables
http://resources.arcgis.com/en/help/main/10.1/index.html#//005s00000022000000

Then, of course, if it works for you, once you are able to 'view' the table within ArcGIS, I believe the Table To Table tool will see it as well.

Let us know how that works for you - maybe Caleb has a further comment, since it was his idea?


EDIT:
oops, good link above - however you'll probably need this one on OLE DB:

http://resources.arcgis.com/en/help/main/10.1/index.html#/Connecting_to_a_Microsoft_Access_database_...
Connecting to a Microsoft Access database in ArcGIS
Geodata » Data types » Tables » Creating and editing tables


...or this one for 2007 Access (accdb):

Connecting to a 2007 Microsoft Access database (.accdb) in ArcGIS
Geodata » Data types » Tables » Creating and editing tables
http://resources.arcgis.com/en/help/main/10.1/index.html#/Connecting_to_a_2007_Microsoft_Access_data...
by Anonymous User
Not applicable

SteveCole wrote:
"Since I'm new to Python, I didn't know about Caleb's suggestion of TableToTable. I am interested in that since it would ideally be even cleaner than what I currently have. The Help docs don't have an example of using an ODBC connection as the source. Can anyone provide one? I'm not sure how to create a table view on the ODBC connection."


I may have bitten off more than I can chew with that suggestion.  I haven't seen any good examples of making a table view from the ODBC source either.  If you find a way to do that, TableToTable would work well.  But as Wayne suggested, moving the creation of the insert cursor outside of the loop has got you back in business for the time being, and 23 seconds isn't bad!

I did find this on Stack Exchange, though; it looks like it may be possible using CreateDatabaseConnection if you are on 10.1.

http://gis.stackexchange.com/questions/29676/create-an-arccatalog-ole-db-connection-using-python

Hope this helps.
SteveCole
Frequent Contributor
Thanks, guys, for the follow-up suggestions. Since I'm dealing with the latest stream gage data, overwriting the contents of my SDE table is desirable. We're not running 10.1 just yet, so I think I'm going to run with my updated script for now and revisit the TableToTable option in the future.

Thanks again!
Steve
T__WayneWhitley
Frequent Contributor
Here's the thing: you should be able to create an odc file and store it in the customary app data install directory (for ArcCatalog), which is the default location for the shortcut connection files created when using 'Add Spatial Database Connection' or 'Add OLE DB Connection'. This saved odc file is 'encrypted' (I use this term loosely), as are the saved sde files you have likely made. You aren't required to keep them in the app dir; if you feel you need to move them to a more secure location, go ahead. Just make them readable by the script so that at runtime it can 'grab' the params from the connect file. I have 10, so I don't have the ready option in the Workspace toolset to execute an arcpy command to create a 'dynamic' connection, but is this really necessary? I guess it could be if you wanted to furnish a password at time of execution. Also, keep in mind you can leverage the option, for example in your sde connection, to use either Database or Operating System authentication (depending on your underlying database).

Anyway, just as you can set the workspace in arcpy to an sde connect filepath, so should you be able to do the same with your odc connect filepath - haven't tested this yet, but arcpy should be able to 'handle things'.  I have Office 2007 installed, so here's what I did just to update you on what elementary test I ran this morning:

1- Opened a new Database Connections >> Add OLE DB Connection
2- ...since I have 2007, on the Provider tab, selected:

Microsoft Office 12.0 Access Database Engine OLE DB Provider

3- Hit 'Next >>', which goes to the Connection tab... entered the path to my test Access db (.accdb)
(This is step 1 on the Connection tab, the entire pathname string entered in the Data Source text box.)

4- Hit 'Test Connection' and got the 'succeeded' msg....so I'm ready to test using my new connection!!
(hit OK, dismiss the Data Link Properties window)

5- I used Copy Rows (but believe you could just as well use Table to Table) to ram the table into a stage.gdb file geodb.

6- Examined the gdb, tried it a couple more times to test the overwrite --- nothing crashed so now I think I'm satisfied, suggest you try it out...

Good luck - report back with any success/failure experience.  (Hope for success!)

-Wayne
T__WayneWhitley
Frequent Contributor
I thought this was interesting, so I tested further: OLE DB connection, Table to Table, destination a file gdb; successful test below. From here I know that 'pumping' into SDE is possible with arcpy, either directly from the 'raw' OLE DB table or from the stage copy (preferably the source; I usually only use a staging gdb under dubious network connection conditions, or for further processing before posting to SDE).

[EDIT:  A nice feature of the processing is that an OBJECTID field gets auto-added to the output.]

The below shows a successful arcpy connection via the default-location odc file set up from ArcCatalog as posted previously.  I ran a rudimentary test via cursor access, then Table to Table, with two or three executions and examination in ArcCatalog to make sure the overwrite occurred.  I accidentally locked the db by viewing the connection (not shown below), so the overwrite was naturally prevented; it may be better to, say, use TestSchemaLock (or wrap things in a try-except), as in the sketch after this session, to make sure it doesn't bail on a lock.  (Since this was line execution in IDLE, I didn't think it warranted CODE tags; hope you can read it okay):

>>> connectOLEDB = r'Database Connections\test OLE DB connect 2007 Access.odc'

>>> import arcpy
>>> arcpy.env.workspace = connectOLEDB
>>> rows = arcpy.SearchCursor('parcels_removed_from_injunction')
>>> row = rows.next()
>>> print row.RemovedDate
2005-12-16 00:00:00
>>>
>>> del row, rows
>>>
>>> # TableToTable_conversion (in_rows, out_path, out_name, {where_clause}, {field_mapping}, {config_keyword})
>>>
>>> in_rows = 'parcels_removed_from_injunction'
>>> out_path = r'C:\Documents and Settings\whitley-wayne\Desktop\stage.gdb'
>>> out_name = 'testGDBfromAccess'
>>> arcpy.TableToTable_conversion(in_rows, out_path, out_name)
<Result 'C:\\Documents and Settings\\whitley-wayne\\Desktop\\stage.gdb\\testGDBfromAccess'>
>>>
>>> arcpy.env.overwriteOutput = True
>>> arcpy.TableToTable_conversion(in_rows, out_path, out_name)
<Result 'C:\\Documents and Settings\\whitley-wayne\\Desktop\\stage.gdb\\testGDBfromAccess'>
>>> arcpy.TableToTable_conversion(in_rows, out_path, out_name)
<Result 'C:\\Documents and Settings\\whitley-wayne\\Desktop\\stage.gdb\\testGDBfromAccess'>
>>>
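
For the lock issue mentioned above, a minimal guard might look like this (a sketch only, reusing the names from the session; arcpy.TestSchemaLock returns True when an exclusive schema lock can be acquired on an existing dataset):

# Only attempt the overwrite if the destination isn't locked by another application.
target = out_path + '\\' + out_name
if arcpy.TestSchemaLock(target):
    arcpy.TableToTable_conversion(in_rows, out_path, out_name)
else:
    print 'Destination table is locked; skipping this run.'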
SteveCole
Frequent Contributor
Wow, thanks for fleshing all that out, Wayne!

Your technique definitely reduces the number of lines of code. I played with the process a little but stopped because the TableToTable command did not overwrite the SDE table (the destination table). That was because I had set the workspace to the OLE DB connection, just like in your posted example. Thinking it over, for this specific situation I'm better served by my original solution, because under that scenario my SDE table will ALWAYS exist. This is key because the table is being consumed by a map service.

Although the chances are very remote, it's still possible that someone could load the map service at the moment the Python update sequence is running, which would cause the web map to fail during loading or refresh. I did embrace the OLE DB part of your example, though, which frees me from needing an install of pyodbc. I think it was even a couple seconds faster, too.
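
For reference, here's a minimal sketch of that final pattern (the odc file name is a placeholder, and the delete-all step from the original script is unchanged and omitted here), combining the OLE DB source with the single insert cursor from Wayne's fix:

import arcpy

# Hypothetical OLE DB connection file created via 'Add OLE DB Connection' in ArcCatalog
arcpy.env.workspace = r"Database Connections\dvLiveAccess.odc"

# Read the Access query through the OLE DB connection (no pyodbc required)...
rows = arcpy.SearchCursor("LAST3DAYS")

# ...and write with one insert cursor, created outside the loop.
curSde = arcpy.InsertCursor(r"Database Connections\SdeConnection.sde\tblGageData")
for cRow in rows:
    row = curSde.newRow()
    row.SENSOR_ID = cRow.SENSOR_ID
    row.EPOCH = cRow.EPOCH
    row.DATA = cRow.DATA
    curSde.insertRow(row)

del row, rows, curSde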

Thanks, again!
Steve