
Need help optimizing my code

12-10-2012 08:35 AM
SteveCole
Honored Contributor
I have a Python script which runs fine but takes far too long to run, so I'm hoping some of the more experienced Python coders can offer suggestions about where I can be more efficient. The script will hopefully be run as a Windows scheduled task at a given interval to constantly update a table in SDE which will be consumed by a web map.

As I said, the table is in SDE, but the source data is a table in an Access database. Because of a daily database maintenance period, the Access database cannot be touched unless one file exists and another does not (hence the conditional check on the two files). If it's OK to proceed, here's what happens:


  1. Delete all records from the SDE table

  2. Connect to the Access database

  3. Step through the records in an Access Query and transfer that information into the SDE table

It took roughly 20 minutes to transfer 2200 records, which seems WAY too slow, so any suggestions would be greatly appreciated! Here's the code:

import sys
import os
import pyodbc
import arcpy
from datetime import datetime
from arcpy import env

file01 = r"\\Path\to\File01" #This file must exist
file02 = r"\\Path\to\File02" #This file must NOT exist
expression = '1=1' #SQL shorthand which select all records
theTable = "tblGageData"

# The tables within DIADvisor must not be accessed during its daily database maintenance.
# OneRain recommends checking for the existence and non-existence of two specific files.
# If both conditions are true, it is safe to proceed with accessing the data within the
# dvLive Access database
if os.path.exists(file01) and not os.path.exists(file02):
    print "Processing start time: " + str(datetime.now())

    env.workspace = r"Database Connections\SdeConnection.sde"
    try:
        # Set some local variables
        tempTableView = "gageTableView"

        # Execute MakeTableView
        arcpy.MakeTableView_management(theTable, tempTableView)

        # Execute SelectLayerByAttribute to select all records
        arcpy.SelectLayerByAttribute_management(tempTableView, "NEW_SELECTION", expression)

        # Execute GetCount and if some records have been selected, then execute
        #  DeleteRows to delete the selected records.
        if int(arcpy.GetCount_management(tempTableView).getOutput(0)) > 0:
            arcpy.DeleteRows_management(tempTableView)

        # Now connect to the DIADvisor access database and import the most recent data
        # This requires the PYODBC module for Python
        cstring = r'DRIVER={Microsoft Access Driver (*.mdb)};DBQ=Path\to\Access\Database\Database.mdb;Provider=MSDASQL;'
        conn = pyodbc.connect(cstring)
        cursor = conn.cursor()

        sqlString = "SELECT * FROM LAST3DAYS"
        counter = 0
        # Loop through the results returned via the ODBC connection
        for cRow in cursor.execute(sqlString):
            curSensorId = cRow.SENSOR_ID
            curEpoch = cRow.EPOCH
            curData = cRow.DATA
            counter += 1
            #Insert a new row into the SDE table with the current DIADvisor record's information
            curSde = arcpy.InsertCursor(r"Database Connections\SdeConnection.sde\tblGageData")
            row = curSde.newRow()
            row.SENSOR_ID = curSensorId
            row.EPOCH = curEpoch
            row.DATA = curData
            curSde.insertRow(row)

        # Close the ODBC connection and perform some variable cleanup
        cursor.close()
        conn.close()
        del row
        del conn
        del curSde
        del cRow
        del cursor
        print "Number of record(s) in the DIADvisor database: " + str(counter)
        print "Processing end time: " + str(datetime.now())
    except Exception as e:
        # If an error occurred, print line number and error message
        import traceback, sys
        cursor.close()
        conn.close()
        tb = sys.exc_info()[2]
        print "Line %i" % tb.tb_lineno
        print str(e)
else:
    sys.exit()
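One likely hot spot: the arcpy.InsertCursor is re-created for every record returned from Access, so each row pays the cost of opening a new cursor against SDE. Below is a minimal sketch of the same transfer with the cursor opened once and reused (paths, table, and field names are taken from the script above; this is only an illustration, not necessarily the solution that was marked in the thread):

import arcpy
import pyodbc

cstring = r'DRIVER={Microsoft Access Driver (*.mdb)};DBQ=Path\to\Access\Database\Database.mdb;Provider=MSDASQL;'
conn = pyodbc.connect(cstring)
cursor = conn.cursor()

# Open the insert cursor ONCE, before the loop, and reuse it for every row.
curSde = arcpy.InsertCursor(r"Database Connections\SdeConnection.sde\tblGageData")
counter = 0
for cRow in cursor.execute("SELECT * FROM LAST3DAYS"):
    row = curSde.newRow()
    row.SENSOR_ID = cRow.SENSOR_ID
    row.EPOCH = cRow.EPOCH
    row.DATA = cRow.DATA
    curSde.insertRow(row)
    counter += 1
del curSde          # releases the cursor and its lock on the SDE table
cursor.close()
conn.close()
print "Rows transferred: " + str(counter)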
12 Replies
T__WayneWhitley
Honored Contributor
That's great, glad you have a working solution.  I just have one more idea you can try based on what you said about maintaining the table, essentially 'refilling' it without deleting it.  I see you are in effect building a row object with a cursor which then gets loaded one iteration of the loop at a time.

Consider this (and I am not saying I know for certain there is a significant difference):
What if you read the information you needed from the Access database, processed the rows completely, deleted rows (as you have already going), then used Append to load the rows that have already been processed.  If there is any 'lag time' in processing each row (I assume their is a cummulative cost for that) you can 'soak up' that processing time in what I call a 'staging geodatabase', which can be a temporary 'dummy' SDE or file geodatabase table, (or possibly an in-memory workspace table?), etc.  There is still a cost, of course, processing the delete rows and append, but maybe not as much?

An interesting situation to test, I think.  Also, if Append (Data Management >> General >> Append) is not satisfactory, what about the insert cursor operation you are using now, with one major exception - the handling of the row object?  Wouldn't loading preprocessed rows from a staging gdb cut out the 'in transit' time spent fetching field values from the OLE DB read and the consequent destination field assignment that must take place before the row object is loaded into the table?  (There's a rough sketch of the staging idea below.)
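A rough sketch of that staging idea, assuming a file geodatabase staging table with the same schema as tblGageData (the staging path, the "gageView" name, and the connection string are only placeholders):

import arcpy
import pyodbc

stageTable = r"C:\temp\staging.gdb\tblGageStage"  # hypothetical staging table, same schema as tblGageData
sdeTable = r"Database Connections\SdeConnection.sde\tblGageData"
cstring = r'DRIVER={Microsoft Access Driver (*.mdb)};DBQ=Path\to\Access\Database\Database.mdb;Provider=MSDASQL;'

# 1) Absorb the per-row processing cost in the local staging table.
arcpy.DeleteRows_management(stageTable)   # start from an empty staging table
conn = pyodbc.connect(cstring)
cursor = conn.cursor()
stageCur = arcpy.InsertCursor(stageTable)
for cRow in cursor.execute("SELECT * FROM LAST3DAYS"):
    row = stageCur.newRow()
    row.SENSOR_ID = cRow.SENSOR_ID
    row.EPOCH = cRow.EPOCH
    row.DATA = cRow.DATA
    stageCur.insertRow(row)
del stageCur
cursor.close()
conn.close()

# 2) Clear the SDE table and load the preprocessed rows in a single Append call.
arcpy.MakeTableView_management(sdeTable, "gageView")
arcpy.DeleteRows_management("gageView")
arcpy.Append_management(stageTable, sdeTable, "NO_TEST")

Whether this beats inserting straight into SDE is exactly the open question - the Append call is a single bulk operation, but the staging insert still pays a per-row cost, just against a local file geodatabase instead of across the SDE connection.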

Hope that helps,
Wayne

EDIT:  I just remembered something (if memory serves correctly) - I have a scheduled task set up (nice how you can forget about them once they're going!) where I was updating a layer by a process similar to the one you are trying to implement here, and I remember having problems with orphaned SDE connections to the destination layer when I tried the 'remove and replace' approach that Table To Table would use here.  The connections would persist in the middle of the night and lock the feature class - a complete impasse (although I think with 10.1 you now have the ability to target and disconnect users).  I don't think Append requires a schema lock... and it also does not require re-establishing privileges.  (lol, I did overrun some database log files, but the workaround was for the DB admin to increase the max limit on that type of logging and clean out the log more frequently.)  Append worked nicely, and I think the insert cursor would have worked as well (though I think either operation will require an exclusive schema lock.)
by Anonymous User
Not applicable
Very cool Wayne!!!  You are a GIS/arcpy jedi!
T__WayneWhitley
Honored Contributor
Thank you, Caleb, but I am suspicious of my last statement - still puzzling over it a bit:
"I think either operation [Append or insert cursor] will require an exclusive schema lock."

I found this:

Accessing data using cursors (the section on Cursors and locking)
Resource Center » Professional Library » Geoprocessing » Geoprocessing with Python » Accessing geographic data in Python
http://help.arcgis.com/en/arcgisdesktop/10.0/help/index.html#//002z0000001q000000

"Update and insert cursors cannot be created for a table or feature class if an exclusive lock exists for that dataset. The UpdateCursor or InsertCursor functions fail because of an exclusive lock on the dataset. If these functions successfully create a cursor, they apply an exclusive lock on the dataset so that two scripts cannot create an update or insert cursor on the same dataset."

So, in short, an update or insert cursor will create an exclusive lock, but cannot apply one if one already exists - nor can one be applied if there is a shared lock (e.g., as produced by viewing the subject dataset in ArcCatalog).  Multiple shared locks are allowed; only one exclusive lock is allowed, and only after all shared locks have been released.

I'm even more curious now about the Append tool - I'll check it again when I get back to the office.  A simple test: if I can view a table while I am appending to it, and refresh to see the appended rows, then an exclusive lock is not required.
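For a quick check from Python (rather than the ArcCatalog test above), there is arcpy.TestSchemaLock, which reports whether an exclusive schema lock could be acquired on a dataset without actually placing one.  How it behaves against an enterprise (SDE) geodatabase versus a file geodatabase is worth confirming in the help topic above, but a small sketch would be:

import arcpy

table = r"Database Connections\SdeConnection.sde\tblGageData"

# TestSchemaLock returns True if an exclusive schema lock could be acquired
# right now, and False if another connection (shared or exclusive) is in the
# way.  It only tests - it does not place a lock itself.
if arcpy.TestSchemaLock(table):
    print "No locks in the way - schema-locking operations should succeed."
else:
    print "Could not acquire an exclusive lock - something else is holding the table."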

EDIT:  Forgot to add this concerning Server services: to disable schema locking, the following tag can be added to the respective service config file:  <SchemaLockingEnabled>false</SchemaLockingEnabled>
There are associated security risks in doing this, and I have yet to test this as thoroughly as I would like.