Currently, I have a feature layer that I step through row by row, updating each row with output data from a SQL function. This works but is slow. I have been asked to try and speed this up, and the suggestion was to attempt some form of global update in one go that could link the feature class to a function or view and update all rows at once. The data returned from the current function relies on inputs from the current row to create the returned data. I have attached the current code below:

message = "[INFO] Looping through selected RoadNetwork started at " + str(datetime.datetime.now())
print message
arcpy.AddMessage(message)
LogFile.write(message + "\n")

# Call function to get extra columns
cursor = arcpy.UpdateCursor(FinalNetworkLayerName)
for featurerow in cursor:
    currentRow = currentRow + 1
    if currentRow % 100 == 0:
        message = "[INFO] Processed " + str(currentRow) + " of " + str(RoadNetworkCount) + " rows at " + str(datetime.datetime.now())
        print message
        arcpy.AddMessage(message)
        LogFile.write(message + "\n")
    CurrentToid = featurerow.TOID
    # Build the call to the SQL function on the MSSQL database
    sql = "SELECT * FROM [DWOP].[dbo].[ufunc_ReturnGGFData] ('" + CurrentToid + "'," + OfficeIDParm + ")"
    # Call function
    sde_return = sdeConn2.execute(sql)
    if isinstance(sde_return, list):
        for row in sde_return:
            featurerow.SEG_ID = row[0]
            featurerow.STL_NAME = row[1]
            featurerow.STR_NAME = row[2]
            featurerow.ST_CLASS = row[3]
            featurerow.ORIG_LVL = row[4]
            featurerow.DEST_LVL = row[5]
            featurerow.ONEWAY = row[6]
            featurerow.SPEED1 = row[7]
            featurerow.SPEED2 = row[8]
            featurerow.SPEED3 = row[9]
            featurerow.SPEED4 = row[10]
            featurerow.TRN_MODE = row[11]
            featurerow.BTH_MODE = row[12]
            featurerow.RESTR_ORI = row[13]
            featurerow.RESTR_DEST = row[14]
            cursor.updateRow(featurerow)
del cursor, featurerow
As I stated above, this works but is quite slow (averaging over 2 seconds per row update).
It seems unlikely to me that such a global update could be done but I am new to ArcPy so may be missing something.
Cheers
Mark
If the ObjectID list is crazy huge you may need to chunk it into selection groups with the modulus operator so that the SQL WHERE clause is not excessively large. Then query each group back to a separate layer copy of the input selection layer; the layer copy would be created outside of your loop using the Make Feature Layer operation.
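The chunking idea above can be sketched in plain Python. This is a hypothetical, untested sketch: the chunk_oids helper and the hard-coded OBJECTID field name are illustrative, and each resulting where clause would then be fed to Make Feature Layer / Select Layer By Attribute one chunk at a time.

```python
# Hypothetical sketch: split a big ObjectID list into groups using the
# modulus operator so no single SQL "IN (...)" clause gets excessively large.
def chunk_oids(oid_list, num_groups):
    groups = [[] for _ in range(num_groups)]
    for i, oid in enumerate(oid_list):
        groups[i % num_groups].append(oid)   # round-robin assignment by index
    return groups

def in_clause(oids):
    # Build a where clause suitable for a feature layer selection
    return "OBJECTID IN (%s)" % ", ".join(str(o) for o in oids)

groups = chunk_oids(list(range(1, 2501)), 3)
clauses = [in_clause(g) for g in groups]
print(len(clauses))   # 3 where clauses, ~834 OIDs each
```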
This could certainly throw a wrench into the "IN" statement approach --- are there any issues with the chunk-processing part? One potential alternative is to use a table variable instead. If the list of OIDs is really large you could first run an INSERT into the table variable and then simply JOIN it to the table in your main query. This design approach would be a better fit for a stored procedure than dynamic SQL, though (not a bad plan either, as the SQL statement will already have been compiled on the server).
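As a rough illustration of the table-variable idea, here is an untested Python sketch that assembles the T-SQL batch as a string. The @oids variable and the SomeSourceTable name are invented for the example; in practice, as noted above, this logic would belong in a stored procedure rather than dynamic SQL.

```python
# Hypothetical sketch: load the OID list into a table variable, then JOIN
# it in the main query instead of building one giant IN (...) list.
def build_table_variable_sql(oid_list, source_table="[DWOP].[dbo].[SomeSourceTable]"):
    inserts = "\n".join("INSERT INTO @oids (oid) VALUES (%d);" % o for o in oid_list)
    return ("DECLARE @oids TABLE (oid int PRIMARY KEY);\n"
            "%s\n"
            "SELECT t.* FROM %s t\n"
            "JOIN @oids o ON o.oid = t.OBJECTID;" % (inserts, source_table))

sql = build_table_variable_sql([10, 20, 30])
print(sql.count("INSERT INTO @oids"))   # one insert per OID: 3
```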
myTable = r"C:\temp\test.gdb\test" valueDict = {r[0]:(r[1]) for r in arcpy.da.SearchCursor(myTable, ["OID@","ID"])} updateRows = arcpy.da.UpdateCursor(myTable, ["OID@","ID"]) for updateRow in updateRows: oidVal, idVal = updateRow updateRow[1] = sum([i[0] for i in valueDict if valueDict[0] >= idVal]) updateRows.updateRow(updateRow) del updateRow, updateRows
Some comments and examples:
1. Use a dictionary; as Richard said, it's way faster than using embedded cursors, and your bosses will of course give you a raise, since you improved the performance so much.
2. If the dictionary gets "too big"... that is it occupies more than ~2.1 GB of RAM, which is the (sort of) limit of 32-bit Python, you can use 64-bit Python along with the 64-bit "ArcGIS_BackgroundGP_for_Desktop_101sp1.exe", which basically turns arcpy (v10.1+) into a 64-bit bad a$$ capable of using gobs of RAM.
Here's an (untested) example of that dictionary calc sort of thing... It sucks an existing table into a dictionary, and then updates the table by calcing the ID field value equal to the sum of all the (original) ID values that are >= the ID value being processed. Why would you want to do that? I don't know, but it illustrates the general concept.

myTable = r"C:\temp\test.gdb\test"
valueDict = {r[0]: r[1] for r in arcpy.da.SearchCursor(myTable, ["OID@", "ID"])}
updateRows = arcpy.da.UpdateCursor(myTable, ["OID@", "ID"])
for updateRow in updateRows:
    oidVal, idVal = updateRow
    # Sum all the original ID values that are >= this row's ID value
    updateRow[1] = sum(v for v in valueDict.itervalues() if v >= idVal)
    updateRows.updateRow(updateRow)
del updateRow, updateRows
After reading your original post, all you are doing is basically a join/calc... the performance of which can still be improved using a dictionary/update cursor. This is some older code (it doesn't use .da cursors, dictionary comprehensions, etc.) but it illustrates the traditional join-and-calc sort of thing you are trying to accomplish.
http://forums.arcgis.com/threads/9555-Modifying-Permanent-Sort-script-by-Chris-Snyder?p=30010&viewfu...
fields = ("SEG_ID", "STL_NAME", "STR_NAME", "ST_CLASS", "ORIG_LVL", "DEST_LVL", "ONEWAY", "SPEED1", "SPEED2", "SPEED3", "SPEED4", "TRN_MODE", "BTH_MODE", "RESTR_ORI", "RESTR_DEST", "TOID", "OBJECTID") cursor = arcpy.da.UpdateCursor(FinalNetworkLayerName, fields) for featurerow in cursor: CurrentObjectID = featurerow[17] # Establishing connection to MSSQL database sql = "SELECT * FROM [DWOP].[dbo].[ufunc_ReturnGGFDataObjectID] (" + CurrentObjectId + "," + OfficeIDParm # Call function sde_return = sdeConn2.execute(sql) if isinstance(sde_return, list): for row in sde_return: featurerow[0] = row[0] featurerow[1] = row[1] featurerow[2] = row[2] featurerow[3] = row[3] featurerow[4] = row[4] featurerow[5] = row[5] featurerow[6] = row[6] featurerow[7] = row[7] featurerow[8] = row[8] featurerow[9] = row[9] featurerow[10] = row[10] featurerow[11] = row[11] featurerow[12] = row[12] featurerow[13] = row[13] featurerow[14] = row[14] cursor.updateRow(featurerow) del cursor, featurerow
rdlkDict = dict(
    [(r.OBJECTID, (r.SEG_ID, r.STL_NAME, r.STR_NAME, r.ST_CLASS,
                   r.ORIG_LVL, r.DEST_LVL, r.ONEWAY, r.SPEED1, r.SPEED2,
                   r.SPEED3, r.SPEED4, r.TRN_MODE, r.BTH_MODE,
                   r.RESTR_ORI, r.RESTR_DEST, r.TOID))
     for r in arcpy.SearchCursor(FinalNetworkLayerName)]
)
mySqlTable = r"\\mynetwork\myconnectionfile.sde" mySqlFieldsToReturnList = ["OBJECTID", "SEG_ID", "STL_NAME", "STR_NAME", ...] rdlkDict = {r[0]:(r[1:]) for r in arcpy.da.SearchCursor(mySqlTbl, mySqlFieldsToReturnList)}