Updating a feature layer with a global sql update

MarkWingfield · ‎05-21-2014

Currently, I have a feature layer that I step through row by row and update each row in the layer with output data from a SQL function. This works but is slow. I have been asked to try and speed this up and the suggestion was to attempt some form of global update in one go tat could link the feature class to a function or view and update all rows in one go. The data returned from the current function relies on inputs from the current row to create the returned data. I have attached the current code below:

    message = "[INFO] Looping through selected RoadNetwork started at " + str(datetime.datetime.now())
    print message
    arcpy.AddMessage(message)
    LogFile.write(message + "\n")
    # Call function to get extra columns
    cursor = arcpy.UpdateCursor(FinalNetworkLayerName)
    for featurerow in cursor:
        currentRow = currentRow + 1
        if currentRow%100 == 0:
            message = "[INFO] Processed " + str(currentRow) + " of " + str(RoadNetworkCount) + " rows at " + str(datetime.datetime.now())
            print message
            arcpy.AddMessage(message)
            LogFile.write(message + "\n")
        CurrentToid = featurerow.TOID
        # Establishing connection to MSSQL database

        sql = "SELECT * FROM [DWOP].[dbo].[ufunc_ReturnGGFData] ('" + CurrentToid + "'," + OfficeIDParm
        # Call function
        sde_return = sdeConn2.execute(sql)
        if isinstance(sde_return, list):
            for row in sde_return:
                featurerow.SEG_ID = row[0]
                featurerow.STL_NAME = row[1]
                featurerow.STR_NAME = row[2]
                featurerow.ST_CLASS = row[3]
                featurerow.ORIG_LVL = row[4]
                featurerow.DEST_LVL = row[5]
                featurerow.ONEWAY = row[6]
                featurerow.SPEED1 = row[7]
                featurerow.SPEED2 = row[8]
                featurerow.SPEED3 = row[9]
                featurerow.SPEED4 = row[10]
                featurerow.TRN_MODE = row[11]
                featurerow.BTH_MODE = row[12]
                featurerow.RESTR_ORI = row[13]
                featurerow.RESTR_DEST = row[14]
                cursor.updateRow(featurerow)

    del cursor, featurerow

As I stated above. this works but is quite slow (averaging over 2 seconds per row update).

It seems unlikely to me that such a global update could be done but I am new to ArcPy so may be missing something.

Cheers

Mark

MarkWingfield · ‎05-27-2014

I think I need to build the data dictionary first based on the list of rows already in the feature layer FinalNetworkLayerName as this restricts the number of row to be processed to between 3000 and 64000, whereas, in total, there are millions of potential rows to be processed. This still looks to me like I need to build the data dictionary row by row, which seems to me will probably result in the same amount of time taken if that time is taken up by mostly the SQL call istelf. However, if it is the update of the feature layer that is the majority of the 2 seconds taken, then there may be a saving, assuming that the building of the data dictionary is quick.

That would mean attempting to build the data dictionary row by row using the list of all objectIDs from FinalNetworkLayerName. So is there efficient code for this process?

Cheers

Mark Wingfield

RichardFairhurst · ‎05-27-2014

I think I need to build the data dictionary first based on the list of rows already in the feature layer FinalNetworkLayerName as this restricts the number of row to be processed to between 3000 and 64000, whereas, in total, there are millions of potential rows to be processed. This still looks to me like I need to build the data dictionary row by row, which seems to me will probably result in the same amount of time taken if that time is taken up by mostly the SQL call istelf. However, if it is the update of the feature layer that is the majority of the 2 seconds taken, then there may be a saving, assuming that the building of the data dictionary is quick.

That would mean attempting to build the data dictionary row by row using the list of all objectIDs from FinalNetworkLayerName. So is there efficient code for this process?

Cheers

Mark Wingfield

Start by making the cursor a da cursor. That is the most important thing to deal with first. The old cursors will cause 10 times the delay. Test your code on a small sample after doing that to see the difference it will make.

I would not build a query with 3,000 to 64,000 ObjectIDs. It will perform horribly since that is too huge of a list.

You could read the selected record set of the 3,000 to 64,000 records into a dictionary and then build a second dictionary doing a complete read of the entire database with millions of records but only add a record to the second dictionary where the key values matched. Then do the UpdateCursor using the second dictionary. Or perhaps you could just create the first dictionary with all of the fields needing updates, read the database with millions of records and if the key values match update the one dictionary and then use the one dictionary with the update cursor. Either of these approaches will blow your current code away and not cause a memory issue.

Your current code is already hitting an index with millions of records repeatedly each time you query the huge database, so a single read of the entire database is nothing in comparison. The single read of the unfiltered database is likely to be faster than a query filter with 3,000 to 64,000 conditions and you should try the above suggestion before creating a where clause with that many conditions.

RichardFairhurst · ‎05-27-2014

Here is a sample of code I just wrote to replace a data transfer operation that originally used joins and field calculations. It affects about 114,000 related records in two tables. It takes about a half minute to processes as compared to about 30 minutes with the original calculations. Assume readFC and outputFC have been defined previously in my code as the two related feature classes. OBJECTID in the readFC joins to FROM_OBJECTID in the outputFC.

readFieldsList = ["OBJECTID", "RID", "MEAS", "X_COORD", "Y_COORD", "X_Y_LINK", "STNAME", "STNAMES", "X_Y_ROUTE"]
valueDict = {r[0]:(r[1:]) for r in arcpy.da.SearchCursor(readFC, readFieldsList)}

updateFieldsList =  ["FROM_OBJECTID","FROM_ROUTE_NAME","FROM_MEASURE","FROM_MEAS","FROM_X_COORDINATE","FROM_Y_COORDINATE","FROM_X_Y_LINK","FROM_STREET_NAME","FROM_CROSS_STREETS","FROM_X_Y_ROUTE_NAME"]
with arcpy.da.UpdateCursor(outputFC, updateFieldsList) as updateRows:
    for updateRow in updateRows:
        ObjectIDVal = updateRow[0]
        if ObjectIDVal in valueDict:
            updateRow[1] = valueDict[ObjectIDVal][0]
            updateRow[2] = valueDict[ObjectIDVal][1]
            updateRow[3] = valueDict[ObjectIDVal][1]
            updateRow[4] = valueDict[ObjectIDVal][2]
            updateRow[5] = valueDict[ObjectIDVal][3]
            updateRow[6] = valueDict[ObjectIDVal][4]
            updateRow[7] = valueDict[ObjectIDVal][5]
            updateRow[8] = valueDict[ObjectIDVal][6]
            updateRow[9] = valueDict[ObjectIDVal][7]
            updateRows.updateRow(updateRow)

This code is based on the original suggestion that reads the entire related table into a dictionary and not the suggestion of my last post. However, it shows how to retrieve items from a list inside of a dictionary.

MarkWingfield · ‎05-27-2014

Given that there was mention of using a list of objectIDs from the FinalNetworkLayerName feature layer, it would be possible for the function to be part of a SQL Select statement and the objectids passed into this as a Where clause. And then I guess this can be called to create a data dictionary but still unsure about the code.

If this was the solution (and given that I would need to deal with chunks of data as I don't believe SQL Server 2008 can deal with over 60000 entries in a single WHERE.....IN (....)... statement), how would I code:

1. Create a data Dictionary for the first chunk of objectIDs from FinalNetworkLayerName, passing these to the WHERE clause of a SQL statement
2. Update the same Data Dictionary for x chunks of objectIDs, also passing to a WHERE clause, until the end of FinalNetworkLayerName

3. Then update FinalNetworkLayerName by matching to the objectIDs in the Data Dictionary

Cheers

Mark

RichardFairhurst · ‎05-27-2014

Given that there was mention of using a list of objectIDs from the FinalNetworkLayerName feature layer, it would be possible for the function to be part of a SQL Select statement and the objectids passed into this as a Where clause. And then I guess this can be called to create a data dictionary but still unsure about the code.

If this was the solution (and given that I would need to deal with chunks of data as I don't believe SQL Server 2008 can deal with over 60000 entries in a single WHERE.....IN (....)... statement), how would I code:

1. Create a data Dictionary for the first chunk of objectIDs from FinalNetworkLayerName, passing these to the WHERE clause of a SQL statement
2. Update the same Data Dictionary for x chunks of objectIDs, also passing to a WHERE clause, until the end of FinalNetworkLayerName

3. Then update FinalNetworkLayerName by matching to the objectIDs in the Data Dictionary

Cheers

Mark

Your concept is wrong and I am no longer advocating that approach. No query where clause filter needs to be built. It can be handled by the conditional python code in the for loop that reads:

ObjectIDVal = updateRow[0]
if ObjectIDVal in valueDict:

As the cursor is read this code performs the function of a where clause against the dictionary you have built to validate that the current ObjectID value exists as a dictionary key. Updates are only done if the key is found.

RichardFairhurst · ‎05-27-2014

I will try to adapt my example code to follow my last suggestion which would read the dictionary from the selection set of the output feature class, update the dictionary from a read source that may be much larger, and finally update the output feature class from the updated dictionary:

updateFieldsList =  ["FROM_OBJECTID","FROM_ROUTE_NAME","FROM_MEASURE","FROM_MEAS","FROM_X_COORDINATE","FROM_Y_COORDINATE","FROM_X_Y_LINK","FROM_STREET_NAME","FROM_CROSS_STREETS","FROM_X_Y_ROUTE_NAME"]
valueDict = {r[0]:(r[1:]) for r in arcpy.da.SearchCursor(outputFC, updateFieldsList)}

readFieldsList = ["OBJECTID", "RID", "MEAS", "X_COORD", "Y_COORD", "X_Y_LINK", "STNAME", "STNAMES", "X_Y_ROUTE"]
with arcpy.da.SearchCursor(readFC, readFieldsList) as readRows:
    for readRow in readRows:
        ObjectIDVal = readRow[0]
        if ObjectIDVal in valueDict:
            valueDict[ObjectIDVal][0] = readRow[1]
            valueDict[ObjectIDVal][1] = readRow[2]
            valueDict[ObjectIDVal][2] = readRow[2]
            valueDict[ObjectIDVal][3] = readRow[3]
            valueDict[ObjectIDVal][4] = readRow[4]
            valueDict[ObjectIDVal][5] = readRow[5]
            valueDict[ObjectIDVal][6] = readRow[6]
            valueDict[ObjectIDVal][7] = readRow[7]
            valueDict[ObjectIDVal][8] = readRow[8]

with arcpy.da.UpdateCursor(outputFC, updateFieldsList) as updateRows:
    for updateRow in updateRows:
        ObjectIDVal = updateRow[0]
        updateRow[1] = valueDict[ObjectIDVal][0]
        updateRow[2] = valueDict[ObjectIDVal][1]
        updateRow[3] = valueDict[ObjectIDVal][2]
        updateRow[4] = valueDict[ObjectIDVal][3]
        updateRow[5] = valueDict[ObjectIDVal][4]
        updateRow[6] = valueDict[ObjectIDVal][5]
        updateRow[7] = valueDict[ObjectIDVal][6]
        updateRow[8] = valueDict[ObjectIDVal][7]
        updateRow[9] = valueDict[ObjectIDVal][8]
        updateRows.updateRow(updateRow)

MarkWingfield · ‎05-29-2014

Thanks all for your help so far. I am going with the data dictionary option and am currently working through this. The issue I do have is with debugging, in that, when running my code, it suggests that the data dictionary (I have called this rdlkDict) has been created successfully (based on using a MakeQueryTable step that calls a view returning SQL data to be matched). The data in the MakeQueryTable layer looks to be fully populated when I open it in ArcMap. However, the match does not appear to have updated anything so i need to check contents of the data dictionary. My code is below:

    message = "[INFO]  Create table of general GGF data started at " + str(datetime.datetime.now()) 
    print message
    viewName = "DWOP.DBO.VWO_GenerateGeorouteFilesNonSpatial"
    fullViewName = arcpy.env.workspace + os.sep + viewName
    tableLayerName = "GGF_Data" + OfficeID
    fieldList = [[viewName + ".OBJECTID"],[viewName + ".TOID"],[viewName + ".OFFICE_ID"],[viewName + ".SEG_ID"],[viewName + ".STL_NAME"],[viewName + ".STR_NAME"],[viewName + ".ST_CLASS"],[viewName + ".ORIG_LVL"],[viewName + ".DEST_LVL"],[viewName + ".ONEWAY"],[viewName + ".SPEED1"],[viewName + ".SPEED2"],[viewName + ".SPEED3"],[viewName + ".SPEED4"],[viewName + ".TRN_MODE"],[viewName + ".BTH_MODE"],[viewName + ".RESTR_ORI"],[viewName + ".RESTR_DEST"]]
    whereClause = "OFFICE_ID = " + OfficeID
    arcpy.MakeQueryTable_management(fullViewName,tableLayerName,"USE_KEY_FIELDS",viewName + ".OBJECTID",fieldList,whereClause)  

    message = "[INFO]  Create Data Dictionary of general GGF data started at " + str(datetime.datetime.now()) 
    print message
    mySqlFieldsToReturnList = ["OBJECTID",viewName  + ".TOID",viewName  + ".OFFICE_ID",viewName  + ".SEG_ID",viewName  + ".STL_NAME",viewName  + ".STR_NAME",viewName  + ".ST_CLASS",viewName  + ".ORIG_LVL",viewName  + ".DEST_LVL",viewName  + ".ONEWAY",viewName  + ".SPEED1",viewName  + ".SPEED2",viewName  + ".SPEED3",viewName  + ".SPEED4",viewName  + ".TRN_MODE",viewName  + ".BTH_MODE",viewName  + ".RESTR_ORI",viewName  + ".RESTR_DEST"]
    rdlkDict = {r[0]:(r[1:]) for r in arcpy.da.SearchCursor(tableLayerName, mySqlFieldsToReturnList)}

#Would like to check the rdlkDict values here    

    message = "[INFO]  Looping through selected RoadNetwork started at " + str(datetime.datetime.now()) 
    print message

    updateFieldsList = ["OBJECTID","SEG_ID","STL_NAME","STR_NAME","ST_CLASS", "ORIG_LVL","DEST_LVL","ONEWAY","SPEED1","SPEED2","SPEED3","SPEED4","TRN_MODE","BTH_MODE","RESTR_ORI","RESTR_DEST"]
    with arcpy.da.UpdateCursor(FinalNetworkLayerName, updateFieldsList) as updateRows:
        for updateRow in updateRows:
            ObjectIDVal = updateRow[0]
            currentRow = currentRow + 1
            if currentRow%100 == 0:
                message = "[INFO]  Processed " + str(currentRow) + " of " + str(RoadNetworkCount) + " rows at " + str(datetime.datetime.now())
                print message
            if ObjectIDVal in rdlkDict:
                updateRow[1] = rdlkDict[ObjectIDVal][3]
                updateRow[2] = rdlkDict[ObjectIDVal][4]
                updateRow[3] = rdlkDict[ObjectIDVal][5]
                updateRow[4] = rdlkDict[ObjectIDVal][6]
                updateRow[5] = rdlkDict[ObjectIDVal][7]
                updateRow[6] = rdlkDict[ObjectIDVal][8]
                updateRow[7] = rdlkDict[ObjectIDVal][9]
                updateRow[8] = rdlkDict[ObjectIDVal][10]
                updateRow[9] = rdlkDict[ObjectIDVal][11]
                updateRow[10] = rdlkDict[ObjectIDVal][12]
                updateRow[11] = rdlkDict[ObjectIDVal][13]
                updateRow[12] = rdlkDict[ObjectIDVal][14]
                updateRow[13] = rdlkDict[ObjectIDVal][15]
                updateRow[14] = rdlkDict[ObjectIDVal][16]
                updateRow[15] = rdlkDict[ObjectIDVal][17]
                updateRows.updateRow(updateRow)

How can I interrogate values in rdlkDict and display each row of data so I can check whether the data dictionary is correctly populated?

ChrisSnyder · ‎05-29-2014

Can you not just run the search cursor (and populate the dictionary) directly on 'fullViewName'? I'm unsure why you are creating a query table. Note there is a parameter for applying SQL expressions directly in the search cursor, although as I recall, maybe you can't use that parameter unless there is a valid (ESRI-recognized) OID field in the table.

To interrogate values in a dictionary, access them by key. So for example:

>>> sampleDict = {"cat": ("Meow Man", 23, 12.34), "dog": ("Bark Bark", 36, 1.432), "mouse":("Minnie Me", 3, 0.123)}
>>> sampleDict["dog"]
("Bark Bark", 36, 1.432)

MarkWingfield · ‎05-30-2014

Thanks everybody. The data dictionary method was superfast and has definitely solved a very thorny problem. I did use makequerytable first to get the data into a layer i could interrogate in testing and then used this to populate the data dictionary.

All works very well

Cheers

RichardFairhurst · ‎05-30-2014

Thanks everybody. The data dictionary method was superfast and has definitely solved a very thorny problem. I did use makequerytable first to get the data into a layer i could interrogate in testing and then used this to populate the data dictionary.

All works very well

Cheers

Given all the help that Chris and I gave you, you need to assign points and check the post with the best answer to your question. See my signature line and click the link for details on assigning MVP points and answers.