Faster method than CalculateField_management?

JohnLay · ‎06-17-2014

I have a script that joins one table to another, then calculates one column to another. Everything works as it should, but it takes 4 hours to do so if it is scripted while it only takes a few seconds if I do it manually. The script actually does a lot more in that it loops through 4 tables and joins them each to 3 different tables. This is a process that needs to occur fairly regularly, so this is not something I really want to be doing manually every time I need to update these columns.

FieldNameList = ["CATTLE", "DAIRY", "POULTRY", "SWINE"] Iteration = 0 Iteration2 = 0 if TABLETYPE == "Animal Operations":     for JoinFile in UPDATELIST_join:         if Iteration > 2:             Iteration = 0             Iteration2 = Iteration2 + 1         layer = CENSUSLayerLIST[Iteration]         ANOPSFIELD = FieldNameList[Iteration2]         arcpy.AddJoin_management(layer, "GEOID10_1", JoinFile, "GEOID10_1", "KEEP_COMMON")         arcpy.CalculateField_management(layer, ANOPSFIELD, "!NumCount!", "PYTHON_9.3")         arcpy.RemoveJoin_management(layer)         Iteration = Iteration + 1 else:     for layer in CENSUSLayerLIST:         JoinFile = UPDATELIST_join[Iteration]         arcpy.AddJoin_management(layer, "GEOID10_1", JoinFile, "GEOID10_1", "KEEP_COMMON")         arcpy.CalculateField_management(layer, CENSUSFIELDTRANS, "!NumCount!", "PYTHON_9.3")         arcpy.RemoveJoin_management(layer)         Iteration = Iteration + 1

After my first attempt ran for 4 hours, I thought I might give da.SearchCursor / da.UpdateCursor a go. But after it ran for 6 hours I gave up.

Is there a better way to skin this cat?

Anonymous User · ‎06-18-2014

Now, just so that I understand what is happening here,

Code:
valueDict = {r[0]:(r[1:]) for r in arcpy.da.SearchCursor(JoinFile, readList)}

In this part, you are building a python [url=https://docs.python.org/2/tutorial/datastructures.html#dictionaries]dictionary[/url] in which the first field in the readList is the key, and the rest of the field values are the values that are matched to each key. So the dictionary looks like this for each pair:

valueDict = { field_1 : [ field_2, field_3, field_4] }

The important thing to note here is that you want to treat the first field like a join field. This field needs to have the values that match up to a field in the other table.

is building an array from which

Code:
with arcpy.da.UpdateCursor(layer, updateList) as cursor:     for row in cursor:         GEOIDval = row[0]         if GEOIDval in valueDict:             row[1] = valueDict[GEOIDval][0]             cursor.updateRow(row)
the current cursor row searches for a match. And

Code:

row[1] = valueDict[GEOIDval][0]

means the value of the update cursor row's second field (row[1]) now equals the matched object's next value from the matched part (valueDict[GEOIDval][0]). Correct?

And yes, you hit the nail on the head here. Dictionaries and the da cursors are a very powerful combo.

View solution in original post

MichaelVolz · ‎06-17-2014

I believe you need to add a index to the field used in the join.

arcpy.AddIndex_management(parameters)

Anonymous User · ‎06-17-2014

Can you provide your whole code? It is hard to tell what is going on without the "UPDATELIST", "CENSUSDATALIST", etc variables. I am pretty confident that if you use dictionaries paired with the da cursors, the performance would increase dramatically. If the tables are quite large, the add join and especially the calculate field routines will be expensive.

By using da cursors and dictionaries, you wouldn't even need to join the tables.

RichardFairhurst · ‎06-17-2014

Look at this post for the basic dictionary transfer code.

As far as join performance, are you sure you have attribute indexes on all of the fields involved in the join and transfer?

JohnLay · ‎06-18-2014

Here is the full code. I'm going to investigate the da cursors and dictionaries method. I've been aware that calculatefield management is very inefficient for a while; I just couldn't figure out another way to do the same thing. Fieldmappings are new to me, so if there is a more elegant way to write that bit, I'd be interested to learn.

SCRIPTPATH = sys.path[0]
ROOTFOLDER = os.path.dirname(SCRIPTPATH)
CENSUS_FGDB = os.path.join(ROOTFOLDER, "CENSUS.gdb")
Census_Blocks = os.path.join(ROOTFOLDER, "CENSUS.gdb\CENSUS_BLOCKS")
Census_Group = os.path.join(ROOTFOLDER, "CENSUS.gdb\CENSUS_GROUP")
Grid = os.path.join(ROOTFOLDER, "CENSUS.gdb\CENSUS_GRID")
CENSUSLIST = [Census_Blocks, Census_Group, Grid]

TABLETYPE = arcpy.GetParameterAsText(0)
UPDATELAYER = arcpy.GetParameterAsText(1)
UPDATEFIELD = arcpy.GetParameterAsText(2)# optional
CENSUSFIELD = arcpy.GetParameterAsText(3)# optional

usrName = os.getenv('USERNAME')
HOME = r'C:\Users\%s' % usrName

if CENSUSFIELD == "Buildings":
    CENSUSFIELDTRANS = "BLDCNT"
elif CENSUSFIELD == "Chemical Sites":
    CENSUSFIELDTRANS = "CHEMICAL"
elif CENSUSFIELD == "Correctional Facilities":
    CENSUSFIELDTRANS = "CORRECT"
elif CENSUSFIELD == "Dams":
    CENSUSFIELDTRANS = "DAMS"
else:
    pass

if UPDATEFIELD =='#' or not UPDATEFIELD:
    UPDATEFIELD = "NOTUSED"

DEFAULTGDB = os.path.join(HOME, "Documents", "ArcGIS", "Default.gdb")
UPDATELAYERTemp = (os.path.join(DEFAULTGDB, "UPDATELAYERTemp"))
desc = arcpy.Describe(UPDATELAYER)
type = desc.shapeType

arcpy.CopyFeatures_management(UPDATELAYER, UPDATELAYERTemp)

# List Fields and delete unnecessary
fields = arcpy.ListFields(UPDATELAYERTemp)
fieldNameList = []
for field in fields:
    if not field.required and not field.name == UPDATEFIELD:
        fieldNameList.append(field.name)
arcpy.DeleteField_management(UPDATELAYERTemp, fieldNameList)
arcpy.AddField_management(UPDATELAYERTemp,"Number","SHORT")
arcpy.CalculateField_management(UPDATELAYERTemp,"Number",1,"PYTHON_9.3")

# Define Animal Operations
ANOPCATTLE = (os.path.join(DEFAULTGDB, "ANOPCATTLE"))
ANOPDAIRY = (os.path.join(DEFAULTGDB, "ANOPDAIRY"))
ANOPPOULTRY = (os.path.join(DEFAULTGDB, "ANOPPOULTRY"))
ANOPSWINE = (os.path.join(DEFAULTGDB, "ANOPSWINE"))
exp1 = "\"" + UPDATEFIELD + "\" LIKE '%Dry%' OR \"" + UPDATEFIELD + "\" LIKE '%Beef%'"
exp2 = "\"" + UPDATEFIELD + "\" LIKE '%Dairy%' OR \"" + UPDATEFIELD + "\" LIKE '%Milk%'"
exp3 = "\"" + UPDATEFIELD + "\" LIKE '%Poultry%'"
exp4 = "\"" + UPDATEFIELD + "\" LIKE '%Swine%'"

# Define Census Layers
CENSUS_Blocks_Layer = (os.path.join(DEFAULTGDB, "CENSUS_Blocks_Layer"))
CENSUS_Group_Layer = (os.path.join(DEFAULTGDB, "CENSUS_Group_Layer"))
CENSUS_Grid_Layer = (os.path.join(DEFAULTGDB, "CENSUS_Grid_Layer"))
CENSUSLayerLIST = [CENSUS_Blocks_Layer, CENSUS_Group_Layer, CENSUS_Grid_Layer]

# Make Feature Layers from Census
Iteration = 0
for layer in CENSUSLIST:
    Newlayer = CENSUSLayerLIST[Iteration]
    arcpy.MakeFeatureLayer_management(layer, Newlayer)
    Iteration = Iteration + 1

#Define Spatial Join outputs for
UPDATELIST_join = []
CENSUS = ["Blocks", "Group", "Grid"]

if TABLETYPE == "Animal Operations":
    arcpy.Select_analysis(UPDATELAYERTemp, ANOPCATTLE, exp1)
    arcpy.Select_analysis(UPDATELAYERTemp, ANOPDAIRY, exp2)
    arcpy.Select_analysis(UPDATELAYERTemp, ANOPPOULTRY, exp3)
    arcpy.Select_analysis(UPDATELAYERTemp, ANOPSWINE, exp4)
    JOINLAYERList = [ANOPCATTLE, ANOPDAIRY, ANOPPOULTRY, ANOPSWINE]
else:
    JOINLAYERList = [UPDATELAYERTemp]

if TABLETYPE == "Lagoons":
    field_name = UPDATEFIELD
    CENSUSFIELDTRANS = "LAGOONS"
else:
    field_name = "Number"

for file in JOINLAYERList:
    Iteration = 0
    for layer in CENSUSLayerLIST:
        Export_Output = file + "_" + CENSUS[Iteration]
        UPDATELIST_join.append(Export_Output)
        fieldmappings = arcpy.FieldMappings()
        fieldmappings.addTable(layer)
        fieldmappings.addTable(file)
        FieldIndex = fieldmappings.findFieldMapIndex(field_name)
        fieldmap = fieldmappings.getFieldMap(FieldIndex)
        field = fieldmap.outputField
        field.name = "NumCount"
        field.aliasName = "NumCount"
        fieldmap.outputField = field
        fieldmap.mergeRule = "sum"
        fieldmappings.replaceFieldMap(FieldIndex, fieldmap)
        for field in fieldmappings.fields:
            if field.name not in ["NumCount", "GEOID10_1", "GEOID10_3"]:
                fieldmappings.removeFieldMap(fieldmappings.findFieldMapIndex(field.name))
        arcpy.SpatialJoin_analysis(layer, file, Export_Output, "#","#", fieldmappings)
        with arcpy.da.UpdateCursor(Export_Output, "NumCount") as cursor:
            for row in cursor:
                if row[0] is None:
                    cursor.deleteRow()
        Iteration = Iteration + 1

FieldNameList = ["CATTLE", "DAIRY", "POULTRY", "SWINE"]
Iteration = 0
Iteration2 = 0
if TABLETYPE == "Animal Operations":
    for JoinFile in UPDATELIST_join:
        if Iteration > 2:
            Iteration = 0
            Iteration2 = Iteration2 + 1
        layer = CENSUSLayerLIST[Iteration]
        ANOPSFIELD = FieldNameList[Iteration2]
        arcpy.AddJoin_management(layer, "GEOID10_1", JoinFile, "GEOID10_1", "KEEP_COMMON")
        arcpy.AddMessage("Updateing " + layer + "' " + ANOPSFIELD)
        arcpy.CalculateField_management(layer, ANOPSFIELD, "!NumCount!", "PYTHON_9.3")
        arcpy.RemoveJoin_management(layer)
        Iteration = Iteration + 1
else:
    for layer in CENSUSLayerLIST:
        JoinFile = UPDATELIST_join[Iteration]
        arcpy.AddJoin_management(layer, "GEOID10_1", JoinFile, "GEOID10_1", "KEEP_COMMON")
        arcpy.AddMessage("Updateing " + layer + "' " + CENSUSFIELDTRANS)
        arcpy.CalculateField_management(layer, CENSUSFIELDTRANS, "!NumCount!", "PYTHON_9.3")
        arcpy.RemoveJoin_management(layer)
        Iteration = Iteration + 1

JohnLay · ‎06-18-2014

da cursors with dictionaries ROCK! From 4 hours to 9 minutes! Thanks Caleb and Richard!

The updated bit of code looks like this now:

FieldNameList = ["CATTLE", "DAIRY", "POULTRY", "SWINE"]
Iteration = 0
Iteration2 = 0
if TABLETYPE == "Animal Operations":
    for JoinFile in UPDATELIST_join:
        if Iteration > 2:
            Iteration = 0
            Iteration2 = Iteration2 + 1
        layer = CENSUSLayerLIST[Iteration]
        ANOPSFIELD = FieldNameList[Iteration2]
        readList = ["GEOID10_1", "NumCount"]
        valueDict = {r[0]:(r[1:]) for r in arcpy.da.SearchCursor(JoinFile, readList)}
        updateList =  ["GEOID10_1", ANOPSFIELD]
        with arcpy.da.UpdateCursor(layer, updateList) as cursor:
            for row in cursor:
                GEOIDval = row[0]
                if GEOIDval in valueDict:
                    row[1] = valueDict[GEOIDval][0]
                    cursor.updateRow(row)
        Iteration = Iteration + 1
else:
    for layer in CENSUSLayerLIST:
        JoinFile = UPDATELIST_join[Iteration]
        readList = ["GEOID10_1", "NumCount"]
        valueDict = {r[0]:(r[1:]) for r in arcpy.da.SearchCursor(JoinFile, readList)}
        updateList =  ["GEOID10_1", CENSUSFIELDTRANS]
        with arcpy.da.UpdateCursor(layer, updateList) as cursor:
            for row in cursor:
                GEOIDval = row[0]
                if GEOIDval in valueDict:
                    row[1] = valueDict[GEOIDval][0]
                    cursor.updateRow(row)
        Iteration = Iteration + 1

Now, just so that I understand what is happening here,

valueDict = {r[0]:(r[1:]) for r in arcpy.da.SearchCursor(JoinFile, readList)}

is building an array from which

with arcpy.da.UpdateCursor(layer, updateList) as cursor:
    for row in cursor:
        GEOIDval = row[0]
        if GEOIDval in valueDict:
            row[1] = valueDict[GEOIDval][0]
            cursor.updateRow(row)

the current cursor row searches for a match. And

row[1] = valueDict[GEOIDval][0]

means the value of the update cursor row's second field (row[1]) now equals the matched object's next value from the matched part (valueDict[GEOIDval][0]). Correct?

Anonymous User · ‎06-18-2014

Now, just so that I understand what is happening here,

Code:
valueDict = {r[0]:(r[1:]) for r in arcpy.da.SearchCursor(JoinFile, readList)}

In this part, you are building a python [url=https://docs.python.org/2/tutorial/datastructures.html#dictionaries]dictionary[/url] in which the first field in the readList is the key, and the rest of the field values are the values that are matched to each key. So the dictionary looks like this for each pair:

valueDict = { field_1 : [ field_2, field_3, field_4] }

The important thing to note here is that you want to treat the first field like a join field. This field needs to have the values that match up to a field in the other table.

is building an array from which

Code:
with arcpy.da.UpdateCursor(layer, updateList) as cursor:     for row in cursor:         GEOIDval = row[0]         if GEOIDval in valueDict:             row[1] = valueDict[GEOIDval][0]             cursor.updateRow(row)
the current cursor row searches for a match. And

Code:

row[1] = valueDict[GEOIDval][0]

means the value of the update cursor row's second field (row[1]) now equals the matched object's next value from the matched part (valueDict[GEOIDval][0]). Correct?

And yes, you hit the nail on the head here. Dictionaries and the da cursors are a very powerful combo.

JohnLay · ‎06-18-2014

Awesome! Thanks.