Peter,
A dataset with 7.5 million features is pretty large! Whatever you do, it will take time to process. Loading dictionaries with 7.5 million entries will probably cripple a machine anyway...
One quick win is to ensure your join fields have attribute indices. Have you created these? You can often get a significant performance boost with these alone.
Duncan
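To get a feel for whether a dictionary of that size will fit in memory before committing to it, one option is to build a small sample and extrapolate. This is only a rough sketch: estimate_dict_bytes is a made-up helper, the {int: (int, int, int)} layout is an assumption about what would be stored, and the figure will differ between Python versions and between 32-bit and 64-bit builds.

```python
import sys

def estimate_dict_bytes(n_target, n_sample=100000):
    """Extrapolate the memory of a {int: (int, int, int)} dict from a sample."""
    sample = dict((i, (i, i + 1, i + 2)) for i in range(n_sample))
    # Size of the hash table itself.
    sample_bytes = sys.getsizeof(sample)
    for key, value in sample.items():
        # Add the key, the tuple, and the tuple's elements.
        # Shared or cached small ints are counted more than once, so this
        # leans towards an overestimate.
        sample_bytes += sys.getsizeof(key) + sys.getsizeof(value)
        sample_bytes += sum(sys.getsizeof(item) for item in value)
    # Linear scaling is an approximation: dicts grow in steps, not smoothly.
    return sample_bytes * n_target // n_sample

estimate = estimate_dict_bytes(7500000)
print("roughly %.1f GB for 7.5 million entries" % (estimate / 1e9))
```

If the estimate approaches the few gigabytes a 32-bit process can address, processing the table in chunks is probably the safer route.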
Hi Duncan
I have created attribute indices, but unfortunately ArcGIS joins are still very slow. I found the following post, which suggests that Python dictionaries outperform ArcGIS joins.
http://forums.arcgis.com/threads/55099-Update-cursor-with-joined-tables-work-around-w-dictionaries?p...
Regards
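For anyone unfamiliar with the idea, the dictionary approach boils down to reading the lookup table once into a dict and then filling the target rows by key, instead of joining the tables. A minimal sketch with plain lists standing in for the cursor rows (the sample IDs, values and the update_from_lookup helper are all made up for illustration, and no arcpy calls are involved):

```python
def update_from_lookup(target_rows, lookup, key_index=0, value_index=1):
    """Fill value_index of each target row from lookup.

    Returns the number of rows whose key was not found in the lookup.
    """
    missing = 0
    for row in target_rows:
        try:
            row[value_index] = lookup[row[key_index]]
        except KeyError:
            # No matching record in the lookup table; leave the row as it is.
            missing += 1
    return missing

# Stand-in for the rows a search cursor would return: (key, value).
source_rows = [(101, "LOT 1"), (102, "LOT 2"), (103, "LOT 3")]
# Stand-in for the rows an update cursor would return: [key, field to fill].
target_rows = [[101, None], [103, None], [999, None]]

lookup = dict(source_rows)  # built once, then each lookup is a hash access
missing = update_from_lookup(target_rows, lookup)
print(target_rows, missing)
```

With real data the two lists would be replaced by a search cursor on the lookup table and an update cursor on the target table.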
Here is an actual example, and also some notes:
1. This method is WAY faster if you can use the data access (arcpy.da) cursors, which are new in v10.1... Like about 20x faster!
2. As Duncan said, you may indeed run out of 32-bit memory with 7.5 million rows. I've experienced this... Remember that ints take up less RAM than floats, and way, way less than strings... especially long strings.
3. Unlike the example below, it would be way more compact to make just one dictionary and for each look up table, append to the attribute tuple/list that each key is pointing to... No need to have duplicate keys if only one key is required.

#Process: Build dictionaries of the basin size stats that we can then use in an update cursor - faster than join/calc
message = "Populating BASIN_SIZE_DICT..."; showPyMessage()
basinSizeDict = dict([(r.VALUE, (r.MIN, r.MAX, int(r.MEAN + .5))) for r in arcpy.SearchCursor(flowAccZoneStatTbl)]) #Int
message = "Populating SLOPE_DICT..."; showPyMessage()
slopeDict = dict([(r.VALUE, (r.MIN, r.MAX, int(r.MEAN + .5))) for r in arcpy.SearchCursor(slopePctZoneStatTbl)]) #Int
message = "Populating CURVE_PLAN3_DICT..."; showPyMessage()
curvePlan3Dict = dict([(r.VALUE, (r.MEAN)) for r in arcpy.SearchCursor(curvePlan3ZoneStatTbl)]) #Float
message = "Populating ELEV_DICT..."; showPyMessage()
elevationDict = dict([(r.VALUE, (r.MIN, r.MAX, int(r.MEAN + .5))) for r in arcpy.SearchCursor(elevationZoneStatTbl)]) #Int

#Process: Blah blah
##...State Secrets...

#Process: Update the streamFC table
message = "Updating stream table..."; showPyMessage()
updateRows = arcpy.UpdateCursor(streamFC)
for updateRow in updateRows:
    streamIdValue = updateRow.STREAM_ID
    if streamIdValue in outFlowStreamLinkList:
        updateRow.OUT_FLG = 1
    else:
        updateRow.OUT_FLG = -1
    updateRow.TILE_NO = int(areaId)
    updateRow.SUB_NET = streamNetworkIdDict[streamIdValue]
    updateRow.ACRE_MIN = basinSizeDict[streamIdValue][0] * cellSize ** 2 / 43560 #SQ FEET TO ACRES
    updateRow.ACRE_MAX = basinSizeDict[streamIdValue][1] * cellSize ** 2 / 43560 #SQ FEET TO ACRES
    updateRow.ACRE_MEAN = basinSizeDict[streamIdValue][2] * cellSize ** 2 / 43560 #SQ FEET TO ACRES
    del basinSizeDict[streamIdValue]
    updateRow.ELEV_MIN = elevationDict[streamIdValue][0]
    updateRow.ELEV_MAX = elevationDict[streamIdValue][1]
    updateRow.ELEV_MEAN = elevationDict[streamIdValue][2]
    del elevationDict[streamIdValue]
    updateRow.SLOPE_MIN = slopeDict[streamIdValue][0]
    updateRow.SLOPE_MAX = slopeDict[streamIdValue][1]
    updateRow.SLOPE_MEAN = slopeDict[streamIdValue][2]
    del slopeDict[streamIdValue]
    updateRow.PCRV3_MEAN = curvePlan3Dict[streamIdValue]
    del curvePlan3Dict[streamIdValue]
    updateRows.updateRow(updateRow)
del updateRow, updateRows
message = "Done updating stream table!"; showPyMessage()
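Note 3 (one dictionary instead of one per lookup table) could look something like the sketch below. It uses small made-up dicts in place of the zonal statistics tables, merge_lookups and container_bytes are hypothetical helpers, and it assumes every table contains every key (otherwise the positions in the merged tuple would shift). The size comparison only counts the dicts and their tuples, not the numbers inside, since those are the same either way.

```python
import sys

def merge_lookups(*lookups):
    """Merge several {key: value-or-tuple} dicts into one {key: flat tuple}."""
    merged = {}
    for lookup in lookups:
        for key, value in lookup.items():
            # Normalise scalars to 1-tuples so everything can be concatenated.
            if not isinstance(value, tuple):
                value = (value,)
            merged[key] = merged.get(key, ()) + value
    return merged

def container_bytes(lookup):
    """Rough size of a dict plus its tuple values (elements not counted)."""
    return sys.getsizeof(lookup) + sum(
        sys.getsizeof(v) for v in lookup.values() if isinstance(v, tuple))

# Made-up stand-ins for the stats tables: (MIN, MAX, MEAN) per stream ID.
ids = range(1, 10001)
basin_size = dict((i, (i, i * 2, i + 1)) for i in ids)
slope = dict((i, (i % 5, 45, 12)) for i in ids)
elevation = dict((i, (100, 900 + i, 450)) for i in ids)
curve_plan3 = dict((i, 0.25 * i) for i in ids)  # single float per key

stats = merge_lookups(basin_size, slope, elevation, curve_plan3)
separate = sum(container_bytes(d)
               for d in (basin_size, slope, elevation, curve_plan3))
combined = container_bytes(stats)
print("separate: %d bytes, combined: %d bytes" % (separate, combined))
```

In the update loop a single stats[streamIdValue] lookup would then return all ten values at once, e.g. stats[streamIdValue][0] for the basin size minimum and stats[streamIdValue][9] for the curvature mean.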
...and do other sorts of everyday tasks such as update feature classes that have more than 100 records.
dBreaks = {"parcel":[3272715, 3542277, 3812689, 4079535, 4350267, 4620430, 4890728, 5160184, 6770930],
           "pother":[6845367, 6940090, 7017134, 7077541, 7135010, 7190427, 7248224, 7301079, 7359295]}
# ....
breaks = dBreaks[fcSource]
print breaks
# len(breaks) break values define len(breaks)+1 ranges; looping over only
# len(breaks) steps would skip the range between the last two breaks
for step in range(len(breaks)+1):
    if step == 0:
        expr = "PAR_ID < "+str(breaks[step])
    elif step == len(breaks):
        expr = "PAR_ID >= "+str(breaks[step-1])
    else:
        expr = "PAR_ID >= "+str(breaks[step-1])+" and PAR_ID < "+str(breaks[step])
    print "step",step,expr
    e = 0
    # lookup dictionary for this chunk only: par_id -> "legal1 legal2"
    dApp = dict([(row[0],row[1]+" "+row[2]) for row in arcpy.da.SearchCursor(wsSource+"/"+fcLabel,["par_id","legal1","legal2"],expr)])
    print "records",len(dApp)
    with arcpy.da.UpdateCursor(fcTarget,["PAR_ID",fldApp],expr) as cur:
        for row in cur:
            try:
                row[1] = dApp[row[0]]
            except KeyError:
                # no matching record in the source table
                e+=1
            cur.updateRow(row)
    if e > 0:
        print "errors",e
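The chunking part of this (working through the table one PAR_ID range at a time so that each dictionary stays small) can be pulled out into a little helper that builds the where clauses. This is just a sketch: chunk_where_clauses is a made-up name, and it assumes the break values are integers sorted in ascending order. Pairing each break with the next one makes sure no range gets skipped.

```python
def chunk_where_clauses(field, breaks):
    """Return one SQL where clause per ID range defined by sorted break values."""
    # Everything below the first break.
    clauses = ["%s < %d" % (field, breaks[0])]
    # One half-open range [low, high) per pair of adjacent breaks.
    for low, high in zip(breaks[:-1], breaks[1:]):
        clauses.append("%s >= %d AND %s < %d" % (field, low, field, high))
    # Everything from the last break upwards.
    clauses.append("%s >= %d" % (field, breaks[-1]))
    return clauses

clauses = chunk_where_clauses("PAR_ID", [3272715, 3542277, 3812689])
for clause in clauses:
    print(clause)
```

Each clause can then be passed as the where clause to both the search cursor and the update cursor, so the two always cover the same slice of records.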