I am attempting to update a feature class with records from a table. My approach: list all fields in the table, create arcpy.da.SearchCursor objects from that list, iterate through the rows of both the table and the feature class, find where the ID fields match, then update the fields in the feature class row with the values from the table row that share the same field name. However, I have discovered that the code below only iterates through one value of the outermost for loop. I have included only the portion of code that is causing the problem, so it shows just the mechanics of the loop.
import arcpy
arcpy.env.workspace = "path to file geodatabase"
points = "point_fc"  # point feature class in geodatabase
table = "table"      # table in geodatabase
fields = sorted([f.name for f in arcpy.ListFields(table)])
id_index = fields.index('ID')
pcursor = arcpy.da.SearchCursor(points, fields)
tcursor = arcpy.da.SearchCursor(table, fields)
for prow in pcursor:
    for trow in tcursor:
        print prow[id_index], trow[id_index]
When I run this code, it iterates through all ID values of trow (from the table), but only one of prow (from the point feature class), then stops. Any suggestions?
Your code is doing what you designed. The reason nothing prints for any prow after the first is that the trow cursor is created outside the loop and can only be read through once. By the time the second prow comes around, the trow cursor is already exhausted, so the inner loop body never runs again.
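This behavior is not unique to arcpy cursors; any single-pass Python iterator works the same way. Here is a plain-Python sketch (a generator stands in for the SearchCursor, so no geodatabase is needed) that reproduces the symptom:

```python
# A generator stands in for a SearchCursor: both are single-pass iterators.
def fake_cursor(rows):
    for row in rows:
        yield row

tcursor = fake_cursor([1, 2, 3])

pairs = []
for prow in [10, 20, 30]:
    for trow in tcursor:   # exhausted after the first outer iteration
        pairs.append((prow, trow))

# Only the first outer row ever gets paired; the inner loop is empty afterward.
assert pairs == [(10, 1), (10, 2), (10, 3)]
```

The assertion passes because the second and third passes over `tcursor` find it already drained, exactly like the cursor in the question.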
This is why nested cursors should never be used. If you truly nest the trow cursor so that it is recreated inside the prow loop, the number of record reads grows quadratically with the table sizes. The processing time can easily stretch to days if there are enough records. Even a very modest 1,000 x 1,000 record pairing results in as many as 1,000,000 record reads. My alternate code always reads just 2,000 records for a 1,000 x 1,000 pairing, which is 500 times fewer record reads. The time it takes to do the pair matching in memory is negligible compared to even one cursor pass through the two tables, so cursor reads should be reduced as much as possible and the matching moved into memory.
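To make those read counts concrete, here is a hedged pure-Python sketch (plain lists of tuples stand in for the two cursors) that counts reads for the nested-loop match versus the dictionary lookup:

```python
# Simulated tables: (key, value) rows standing in for cursor rows.
source = [(i, "src%d" % i) for i in range(1000)]
target = [(i, "tgt%d" % i) for i in range(1000)]

# Nested loops: every target row re-reads the whole source table.
nested_reads = 0
matches = 0
for t_key, _ in target:
    for s_key, s_val in source:
        nested_reads += 1
        if s_key == t_key:
            matches += 1

# Dictionary approach: one straight pass over each table.
dict_reads = 0
lookup = {}
for s_key, s_val in source:
    dict_reads += 1
    lookup[s_key] = s_val
for t_key, _ in target:
    dict_reads += 1
    lookup.get(t_key)

assert nested_reads == 1000000  # 1,000 x 1,000 reads
assert dict_reads == 2000       # one pass through each table
assert matches == 1000
```

The numbers match the estimate above: 1,000,000 reads for the nested loops against 2,000 for the dictionary approach.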
Unless you are processing crazy numbers of records this approach should work, and no matter what, it will always far outperform a nested dual-cursor loop.
My thanks go to Chris Synder, who originally introduced me to this code design. Keep an eye out for his posts here on the forum if you want to learn many of the best Python code designs.
Try the generator and cursor approach used in this sample:
http://www.arcgis.com/home/item.html?id=da1540fb59d84b7cb02627856f65a98d
Use this code to populate a dictionary with your fields and transfer them through an UpdateCursor. The dictionary lookup will be at least 100 times faster than any nested cursor code, since it does the lookup in memory rather than hitting the hard disk the way a cursor does. With this code there are only two straight cursor reads through the two tables/feature classes, rather than a quadratically growing number of reads through nested cursors.
Substitute your table/feature class to be read, your table/feature class to be updated, your field list for the table to be read with the matching key field listed first, and a second field list for the table to be updated if the field names are different. This assumes a one-to-one match between the fields being read and the fields being updated. If every field in the table is matched by a field of the same name and type in the feature class, you can extract a field list and just make sure the key matching field is first in the list. Sorting the field list is not necessary.
import arcpy
arcpy.env.workspace = "path to file geodatabase"
sourceFC = "table"
updateFC = "points_fc"
sourceFieldsList = ["ID", "FIELD1", "FIELD2"]
updateFieldsList = sourceFieldsList
# Build a dictionary keyed on ID: {ID: (FIELD1, FIELD2)}
valueDict = {r[0]: (r[1:]) for r in arcpy.da.SearchCursor(sourceFC, sourceFieldsList)}
with arcpy.da.UpdateCursor(updateFC, updateFieldsList) as updateRows:
    for updateRow in updateRows:
        keyValue = updateRow[0]
        if keyValue in valueDict:
            # Transfer every non-key field, then post the row once.
            for n in range(1, len(sourceFieldsList)):
                updateRow[n] = valueDict[keyValue][n - 1]
            updateRows.updateRow(updateRow)
del valueDict
Thanks for the example Richard Fairhurst. I have tried to employ your (Chris's) solution with the code below:
import arcpy
arcpy.env.workspace = "G:/Personal-Ben/Projects/Road Problems/Road Problems.gdb"
sourceFC = "road_table"
updateFC = "all_roads_points"
sourceFieldsList = [f.name for f in arcpy.ListFields(sourceFC)]
#Identify the index for the key value
id_index = sourceFieldsList.index('ID')
#Insert the field "ID" at the first index, remove old field
sourceFieldsList.insert(0, sourceFieldsList.pop(id_index))
updateFieldsList = sourceFieldsList
valueDict = {r[0]: (r[1:]) for r in arcpy.da.SearchCursor(sourceFC, sourceFieldsList)}
with arcpy.da.UpdateCursor(updateFC, updateFieldsList) as updateRows:
    for row in updateRows:
        print "-----------------------------"
        keyValue = row[0]
        if keyValue in valueDict:
            #print keyValue
            for n in range(1, len(sourceFieldsList)):
                print "Old value for ID %s %s: %s" % (keyValue, updateFieldsList[n], row[n])
                print "Update with: %s" % (valueDict[keyValue][n-1])
                print ""
                row[n] = valueDict[keyValue][n-1]
                updateRows.updateRow(row)
del valueDict
del sourceFC
del updateFC
del id_index
del row
del updateRows
del updateFieldsList
del sourceFieldsList
However, the rows of updateFC are still not updating. When I read the print output generated inside the inner for loop, the old values and the new values are the same for every row.
I understand that you are not seeing the updated values, but I am not clear whether the print statements are showing you the values you expect or not. I will assume that they are showing the correct values unless you tell me otherwise.
You have indented the updateRows.updateRow(row) line one level too far. It should be outside of the loop that reads the fields, since it only needs to post the update of the row after all the fields of the row are updated. Updating the row one field at a time slows down the process needlessly.
Do not del the updateRows variable. The "with" cursor creation syntax automatically cleans up the cursor at the end, and I seem to recall that deleting that variable after the loop closes interferes with this syntax. The row variable is already exhausted by the time the cursor is fully read, so it does not need a del statement. The other variables are simple strings, small lists and numbers and don't normally get del statements. I only del the valueDict and have never had any problems with memory leaks. I would get rid of all the del statements except for del valueDict.
I fixed the indentation problem and that did the trick! I will remove the superfluous dels.
I am glad you got it all working.
That is interesting that the indent was the problem. I suspect that only the first field was getting updated and after the row posted for the first field it would not accept any more field updates for that row. Perhaps some type of record lock occurs in Python using the updateRow operation.
Anyway, this code will save lots of time over other data transfer techniques, like using joins and field calculations. It is even possible to adapt the code to do on the fly multi-field concatenation of values to transfer data using a multi-field unique key when no single field can function as a One-to-One join field. And the last loop that goes through the fields can be dropped and converted to use the actual field index numbers of the row and the valueDict when you need to do more complex field matching and individualized operations on the fields that go beyond a simple One-to-One data transfer.
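As a hedged illustration of that multi-field key idea, here is a sketch using plain tuples in place of cursor rows (the field names ROUTE, MILEPOST, and STATUS are invented for the example; with real data the rows would come from the two cursors, keyed on a tuple of the key fields):

```python
# Source rows: (ROUTE, MILEPOST, STATUS). No single field is unique,
# but (ROUTE, MILEPOST) together form a unique multi-field key.
source_rows = [
    ("I-10", 12, "open"),
    ("I-10", 15, "closed"),
    ("SR-60", 12, "open"),
]

# Key the dictionary on a tuple of the two key fields.
valueDict = {(r[0], r[1]): r[2:] for r in source_rows}

# Target rows to update: (ROUTE, MILEPOST, STATUS).
update_rows = [["I-10", 15, None], ["SR-60", 12, None], ["SR-99", 1, None]]
for row in update_rows:
    key = (row[0], row[1])
    if key in valueDict:
        row[2] = valueDict[key][0]

assert update_rows[0][2] == "closed"
assert update_rows[1][2] == "open"
assert update_rows[2][2] is None  # no source match for this composite key
```

The same pattern drops into the cursor code above by building the tuple key from the first fields of each cursor row instead of from r[0] alone.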
So I think you will find many uses for this code that can solve a lot of data relationship problems.
Yes, I think this will prove to be very useful and flexible. Thanks again for your help.
I see, I didn't realize the cursor could only be read through once if created outside of the loop. I knew there was something having to do with the properties of the cursor that I didn't understand.