Update existing rows/add new rows in master table from records in second table

MarkHuffman · ‎02-13-2012

I have a table that is a subset of a master table representing the parcel database. This subset is created every week and contains only updated and new records, and my task is to compare that table to the master table and, if existing records have different values, update those fields, and if there are new records, add those records to the master table.

This process has up to now been done in SDE using "sdeimport -o update_else_insert." I want to avoid interacting with SDE, but will if necessary.

My plan is to use SearchCursor to read the subset and the master table, and UpdateCursor and InsertCursor to edit the master table, using the APN as the key field. First, is this the right approach? Second, can someone please provide some starting code or psuedocode to help me? I am new at Python but not at programming. Thank you.

StacyRendall1 · ‎02-15-2012

Also, if anyone is interested, here is the code to perform this operation for arbitrary tables (fields only defined once at the start):

import arcpy from arcpy import env  # Define workspace env.workspace = "C:/PythonParcelProcesses"  # paths to tables masterTable = 'UpdateInsertTo.dbf' subsetTable = 'UpdateInsertFrom.dbf'  # define fields: keyField = 'APN' testFields = ['APN_SUFFIX', 'TRA', 'NAME_1', 'NAME_2', 'MAIL_ADDR', 'CTY_STA']  # dictionary of the contents of the subset table # APN is key, fields to test are stored in an ordered list subsetDict = {} sC = arcpy.SearchCursor(subsetTable) for row in sC:  subsetDict[row.getValue(keyField)] = [row.getValue(field) for field in testFields] del sC, row  # get list of values that need to be updated updateDict = {} sC = arcpy.SearchCursor(masterTable) for row in sC:  key = row.getValue(keyField)  testList = [row.getValue(field) for field in testFields]   # try to get record in subsetDict to see if it has changed  # if the subset table didn't contain a record for this entry, will trow KeyError (that's fine - do nothing)  try:   # if records have changed add to updateDict and delete from subsetDict   if subsetDict[key] != testList:    updateDict[key] = subsetDict[key]    del subsetDict[key]   # records are the same, delete from subsetDict   else:    del subsetDict[key]  except KeyError:   # this APN only exists in the master table, do nothing...   pass del sC, row  # anything left in subsetDict is new, can use insertCursor iC = arcpy.InsertCursor(masterTable) for key in subsetDict:  row = iC.newRow()  row.setValue(keyField, key)  for i in range(len(testFields)):   field = testFields   row.setValue(field, subsetDict[key])  iC.insertRow(row)  del row del iC  # anything in updateDict needs to be updated, use updateCursor uC = arcpy.UpdateCursor(masterTable) for row in uC:  key = row.getValue(keyField)  if row.getValue(keyField) in updateDict:   for i in range(len(testFields)):    field = testFields    row.setValue(field, updateDict[key])   uC.updateRow(row) del uC, row, updateDict, subsetDict

View solution in original post

StacyRendall1 · ‎02-13-2012

Hi Mark,

there are a huge number of ways you could do this, I would suggest taking advantage of Python dictionaries if you can... Maybe something like so (should be almost ready to run, except for any errors/typo's):

import arcpy

# paths to tables
masterTable = '//masterTable'
subsetTable = '//subsetTable'

# define fields:
keyField = 'APN'
testFeild1 = 'a'
testField2 = 'b'
#... and so on
## If you know all your fields and will never need to change them I would recommend using the 'dynamic' noation row.APN inseatd of row.getValue('APN') as I have used below

# dictionary of the contents of the subset table
# APN is key, fields to test are stroed in an ordered list
subsetDict = {}
sC = arcpy.SearchCursor(subsetTable)
for row in sC:
 subsetDict[row.getValue(keyField)] = [row.getValue(testField1), row.getValue(testField2)]
del sC, row

# get list of values that need to be updated
updateDict = {}
sC = arcpy.SearchCursor(masterTable)
for row in sC:
 key = row.getValue(keyField)
 testList = [row.getValue(testField1), row.getValue(testField2)]

 # if records have changed add to updateDuct and delete from subsetDict
 if subsetDict[key] != testList:
  updateDict[key] = subsetDict[key]
  del subsetDict[key]
 # records are the same, delete from subsetDict
 else:
  del subsetDict[key]
del sC, row

# anything left in subsetDict is new, can use insertCursor
iC = arcpy.InsertCursor(masterTable)
for key in subsetDict:
 row = iC.newRow()
 row.setValue(keyField, key)
 row.setValue(testField1, subsetDict[key][0])
 row.setValue(testField2, subsetDict[key][1])
 iC.insertRow(row)
 del row
del iC, subsetDict

# anything in updateDuct needs to be updated, use updateCursor
uC = arcpy.UpdateCursor(masterTable)
for row in uC:
 if row.getValue(keyField) in updateDict:
  row.setValue(testField1, updateDict[key][0])
  row.setValue(testField2, updateDict[key][1])
del uC, row, updateDict

The code takes a key field and two test fields (fields tested to see if they have been updated), but would be pretty expandable to any number of test fields...

Cheers,
Stacy

MarkHuffman · ‎02-13-2012

H Stacy,

I'm near speechless with gratitude. I will study this code, adapt it, and give it a try. Thanks so much.

Mark

MarkHuffman · ‎02-13-2012

Hi Stacy,

I am running the code with my test tables and fields, and I am encountering "Runtime error <type 'exceptions.KeyError'>: u'018021107'". I have narrowed it down to occurring in or after this block:

if subsetDict[key] != testList:
  updateDict[key] = subsetDict[key]
  del subsetDict[key]
 # records are the same, delete from subsetDict
 else:
  del subsetDict[key]
del sC, row

And here is my code:

import arcpy
from arcpy import env

# Define workspace
env.workspace = "C:/PythonParcelProcesses"

# paths to tables
masterTable = 'UpdateInsertTo.dbf'
subsetTable = 'UpdateInsertFrom.dbf'

# define fields:
keyField = 'APN'
testField1 = 'APN_SUFFIX'
testField2 = 'TRA'
testField3 = 'NAME_1'
testField4 = 'NAME_2'
testField5 = 'MAIL_ADDR'
testField6 = 'CTY_STA'

#... and so on
## If you know all your fields and will never need to change them I would recommend using the 'dynamic' notation row.APN instead of row.getValue('APN') as I have used below

# dictionary of the contents of the subset table
# APN is key, fields to test are stored in an ordered list
subsetDict = {}
sC = arcpy.SearchCursor(subsetTable)
for row in sC:
 subsetDict[row.getValue(keyField)] = [row.getValue(testField1), row.getValue(testField2), row.getValue(testField3), row.getValue(testField4), row.getValue(testField5), row.getValue(testField6)]
del sC, row

# get list of values that need to be updated
updateDict = {}
sC = arcpy.SearchCursor(masterTable)
for row in sC:
 key = row.getValue(keyField)
 testList = [row.getValue(testField1), row.getValue(testField2), row.getValue(testField3), row.getValue(testField4), row.getValue(testField5), row.getValue(testField6)]

 # if records have changed add to updateDict and delete from subsetDict
 if subsetDict[key] != testList:
  updateDict[key] = subsetDict[key]
  del subsetDict[key]
 # records are the same, delete from subsetDict
 else:
  del subsetDict[key]
del sC, row

# anything left in subsetDict is new, can use insertCursor
iC = arcpy.InsertCursor(masterTable)
for key in subsetDict:
 row = iC.newRow()
 row.setValue(keyField, key)
 row.setValue(testField1, subsetDict[key][0])
 row.setValue(testField2, subsetDict[key][1])
 row.setValue(testField3, subsetDict[key][2])
 row.setValue(testField4, subsetDict[key][3])
 row.setValue(testField5, subsetDict[key][4])
 row.setValue(testField6, subsetDict[key][5])
 iC.insertRow(row)
 del row
del iC, subsetDict

# anything in updateDict needs to be updated, use updateCursor
uC = arcpy.UpdateCursor(masterTable)
for row in uC:
 if row.getValue(keyField) in updateDict:
  row.setValue(testField1, updateDict[key][0])
  row.setValue(testField2, updateDict[key][1])
  row.setValue(testField3, updateDict[key][2])
  row.setValue(testField4, updateDict[key][3])
  row.setValue(testField5, updateDict[key][4])
  row.setValue(testField6, updateDict[key][5])  
del uC, row, updateDict

Can you please take another look? Thank you.

Mark

StacyRendall1 · ‎02-13-2012

Hi Mark,

Sorry - I wrote it but wasn't able to test it, so I didn't pick this up. It is occurring where there are records in the master table that don't exist in the subset table (that is fine, and from your description, expected), this causes a KeyError from the dictionary (accessing a key that doesn't exist). So I have added a try...except statement that tells it to pass (do nothing) if a key error occurs for a particular APN...

Just replace the second Search Cursor block with this:

# get list of values that need to be updated
updateDict = {}
sC = arcpy.SearchCursor(masterTable)
for row in sC:
 key = row.getValue(keyField)
 testList = [row.getValue(testField1), row.getValue(testField2), row.getValue(testField3), row.getValue(testField4), row.getValue(testField5), row.getValue(testField6)]

 # try to get record in subsetDict to see if it has changed
 # if the subset table didn't contain a record for this entry, will trow KeyError (that's fine - do nothing)
 try:
  # if records have changed add to updateDict and delete from subsetDict
  if subsetDict[key] != testList:
   updateDict[key] = subsetDict[key]
   del subsetDict[key]
  # records are the same, delete from subsetDict
  else:
   del subsetDict[key]
 except KeyError:
  # this APN only exists in the master table, do nothing...
  pass
del sC, row

Hopefully that should work fine now!

Cheers,
Stacy

p.s. How does it compare speed-wise to the method you were using previously? Well, you would probably only notice a difference if the tables are pretty large, but there are a few things you could do to speed it up a little...

MarkHuffman · ‎02-13-2012

Stacy,

Great! Thanks. Making progress. The master table is now being updated with new rows from the subset table. Strangely the first new row is being placed last in the master table, and a KeyError is being called on its key field value and, although a print command placed after the last line confirms that the script is completing, the insertCursor block does not seem to be working.

Any ideas?

Thanks again!

Mark

StacyRendall1 · ‎02-13-2012

Hi Mark,

The master table is now being updated with new rows from the subset table

Good.

Strangely the first new row is being placed last in the master table

Do you mean: 1. 'the new rows are being added at the end', or 2. ' the top-most new row is bottom-most in the updated table'? Either way:
1. The InsertCursor adds a new row to the bottom of the table, so this is expected.
2. Dictionary keys are not actually ordered; so the order in which the rows are added to the table is (effectively) random.

and a KeyError is being called on its key field value

I need more detail on this.

A KeyError means that it is asking the dictionary for a key that doesn't exist. Which block is this occurring in?

and, although a print command placed after the last line confirms that the script is completing, the insertCursor block does not seem to be working.

If the Insert Cursor is not working, new rows would not be being added from the subset table (your first comment) - do you mean the Update Cursor?

If there are still issues, can you attach your test files and I will try to sort them out...

MarkHuffman · ‎02-14-2012

Hi Stacy,

I wrote my last post in a rush while doing something else. I'll clarify.

Do you mean: 1. 'the new rows are being added at the end', or 2. ' the top-most new row is bottom-most in the updated table'? Either way:
1. The InsertCursor adds a new row to the bottom of the table, so this is expected.
2. Dictionary keys are not actually ordered; so the order in which the rows are added to the table is (effectively) random.

I meant #2 - but that's okay since they are added in random order, as you say.

I need more detail on this.

A KeyError means that it is asking the dictionary for a key that doesn't exist. Which block is this occurring in?

The KeyError calls out the APN value of the last new record in the master table: Runtime error <type 'exceptions.KeyError'>: u'203006146' -- That is the last record in the newly updated master table.

If the Insert Cursor is not working, new rows would not be being added from the subset table (your first comment) - do you mean the Update Cursor?

Yes, I meant the Update Cursor. The error is occurring in the Update Cursor block. The changes to the existing records in the subset table are not updated to the master table.

Thanks again for all your time and help.

Mark

StacyRendall1 · ‎02-14-2012

Mark,

can you send me the files you are testing with (you can attach files to posts)? If you have to, zip them together.

I just can't figure what the problem might be without being able to test stuff and see it for myself.

Stacy

MarkHuffman · ‎02-14-2012

Mark,

can you send me the files you are testing with (you can attach files to posts)? If you have to, zip them together.

I just can't figure what the problem might be without being able to test stuff and see it for myself.

Stacy

Stacy,

I have sent you a private message to arrange to send the data by more secure means. Thanks.

Mark