
Compare Access Table to SDE Feature Class

04-23-2012 12:03 PM
BrianLeroux
Frequent Contributor
I am new to Python, so I am working through a comparison script that will let me know if there are any differences between an mdb table and a feature class (FC). Both tables will contain roughly 500,000 records. When comparing, it takes roughly 30 seconds per record to find a match. I find this method extremely slow, and I still need to add another match field. Is there a better way to compare the tables?

Eventually, my end result will:
Remove records from the FC that are no longer in the mdb table.
If the record in the mdb is newer (based on comparing a time stamp field), the FC record will be removed.
Any records not found, or that have newer data than the FC, will be added to a table in my SDE database that is dynamically geocoded back to the FC.

# Import system modules
import arcpy, os

# Local variables:

table = "D:/HOPOLICIES/Test.mdb/tblHOPolicies"
fc = "Database Connections\\ArcSDE.sde\\ArcSDE.A10713.HOPolicies_NW"

cField = "Policy_Number"

# Outer loop: step through every record in the Access table
rows = arcpy.SearchCursor(table)
row = rows.next()
while row:
    tableValue = row.getValue(cField)

    # Inner loop: open a fresh cursor on the feature class and scan it until a
    # matching policy number is found (this full scan per record is the slow part)
    rows2 = arcpy.SearchCursor(fc)
    row2 = rows2.next()

    while row2:
        fcValue = row2.getValue(cField)
        if tableValue == fcValue:
            print tableValue + " Match Found to " + fcValue
            break
        row2 = rows2.next()
    row = rows.next()
4 Replies
StacyRendall1
Frequent Contributor
Brian,

I would suggest you use a search cursor to extract each lot of data into a Python dictionary, then do the analysis on them... Here is a post where someone was trying to do something pretty similar to you; give it a browse, play around with it, and fire back any questions you might have!
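
Something along these lines, for example (a rough sketch only; I am guessing at the name of your time stamp field):

# Rough sketch: read the Access table into a dictionary keyed on policy number.
# "Date_Modified" is only a placeholder for whatever your time stamp field is called.
import arcpy

table = "D:/HOPOLICIES/Test.mdb/tblHOPolicies"
tableDict = {}
rows = arcpy.SearchCursor(table)
for row in rows:
    tableDict[row.getValue("Policy_Number")] = row.getValue("Date_Modified")
del rows

# Build a second dictionary from the feature class the same way, then compare the
# two dictionaries in memory instead of opening a new cursor for every record.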

Cheers,
Stacy
KimOllivier
Honored Contributor
Embedding any geoprocessing tool inside a cursor will always be too slow.
They are not designed to be used that way; they are meant to run on whole datasets at once, and any looping is built into the tools, which use indexing and other optimisations.

You need to find a tool that does the comparison inside the tool.

1. MakeQueryTable springs to mind, but first you would have to copy the table so that both are in the same database.
Nice and simple, but not very flexible. You would make the comparison in an SQL query expression.

2. Find the differences with a Python set. This is the fastest by a long way (milliseconds for the whole thing).
The catch is that you still have to read each table once to build the sets and write out a selected set, but it will still take less than a minute or two.
BrianLeroux
Frequent Contributor
2. Find the differences with a Python set. This is the fastest by a long way (milliseconds for the whole thing).
The catch is that you still have to read each table once to build the sets and write out a selected set, but it will still take less than a minute or two.


This sounds like an ideal choice for me, but as I said my Python experience is limited. Do you happen to know of any samples that show how to populate a set from a table in SDE?

Thanks!
KimOllivier
Honored Contributor
Here is a keyfile selection tool, rewritten from Bruce Harold's 9.3 version for 10.0 using sets instead of FIDSet, which no longer works.

My samples were two polygon featureclasses with 2.4 million records in different file geodatabases.
I selected 630,000 in one featureclass and found the corresponding records in the other.
Total elapsed time to read each featureclass and create a new selection in ArcMap was 6.5 minutes.

I must catch up with the latest incarnation of ArcScripts....
Anyway, it is short enough to paste in, and I will attempt to add a toolbox with a dialog.

If you want to detect changes in attributes, or deleted records, then the same process can be done on a dictionary instead of just a single key. Create a dictionary, indexed by the key, of all the attributes that need to be compared, and convert the keys to a set as before. Once you have the intersection, you can compare the dictionaries, which is a similarly fast operation. I use this to create del/new/change sets of layers (a rough sketch follows the script below). Even geometry can be tested if you are creative and use a proxy for the shape, such as a rounded area, length, or x,y coordinate pair for polygon, line, and point featureclasses.

# Author: ESRI (#5588)
# Date:   April 27th 2009
# Purpose: This script applies an SQLquery to a layer or table view using the "IN"
#          operator and a key value set taken from another layer or table view field.
#          The output is a new selection in the same layer or table view.
#          The process is like the KEYFILE reselection option in ArcInfo Workstation.
# edit by Kim to fix selection bugs
# 17 June 2009
# 12 August 2009 FIDSet broken at 9.3.1 ??
# 25 April 2012 10.0 completely redesigned using sets, 10.x compatible, faster
# Kim Ollivier kimo@ollivier.co.nz

import arcpy
import datetime
ts0 = datetime.datetime.now()
arcpy.env.overwriteOutput = True
try:
    # raise Exception # uncomment for debugging in PythonWin
    inLayer = arcpy.GetParameterAsText(0) # Get the input layer or table view to be subqueried
    inField = arcpy.GetParameterAsText(1) # Get the input object subquery field
    keyLayer = arcpy.GetParameterAsText(2) # Get the keyfile layer or table view
    keyField = arcpy.GetParameterAsText(3) # Get the keyfile field
    selOption = arcpy.GetParameterAsText(4) # Get the selection option defaults to NEW_SELECTION
except: # debugging
    inLayer = "e:/project/BTR/view/Geocoding_business2.lyr" # Get the input layer or table view to be subqueried
    inField = "ID" # Get the input object subquery field
    keyLayer = "e:/project/BTR/view/Geocoding_business.lyr" # Get the keyfile layer or table view
    keyField = "ID" # Get the keyfile field
    selOption = "NEW_SELECTION" # Get the selection option defaults to NEW_SELECTION    
# Build two sets of keys
setInKey = set([row.getValue(inField) for row in arcpy.SearchCursor(inLayer)])
setKey = set([row.getValue(keyField) for row in arcpy.SearchCursor(keyLayer)]) # only uses selected records
# make the common set (so fast it cannot be measured!)
setCommon = setInKey.intersection(setKey)
print "Common",len(setCommon)
arcpy.AddMessage(str(len(setCommon))+ " common records")
if len(setCommon) > 0:
    # build an SQLquery "[inField] IN (1,2,3....)"
    delimitedInField = arcpy.AddFieldDelimiters(inLayer,inField)
    subQuery = delimitedInField + " in "+ str(tuple(setCommon)) # creates a perfect SQL style list
    # Update the layer with the new selection
    arcpy.AddMessage(selOption+" "+inLayer)
    result = arcpy.SelectLayerByAttribute_management(inLayer,selOption,subQuery)
    outLayer = result.getOutput(0)
    arcpy.SetParameter(5,outLayer)
else:
    arcpy.AddWarning("No common records, so no selection possible")
print "Total",datetime.datetime.now() - ts0


keyfile tool
------------

This tool is a replacement for the ARC/INFO workstation SELECT <cover> KEYFILE <cover> <keyitem>

It was originally proposed at 9.3 by Bruce Harold using FIDSet from the Describe tool, but this now fails and seems obsolete.
This is because the geoprocessing tools finally work on selected records in layers.

A better method is to use the new Python set data structure for comparisons. The set comparison is so fast
that it can hardly be measured with datetime.
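
As an artificial illustration (made-up keys, but sets of a similar size to the test data):

# Artificial illustration: intersect two sets of 2.4 million keys and time just the
# intersection step.
import datetime
setA = set(xrange(2400000))
setB = set(xrange(1200000, 3600000))
t0 = datetime.datetime.now()
setCommon = setA.intersection(setB)
print "common:", len(setCommon), "elapsed:", datetime.datetime.now() - t0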

You still have to create the sets with a SearchCursor, but that may be a lot faster at 10.1 with the new da module.
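
At 10.1 the set building itself might be sketched like this (untested here, using the da module's cursor in place of the classic SearchCursor from the script above):

# Sketch only: build the key set with the 10.1 arcpy.da cursor; inLayer and inField
# are the same parameters used in the script above.
import arcpy
setInKey = set(row[0] for row in arcpy.da.SearchCursor(inLayer, [inField]))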

If you have a very large number of records you must have the key field indexed to ensure that the selection draws
in ArcMap in a reasonable time because the SQL expression is a list, not an equation. This tool does not check for indexes.

There does not seem to be a limit on the number of records that can be compared. The largest test so far
selected 2,500 records from another file of 2.4 million records, which took 5 minutes to complete; selecting 630,000 records from a set of 2.4 million and running against another, earlier set of 2.4 million took 6.5 minutes.

**Warning** Field index names in ArcGIS are limited to 16 characters, so you have to be careful when simply concatenating
_idx to the field name.
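
For example, indexing the key field beforehand, with a shortened index name to stay under that limit (field name and path taken from the original post):

# Example only: add an attribute index on the key field before running the tool.
# "PolNum_idx" is kept short because "Policy_Number_idx" would be 17 characters.
import arcpy
fc = "Database Connections\\ArcSDE.sde\\ArcSDE.A10713.HOPolicies_NW"
arcpy.AddIndex_management(fc, "Policy_Number", "PolNum_idx", "NON_UNIQUE", "NON_ASCENDING")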

The nearest equivalent tool is MakeQueryTable, but it has a number of limitations that this keyfile tool does not.
With the keyfile tool the tables being compared can be in different databases, and even different types of database.
Since only a selection is changed, all layers are held in memory for speed,
so make sure that you have suitable local memory and scratch space.
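
For comparison, the MakeQueryTable route might look roughly like this; both tables would first have to be copied into the same geodatabase, and the staging paths here are assumptions:

# Rough sketch of the MakeQueryTable alternative. The staging geodatabase and its
# table names are made up for illustration; the join field comes from the original post.
import arcpy
tableList = ["D:/HOPOLICIES/Staging.gdb/tblHOPolicies",
             "D:/HOPOLICIES/Staging.gdb/HOPolicies_NW"]
whereClause = "tblHOPolicies.Policy_Number = HOPolicies_NW.Policy_Number"
arcpy.MakeQueryTable_management(tableList, "CommonPolicies", "USE_KEY_FIELDS",
                                "tblHOPolicies.OBJECTID", "", whereClause)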

Both keys have to be the same basic type for the SQL query to be valid. Either both numeric or both text.

Limitations
-----------
There is no validation or error trapping except if there are no common records.
It is recommended that the keys are indexed for large tables.
The toolbox is in 9.3 format, but the script is now written using 10.0 syntax with arcpy.

Kim Ollivier
kimo@ollivier.co.nz
www.ollivier.co.nz

25 April 2012 (ANZAC Day)