## This approach involves appending the shape_area field from ## the target dataset to a copy of the source dataset and then ## doing a comparison between the areas. The results get reported ## and the copy gets dropped. This was necessary due to a bug in ## 10.1 that prevents the deletion of the right most field print "Importing arcpy module...\n" import arcpy ## Compare Feature Area print "Compare Feature Area\n" ## Variables sourceFC = "C:\\GIS\\Source.gdb\\UA1991_NZTM" destFC = "C:\\GIS\\Dest.gdb\UA1991_NZTM" TempFC = "C:\\GIS\\Source.gdb\\TEMPFC" ## Create a temporary table to perform the join on. print "Take a copy\n" arcpy.Copy_management(sourceFC, TempFC, "") ## Join the temporary table to the target table print "Perform Join\n" arcpy.JoinField_management (TempFC, "UA1991", destFC, "UA1991","Shape_Area") ## Variables fields = ["UA1991","UA1991_NAME","Shape_Area","Shape_Area_1"] whereClause = "Shape_Area <> Shape_Area_1" ## Create a cursor object containing all records ## where the shape_area differs between datasets ## and report output to screen print "Find records that differ" notEqualCur = arcpy.da.SearchCursor(TempFC, fields, whereClause) for row in notEqualCur: print "\nUA1991_NAME = " + str(row[1]) print "UA1991 = " + str(row[0]) print "S Area = " + str(row[2]) print "D Area = " + str(row[3]) areaDiff = (row[2] - row[3]) areaDiffPerc = areaDiff/row[2] if abs(areaDiffPerc) > 0.0001: print "AREA DIFFERENCE: " + str(areaDiff) print "AREA DIFFERENCE PERCENTAGE: " + str(abs(areaDiffPerc)) ## Drop the temporary table arcpy.Delete_management(TempFC) ## delete the cursor del notEqualCur print "\nDone."
Solved! Go to Solution.
Hi Mathew
We investigated the Feature Compare function. I was under the impression that the two datasets required the same records and record count. In our case there is a chance that one or more rows may have been dropped. I understood this to mean that it was not possible to do a comparison on datasets with different row counts. That is why the search cursor approach was used. Spin around one cursor populating another cursor using a where clause = slow. I'd love to hear if there was a way to use Feature Compare on datasets with different rowcounts.
Thanks
Eli
I have a process that copies data between dev and UAT. There is a need for a QA script that can confirm that the data copies across cleanly. We are primarily interested in whether the "Shape_Area" is the same between the two gdbs (including some tolerance for rounding).
Dev is a file gdb and UAT is an enterprise gdb. The main point is that they are different gdbs.
The dev gdb and the UAT gdb have the same name. The tables to compare also have the same name as do the fields that you would join on.
We initially used two SearchCursors which worked but was slow.
I explored the MakeQueryTable_management function but it requires the tables to be in the same workspace.
Here is the current approach:## This approach involves appending the shape_area field from ## the target dataset to a copy of the source dataset and then ## doing a comparison between the areas. The results get reported ## and the copy gets dropped. This was necessary due to a bug in ## 10.1 that prevents the deletion of the right most field print "Importing arcpy module...\n" import arcpy ## Compare Feature Area print "Compare Feature Area\n" ## Variables sourceFC = "C:\\GIS\\Source.gdb\\UA1991_NZTM" destFC = "C:\\GIS\\Dest.gdb\UA1991_NZTM" TempFC = "C:\\GIS\\Source.gdb\\TEMPFC" ## Create a temporary table to perform the join on. print "Take a copy\n" arcpy.Copy_management(sourceFC, TempFC, "") ## Join the temporary table to the target table print "Perform Join\n" arcpy.JoinField_management (TempFC, "UA1991", destFC, "UA1991","Shape_Area") ## Variables fields = ["UA1991","UA1991_NAME","Shape_Area","Shape_Area_1"] whereClause = "Shape_Area <> Shape_Area_1" ## Create a cursor object containing all records ## where the shape_area differs between datasets ## and report output to screen print "Find records that differ" notEqualCur = arcpy.da.SearchCursor(TempFC, fields, whereClause) for row in notEqualCur: print "\nUA1991_NAME = " + str(row[1]) print "UA1991 = " + str(row[0]) print "S Area = " + str(row[2]) print "D Area = " + str(row[3]) areaDiff = (row[2] - row[3]) areaDiffPerc = areaDiff/row[2] if abs(areaDiffPerc) > 0.0001: print "AREA DIFFERENCE: " + str(areaDiff) print "AREA DIFFERENCE PERCENTAGE: " + str(abs(areaDiffPerc)) ## Drop the temporary table arcpy.Delete_management(TempFC) ## delete the cursor del notEqualCur print "\nDone."
I'm interested in a faster approach with less IO. Is it possible to do a join across gdbs without copying the fields to the source table?
I have a hard time believing two search cursors would take longer than copying out both datasets and joining them locally. In any case, have you tried the Feature Compare tool?
I have a hard time believing two search cursors would take longer than copying out both datasets and joining them locally. In any case, have you tried the Feature Compare tool?
Faster approach: Copy your FC's to the IN_MEMORY space then peform the join and querying there..
Don't access your feature class straight from disc . Make both of them as Feature Layer using MakeFeatureLayer_management and try running search cursor . I believe it should be much faster .
Hi Mathew
We investigated the Feature Compare function. I was under the impression that the two datasets required the same records and record count. In our case there is a chance that one or more rows may have been dropped. I understood this to mean that it was not possible to do a comparison on datasets with different row counts. That is why the search cursor approach was used. Spin around one cursor populating another cursor using a where clause = slow. I'd love to hear if there was a way to use Feature Compare on datasets with different rowcounts.
Thanks
Eli