Get records that exist in one table but not another

4191
9
06-11-2014 07:32 AM
by Anonymous User
Not applicable
Original User: mleber

Hello, I am relatively new to using the model builder and I have a problem maybe someone could help me with. I have a table in a Geodatabase, and I have a shapefile. The geodatabase table for sake of argument has 1000 records, and the shapefile has 900 records. I would like a model that would pull out or extract those 100 records that exist in the geodatabase table but not in the shapefile. Is this easy to do, and does anyone have suggestions on how to go about this?

thanks
mike
0 Kudos
9 Replies
markdenil
Occasional Contributor III
If you have attributes that match for matching records in each file
you could join the smaller table to the larger using Add Join with KEEP_ALL
The records that do not have any attributes from the shapefile joined are the ones
unique to the featureclass

If you lack common fields for a table join, then
Select Layer By Location in the management toolbox will do something similar
again, it identifies the set common to both; the leftovers are the unique ones.
0 Kudos
by Anonymous User
Not applicable
Original User: rfairhur24

Hello, I am relatively new to using the model builder and I have a problem maybe someone could help me with. I have a table in a Geodatabase, and I have a shapefile. The geodatabase table for sake of argument has 1000 records, and the shapefile has 900 records. I would like a model that would pull out or extract those 100 records that exist in the geodatabase table but not in the shapefile. Is this easy to do, and does anyone have suggestions on how to go about this?

thanks
mike


If they have a common unique ID attribute that matches in both databases, then a standard join and a selection where the FID of the shapefile IS NULL will get you the answer of which IDs do not exist in the shapefile that do exist in the geodatabase.  However, for the set that did match on the common ID, this won't tell you if they have changed geometry or other attributes, so it is not a complete comparison.

If the only relationship is spatial it is nearly impossible if they are complex polygon shapes that have to have identical geometry.  Line geometry is also difficult to process, but with an advanced license there are some tools that make it much easier.  Points can use Spatial Join, provided there is no overlapping points in a the shapefile.

So what constitutes a match for your data?
0 Kudos
RobertBorchert
Frequent Contributor III
Or pull them both into Access and do a unmatched Query.  However, this will only result in a tabular answer.

The first suggestion is the one I would run with as the quickest answer. Model builder is not the solution here.  Just to cumbersom for this small of a task.

Do the join and keep all.  Before hand you can create a new attribute.  once it is joined you can perform a query to find those that are not null and field calculate to put in a designator saying it exists in the shape.

If they have a common unique ID attribute that matches in both databases, then a standard join and a selection where the FID of the shapefile IS NULL will get you the answer of which IDs do not exist in the shapefile that do exist in the geodatabase.  However, for the set that did match on the common ID, this won't tell you if they have changed geometry or other attributes, so it is not a complete comparison.

If the only relationship is spatial it is nearly impossible if they are complex polygon shapes that have to have identical geometry.  Line geometry is also difficult to process, but with an advanced license there are some tools that make it much easier.  Points can use Spatial Join, provided there is no overlapping points in a the shapefile.

So what constitutes a match for your data?
0 Kudos
by Anonymous User
Not applicable
Original User: solxenos

Or use SQL Server Express and create a table view...
0 Kudos
by Anonymous User
Not applicable
Original User: mleber

Thanks for the replies. I do have a match in both tables, the Parcel ID. I know how to do this manually, but I am trying to automate this process so that is why I was thinking of doing this with model builder.
0 Kudos
RichardFairhurst
MVP Honored Contributor
Thanks for the replies. I do have a match in both tables, the Parcel ID. I know how to do this manually, but I am trying to automate this process so that is why I was thinking of doing this with model builder.


The keys to joining tables in Model Builder is to use the Make Feature Layer or Make Table View tools on the primary target of the join before using the Add Join tool.  It is also very important to use the Add Attribute Index tool on the Parcel ID in both tables before the first time you create the Join.  Finally, when you are done with the Join use the Remove Join tool.

I do not use the Join Field tool, since it takes too long to add the fields and permanently alters the input table.  Add Join and Remove Join operate in memory and are much faster.
by Anonymous User
Not applicable
Original User: solxenos

I think this task is more easily achieved using some custom code or relational database views, that is how have done this in the past.  In custom code I can load all the data into arrays and compare between them.  We find those that are in table1 but missing from table2 and those in table2 that are missing from table 1.  So there are two outputs, parcel ids missing from table 1 and those missing from table 2.  Before using custom code this was done in SDE/SQL by creating 2 table views (join) between the two tables/layers and creating the querying to look for null values on the joined table side.

If you must use Model Builder, I'm not sure what the best option is there...  You may want to take a look at Make Query Table tool, but the inputs cannot be a shapefile.  And the output is temporary unless the document is saved or copied to another layer.  This could give you the parcel IDs that match if you build the expression like this 'Table1.PID = Table2.PID'  - But in reality you want to find those missing from each other, right?  It would give you a table of similar parcels that you could use to compare each of the other layers to, but I'm not sure it helps that much. There are many new tools in ArcGIS 10.x and I'm a bit surprised if there isn't a tool for this sort of analysis.

You don't have SDE by any chance?  A relational database or custom app could get the results better than model builder, unless there is a tool for this that I am unaware of.

- I think Richard has you pointed in the best direction for using model builder with his previous post.  Let us know how it turns out.
0 Kudos
by Anonymous User
Not applicable
Original User: mleber

I think this task is more easily achieved using some custom code or relational database views, that is how have done this in the past.  In custom code I can load all the data into arrays and compare between them.  We find those that are in table1 but missing from table2 and those in table2 that are missing from table 1.  So there are two outputs, parcel ids missing from table 1 and those missing from table 2.  Before using custom code this was done in SDE/SQL by creating 2 table views (join) between the two tables/layers and creating the querying to look for null values on the joined table side.

If you must use Model Builder, I'm not sure what the best option is there...  You may want to take a look at Make Query Table tool, but the inputs cannot be a shapefile.  And the output is temporary unless the document is saved or copied to another layer.  This could give you the parcel IDs that match if you build the expression like this 'Table1.PID = Table2.PID'  - But in reality you want to find those missing from each other, right?  It would give you a table of similar parcels that you could use to compare each of the other layers to, but I'm not sure it helps that much. There are many new tools in ArcGIS 10.x and I'm a bit surprised if there isn't a tool for this sort of analysis.

You don't have SDE by any chance?  A relational database or custom app could get the results better than model builder, unless there is a tool for this that I am unaware of.

- I think Richard has you pointed in the best direction for using model builder with his previous post.  Let us know how it turns out.


I don't have to use Model Builder, I just thought that would be a relatively simple task to do there. Your scenario sounds very similar to would I would like to accomplish, do you have any of the sample code that you could share?
0 Kudos
by Anonymous User
Not applicable
Original User: solxenos

Actually it looks like I load all the parcel IDs into one array in the end.  When populating the combined array I add the pipe symbol and a character that designates which table it comes from.  Then I go through the array of parcel IDs and load the first and second item in the array into variables.  If they differ then depending on the designator character and how that sorts I know which table is missing a value.  

I guess it would matter what form you Parcel IDs are in.  For strictly numeric you may need to pad the parcel numbers with leading zeros to sort them properly as a string and compare them.

Here is a scaled down snippet to show the logic I'm trying to explain when comparing the data.  It's already loaded into one combined array of parcel IDS with the pipe symbol and table designator on the end.  You will need to add a dummy record to the end of the array - I use "zzzzzzzzzzzzzz" so it will sort to the bottom.

 Array.Sort(CombineData)
Do Until i >= UBound(CombineData)
     Var1 = Mid(CombineData(i), 1, (InStr(1, CombineData(i), "|") - 1)) ' gets parcel ID
     Var2 = Mid(CombineData(i + 1), 1, (InStr(1, CombineData(i + 1), "|") - 1)) ' gets next parcel ID
      
     If Var1 = Var2 Then
                    i = i + 1
     Else
          If Mid(CombineData(i), (InStr(1, CombineData(i), "|") + 1)) = "A" Then ' checks table designator
               ListBox2.Items.Add(Mid(CombineData(i), 1, (InStr(1, CombineData(i), "|") - 1))) 
          ElseIf Mid(CombineData(i), (InStr(1, CombineData(i), "|") + 1)) = "Z" Then ' checks table designator
               ListBox1.Items.Add(Mid(CombineData(i), 1, (InStr(1, CombineData(i), "|") - 1))) 
          End If
     End If
     
i = i + 1

Loop
0 Kudos