Removing only those duplicate attributes that have a duplicate subset of attributes (Address files)

820
4
Jump to solution
09-09-2019 03:31 PM
PrestonEllison
New Contributor III

I am merging two address features together and associating them with a parcel number. These address layers overlap geographically in some areas and i only need one record per unique address. I need to remove the duplicates, however there are valid duplicate parcel numbers with multiple addresses (ex apartments or condominiums). So i need to retain duplicate parcel records only if they contain unique address fields. In the picture ive provided an example of having 3 parcel records (KeyNum) but only 2 unique street numbers (StreetNum).

Ive tried using a select by attributes to isolate the duplicate KeyNum and then creating a selection from that selection of duplicate StreetNum but as you can expect, any duplicate StreetNum is selected regardless if they have an associated duplicate KeyNum. I need to add a qualifier in the SQL code. Here is what i have so far...

[KeyNum] In (SELECT [KeyNum] FROM [Export_Output] GROUP BY [KeyNum] HAVING Count(*)>1 )

[StreetNum] In (SELECT [StreetNum] FROM [Export_Output] GROUP BY [StreetNum] HAVING Count(*)>1 )

0 Kudos
1 Solution

Accepted Solutions
LanceCole
MVP Regular Contributor

Preston Ellison‌,

Did you consider the Delete Identical Tool.  This tool allows you to select a list of attributes and will delete all but one base upon those attributes.  CAUTION!  This tool modifies your input table or feature class, it does not create a new copy.  Please make a copy before running the tool.  You can also use Find Identical tool to generate a report of the matches considered identical.  This tool does not modify your data.  We use this tool all the time when we have to combine address lists from multiple sources. 

If you want to do this exclusively using SQL, add a SELECT DISTINCT statement to you query once you have your join assembled.

View solution in original post

4 Replies
LanceCole
MVP Regular Contributor

Preston Ellison‌,

Did you consider the Delete Identical Tool.  This tool allows you to select a list of attributes and will delete all but one base upon those attributes.  CAUTION!  This tool modifies your input table or feature class, it does not create a new copy.  Please make a copy before running the tool.  You can also use Find Identical tool to generate a report of the matches considered identical.  This tool does not modify your data.  We use this tool all the time when we have to combine address lists from multiple sources. 

If you want to do this exclusively using SQL, add a SELECT DISTINCT statement to you query once you have your join assembled.

PrestonEllison
New Contributor III

I did not realize the Delete Identical tool would parse out a selection of a selection of sorts. It worked perfectly. It did take about 6 hours to complete on (250k records) but it worked! 

Thank You

0 Kudos
LanceCole
MVP Regular Contributor

Preston,

Six hours???  Just out of curiosity, how may attributes did you run on the data with the Delete Identical?  What format is your data stored?

I just ran the tool on 200k records using the LSN string in an address file (House number, Unit Number, Street Name) such as "123 C MAIN STREET") in about 68 seconds.  I made another copy of the data and ran it using  LSN, USPS City and State in about 193 seconds.  The second run found fewer duplicates as there were duplicate streets in multiple towns that were not removed as this time they were not identical.

0 Kudos
PrestonEllison
New Contributor III

I ran the tool on 2 columns, parcel and street number. 50k+ records were removed out of the 250k. I ran it again and it was complete in under a minute. Not sure what happened yesterday...

0 Kudos