Hey all,
I have a project where I need to create a join between 2 tables in order to pull corresponding data into the table that does not have the needed data. The tables have Lat and long, but the lat and long data differs between the 2 tables so that option is not as ideal as I would like and the distance varies widely between examples.
My next thought was to use the text in the address fields to create the match. To shorten the story, I used .strip() to remove extra spaces on each data column that contained address data then used .upper() to normalize the data. I then concatenated the fields into one Address field. I repeated this process on both sheets which should give me a single field to create the join. This field is formatted as a text field. However, with all of this on a table of over 3.4 million records I am only getting ~10k matches. If I copy and paste an address from the table into a blank excel sheet and do the same for the corresponding entry on the other table, I can use the remove duplicate tool in excel and it confirms the entries are matches.
What might I be missing?
can you find an example for an entry that didn't match and post it?
The duplicate tool isn't useful here as you could have repeat address data in one table, so it's identifying potentially duplicates within table one and two, not validating all joined addresses should have a match..
It sounds like you don't have matching addresses. I'd also be suspect that addresses would match if your lat/longs don't.
Hey Chris,
While I agree there could be duplicate data points within each data set, In this case, I specifically copied out 1 record in each set of data that should match over to excel to see if the duplication tool would remove 1 of the records. It did in fact accomplish this indicating that they do in fact match, at least in excel. When I run the join in ESRI it fails to make the join on the record I checked. Do you know of a script I can run that would point out differences between 2 entries? That might give me a starting point for troubleshooting.
Are there double spaces within the address fields? I was just going through a similar exercise and had "123 Main St" and "123 Main St". See the difference? 😁
Anyway, text.replace(" ", " ")
(BTW: anytime I have a text key field that has spaces, I run .strip() and .replace(" ", " ") before anything else.