Hi all! I am a graduate student in Criminology with a few questions about geocoding that I was hoping experts in this field could help me address. We are working on a large, multi-city project that involves geocoding crime data from over 100 cities across the U.S., so I have a lot of incident-level data to geocode and then aggregate to counts by census tract. Some of the data is not of the best quality, and as a result a fair number of addresses come back with tied candidates.
My question is: within the geography/geocoding literature and/or in practice, what is generally done with cases where the candidates tie when working with large datasets (2+ million records)? The share of ties is not tremendous (3-5%), but with datasets this large it would be impractical to go through and select one of the tied addresses by hand for each tied case. I am wondering whether there is any literature in the field, or a general rule that is followed, for deciding whether these cases should be kept as matches or set aside as unmatched.
It seems to me that because our ultimate goal is census tract counts, it may be appropriate to use the ties, since it is unlikely that choosing one tied candidate over another would lead to the point being assigned to a different tract. However, I am not sure that this is the best way to approach the issue, which is why I have come to you all for help.
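To make the idea concrete, here is a rough Python sketch of the rule I have in mind, not something we have settled on. It assumes each tied candidate has already been assigned a census tract ID (for example through a point-in-polygon overlay), and it keeps a tied incident only when every candidate falls in the same tract. The function names, incident IDs, and tract IDs are made up for illustration.

```python
from collections import Counter

def resolve_tie(candidate_tracts):
    """Return the tract ID if all candidates share one tract, else None.

    candidate_tracts: list of tract IDs, one per geocoding candidate.
    An empty list (no candidates at all) also returns None.
    """
    tracts = set(candidate_tracts)
    if len(tracts) == 1:
        return next(iter(tracts))
    return None

def tract_counts(incidents):
    """Count incidents per tract; collect the ones that cannot be resolved.

    incidents: dict mapping incident ID -> list of candidate tract IDs.
    """
    counts = Counter()
    unresolved = []
    for incident_id, candidate_tracts in incidents.items():
        tract = resolve_tie(candidate_tracts)
        if tract is None:
            unresolved.append(incident_id)  # set aside for review or exclusion
        else:
            counts[tract] += 1
    return counts, unresolved

# Hypothetical example data: tract IDs are illustrative, not real results.
incidents = {
    "A1": ["12073000200", "12073000200"],  # tie, both candidates in one tract
    "A2": ["12073000200", "12073000301"],  # tie, candidates in different tracts
    "A3": ["12073000301"],                 # single, untied match
    "A4": [],                              # no candidates returned
}

counts, unresolved = tract_counts(incidents)
print(dict(counts))
print(unresolved)
```

Under this rule only the ties that straddle a tract boundary would be left over, which I would hope is a much smaller share than the 3-5% that tie overall.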
I would appreciate any insight on this issue, including sources I should be familiar with. Thank you for your time.
Sincerely,
Kevin Wolff