Including the state in input address fields significantly slows down performance when batch geocoding with the Geocode Addresses tool

StephanieBosits · 09-23-2020

I have published a geocode service from a composite locator created with the Create Locator tool in ArcGIS Pro 2.6.1. The locator covers the entire state of New Jersey, so it includes nearly 4 million address points along with road centerlines. As a result, maximizing performance has been a big issue. As part of performance testing, I batch geocode a table of addresses in ArcGIS Pro. This table takes 4 minutes and 30 seconds to complete. However, I noticed that when I leave the state out of my inputs, the same table takes 56 seconds. The state values are all NJ in the data used to create the locator, and they are all NJ in the table I am batch geocoding. I am curious why there is such a big difference in performance and how I can ensure that our users get the faster performance. I could leave the state values out of my input data or tell users not to enter a state, but neither option is ideal.

ShanaBritt · 09-28-2020

Stephanie:

Just to clarify, you used the Create Locator tool to create two locators and added them to a composite locator and not the Create Address Locator tool to build the locators in the composite?

Do either of these locators have any alternate name tables linked to them?

For better performance, I would suggest creating a multirole locator with the Create Locator tool instead of adding multiple single-role locators to a composite locator. I would also apply the tips described in the Tips for improving geocoding performance topic of the ArcGIS Pro documentation.

Can you provide any details about the field mapping and any geocoding options set in the participating locators in the composite? Are there any IDs, such as a street ID, that could be used to link the points to the street centerlines?

-Shana

JoeBorgione · 09-28-2020

Shana Britt - is it possible to create a composite locator out of two or more new-style locators?

That should just about do it....

ShanaBritt · 09-29-2020

Joe, it is possible, but not recommended if you are able to combine all of your data for a single role. A multirole locator performs better and minimizes duplicate results.

JoeBorgione · 09-29-2020

Okay. I'm not doing quite as much locator creation these days, but when I do, I use only the new style.


StephanieBosits · 09-28-2020

Shana,

Correct, I am creating two separate locators and adding them to a composite. Unfortunately, combining them into a multirole locator does not seem to work: even though I add the address points above the roads, that hierarchy does not persist in the results. For example, there could be a match in my address points with a score of 91 and a match from my roads with a score of 96. Ideally, I want any address point match scoring over 85 to be returned, even though its score is lower than the road match. Perhaps I am missing something, but I only see a way to set score thresholds for all roles collectively; I don't see a way to set them on individual roles, which is what I would need to prevent this from happening.
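The behavior Stephanie describes can be stated as a simple rule over the returned candidates: use the best PointAddress candidate if it scores at least 85, otherwise fall back to the highest-scoring candidate of any role. The sketch below is hypothetical post-processing, not a locator setting; `Candidate` and `pick_match` are illustrative names, and the 85 threshold comes from the post above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    # Hypothetical candidate record: matched address, the role that produced it, and its score.
    address: str
    role: str  # e.g. "PointAddress" or "StreetAddress"
    score: float


def pick_match(candidates: list[Candidate],
               preferred_role: str = "PointAddress",
               preferred_min_score: float = 85.0) -> Optional[Candidate]:
    """Return the best candidate from preferred_role if it clears its own threshold;
    otherwise return the highest-scoring candidate of any role (None if empty)."""
    if not candidates:
        return None
    preferred = [c for c in candidates
                 if c.role == preferred_role and c.score >= preferred_min_score]
    pool = preferred if preferred else candidates
    return max(pool, key=lambda c: c.score)


cands = [
    Candidate("12 MAIN ST", "StreetAddress", 96),
    Candidate("12 MAIN ST", "PointAddress", 91),
]
print(pick_match(cands).role)  # PointAddress
```

A rule like this only reorders candidates the locator has already returned; it does not change how the locator scores them.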

Both of my locators have an alternate city name table and an alternate street name table. I have tried tweaking the settings described in the performance documentation, but they don't seem to speed up batch geocoding. There is an ID that links the address points to the street centerlines; how can I use that to my advantage?

Thank you so much for your help!

ShanaBritt · 09-29-2020

Stephanie:

What is the reasoning behind preferring a lower-scored PointAddress? It is interesting that the PointAddress match scores lower than the StreetAddress match. Given the additional information you provided about the alternate city and street name tables, I believe the way the alternate city table is linked in the two locators may be the cause of the poor performance. Is the alternate city table formatted like the following, or are there duplicate city names in the alternate name table? Duplicate records in the alternate city table create unneeded records in the locator, which slows it down because it has to search through the extra index entries to find the best match. This is multiplied across both locators, so the individual locators and the composite all perform slowly.
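One way to check for the duplicates Shana mentions is to count repeated (join ID, alternate city name) pairs in an exported copy of the table. This sketch assumes the rows have been read into Python tuples and treats names that differ only by case or surrounding spaces as duplicates; the function names, schema, and sample values are made up for illustration.

```python
from collections import Counter


def find_duplicate_alt_names(rows):
    """Return normalized (join_id, alt_city) pairs that occur more than once, with counts.

    rows: iterable of (join_id, alt_city) tuples (hypothetical export of the table).
    Names are compared after trimming whitespace and upper-casing."""
    counts = Counter((join_id, alt.strip().upper()) for join_id, alt in rows)
    return {pair: n for pair, n in counts.items() if n > 1}


def deduplicate(rows):
    """Keep the first occurrence of each normalized (join_id, alt_city) pair, preserving order."""
    seen = set()
    unique = []
    for join_id, alt in rows:
        key = (join_id, alt.strip().upper())
        if key not in seen:
            seen.add(key)
            unique.append((join_id, alt))
    return unique


rows = [(1, "Alt City A"), (1, "ALT CITY A "), (2, "Alt City B"), (1, "Alt City C")]
print(find_duplicate_alt_names(rows))  # {(1, 'ALT CITY A'): 2}
print(len(deduplicate(rows)))          # 3
```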

StephanieBosits · 10-01-2020

Shana,

I would love to get the multirole locator to work; hopefully you can point out something I am doing wrong. Here is an example where I would like the point result (result B) returned rather than result A. On the right is a pop-up from the input address point data, and the input road segment is selected, so you can see that the only element missing from the address point is the ZIP code, which I believe is why the point gets a lower score. You can also see that the street join IDs of the address point and the road are the same. I was under the impression that if the street join IDs linked the points and the roads, you could use that relationship to fill in missing attributes from one to the other, but I cannot get this to work. Possibly this is because the input fields differ between the PointAddress and StreetAddress roles, so the input field names never match (City vs. Left City/Right City, or ZIP vs. Left ZIP/Right ZIP). If I cannot fill in the missing attributes on the points, there will be many instances where an interpolated position along a road receives a higher score than the more accurate point address.
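Stephanie's idea of using the street join ID to fill in a missing ZIP code can be tried as a data-preparation step on the address points before the locator is built. This is a hypothetical sketch: the field names (`street_join_id`, `zip`, `left_zip`, `right_zip`) and the rule of filling only when every linked segment agrees on a single ZIP are assumptions, not ArcGIS behavior. Streets whose segments span more than one ZIP are left unfilled, since the road alone cannot say which ZIP a given point belongs to.

```python
def fill_point_zips(points, roads):
    """Fill a missing 'zip' on each point from road segments sharing its street_join_id.

    Hypothetical schema: points are dicts with 'street_join_id' and 'zip';
    roads are dicts with 'street_join_id', 'left_zip' and 'right_zip'.
    A ZIP is filled only when all linked segments, on both sides, agree on one value.
    Returns the number of points that were filled."""
    zips_by_street = {}
    for road in roads:
        zset = zips_by_street.setdefault(road["street_join_id"], set())
        for side in ("left_zip", "right_zip"):
            if road.get(side):
                zset.add(road[side])

    filled = 0
    for point in points:
        if point.get("zip"):
            continue  # already has a ZIP; leave it alone
        candidates = zips_by_street.get(point["street_join_id"], set())
        if len(candidates) == 1:  # unambiguous: exactly one ZIP on this street
            point["zip"] = next(iter(candidates))
            filled += 1
    return filled


roads = [
    {"street_join_id": 10, "left_zip": "ZIP_A", "right_zip": "ZIP_A"},
    {"street_join_id": 20, "left_zip": "ZIP_A", "right_zip": "ZIP_B"},
]
points = [
    {"street_join_id": 10, "zip": None},
    {"street_join_id": 20, "zip": None},  # street spans two ZIPs, so it stays empty
    {"street_join_id": 30, "zip": ""},    # no linked road
]
print(fill_point_zips(points, roads))  # 1
```

Points left empty by this step are the ones the postal-polygon overlay suggested later in the thread would still need to cover.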

Regarding the alternate city name tables, there are no duplicates, but each point has its own record for the same alternate name, so the table adds up. Tech support told me this was correct, but if I can use a one-to-many relationship in some way to cut this down, that would be great.

Thank you!

BradNiemand · 10-02-2020

Stephanie,

As for the duplicate results and different scores, yes, it is because there is no postal code in the PointAddress record, so it is returned with a lower score. The best way to fix this is to give both datasets the same fields. One option is to enhance the PointAddress dataset to include the postal code (this can be done by overlaying postal polygons on the points and applying each polygon's postal code to the points that fall within it). Another option is to not map the postal code for the StreetAddress locator when building it. This is not ideal, but it would give better ordering of the results because the scores would be the same. I would stick with the first option if possible.
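Brad's first option, overlaying postal polygons on the address points, would normally be done with a geoprocessing tool such as Spatial Join in ArcGIS Pro. The sketch below only shows the underlying idea in plain Python: an even-odd (ray casting) point-in-polygon test, assuming simple polygons without holes in a projected coordinate system, with hypothetical dict-based records and placeholder ZIP values. Only points that lack a ZIP are filled.

```python
def point_in_polygon(x, y, ring):
    """Even-odd (ray casting) test: True if (x, y) lies inside the polygon `ring`,
    a list of (x, y) vertices. Points exactly on an edge may be classified either way."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does a horizontal ray cast to the right from (x, y) cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def assign_postal_codes(points, postal_polygons):
    """Fill a missing point['zip'] from the first postal polygon that contains the point.

    points: dicts with 'x', 'y' and 'zip' (hypothetical schema);
    postal_polygons: list of (zip_code, ring) pairs."""
    for p in points:
        if p.get("zip"):
            continue  # keep ZIP codes that are already populated
        for zip_code, ring in postal_polygons:
            if point_in_polygon(p["x"], p["y"], ring):
                p["zip"] = zip_code
                break
    return points


postal_polygons = [
    ("ZIP_A", [(0, 0), (10, 0), (10, 10), (0, 10)]),
    ("ZIP_B", [(10, 0), (20, 0), (20, 10), (10, 10)]),
]
pts = [{"x": 5, "y": 5, "zip": None}, {"x": 15, "y": 5, "zip": None}]
print([p["zip"] for p in assign_postal_codes(pts, postal_polygons)])  # ['ZIP_A', 'ZIP_B']
```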

As for joining alternate city names, it is better to assign one ID to all cities that share the same alternate city name. The alternate city name table then has a single record for that alternate city name, linked to all of the records with that ID. This is more efficient and makes the locator smaller and faster.
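Brad's suggestion can be sketched in plain Python, assuming the per-point alternate city table has been read into (point_id, alt_city) tuples. The function name and the grouping rule (points that share exactly the same set of alternate names get one group ID) are hypothetical; it is one way to illustrate the one-to-many idea. A point with several alternate city names is still covered: its group simply has several rows, stored once per group instead of once per point. The point records would then carry the group ID, and the alternate city name table would hold the grouped rows linked on that ID.

```python
def normalize_alt_cities(per_point_rows):
    """Convert a per-point alternate city table into a grouped, one-to-many table.

    per_point_rows: (point_id, alt_city) tuples, one row per point per alternate name.
    Returns (point_to_group, group_rows):
      point_to_group maps each point_id to a group ID; points with the same set of
        alternate names share a group;
      group_rows holds one (group_id, alt_city) row per alternate name per group."""
    alts_by_point = {}
    for point_id, alt in per_point_rows:
        alts_by_point.setdefault(point_id, set()).add(alt)

    group_ids = {}  # frozenset of alternate names -> group ID
    point_to_group = {}
    for point_id, alts in alts_by_point.items():
        key = frozenset(alts)
        if key not in group_ids:
            group_ids[key] = len(group_ids) + 1
        point_to_group[point_id] = group_ids[key]

    group_rows = [(gid, alt) for key, gid in group_ids.items() for alt in sorted(key)]
    return point_to_group, group_rows


# Points 1-3 each have two alternate names; point 4 has one (made-up values).
rows = [
    (1, "Alt City A"), (1, "Alt City B"),
    (2, "Alt City A"), (2, "Alt City B"),
    (3, "Alt City B"), (3, "Alt City A"),
    (4, "Alt City C"),
]
point_to_group, group_rows = normalize_alt_cities(rows)
print(len(rows), "->", len(group_rows))  # 7 -> 3
```

The saving grows with the number of points per group: with millions of address points sharing a handful of alternate-name sets, the alternate table shrinks from one row per point per name to one row per group per name.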

Brad

StephanieBosits · 10-05-2020

Hi Brad, thank you for your response.

Yes, it sounds like enhancing our address points is the way to go. However, I am having a hard time envisioning how I could slim down the alternate city name table as you describe in cases where an address point has multiple alternate city names. Would you be able to provide an example or diagram of this?

Thank you