Performance Comparison: Large Scale OD Matrix Summarizing versus Closest Facility

ColeAndrews · 12-20-2018

Hoping the network analyst experts can give an opinion on the best approach in terms of speed and performance.

The use case is a fairly large number of origins and destinations. As the end result, we need to know the closest facility in minutes/miles, if there is a facility within 60 minutes. If the closest facility is outside of 60 minutes, that's all we need to know (we do not need minutes/miles for these). We do not need any route geometry returned; the output is strictly tabular. I estimate there are 350k origin points and 10k destination points.

I recognize ODCM runs faster than CF. Given the volume of origins and destinations, would the fastest option be to run ODCM with a cutoff at 60 minutes, then run summary statistics on the table to find the minimum travel time value for each origin ID (since each origin ID will appear many times in the ODCM output table)? Or would CF not take much longer than ODCM, since no summary statistics are required?

What's a general ballpark for how long either scenario would take to run, given Pro installed on a machine and input data stored locally in a GDB?

ColeAndrews · 12-20-2018

Jay Sandhu, you appear to be the network analyst expert. Could you provide any feedback?

JaySandhu · 12-20-2018

OD and CF are similar, but OD runs faster because it does not need to keep track of the traversal results and return route geometry. When you run these solvers with a 60-minute cutoff, they are quite efficient: once the search has covered 60 minutes, they simply ignore any destinations that were not encountered. The returned list of destinations (in the lines feature class) has a destination rank field and is sorted on it. This tells you the closest, second closest, etc., destination. You should be able to select all the rows where destination rank is 1 and get the closest destination for each origin without having to run any summary statistics. And if you only need the single closest, why not solve for 1 destination in the OD solve? I do not understand why you are comparing against CF.
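To make the rank-based selection concrete, here is a minimal sketch in plain Python (not arcpy). The rows and the field names `OriginID`, `DestinationID`, `DestinationRank`, and `Total_Minutes` are assumptions for illustration; check them against your actual OD lines table. It shows that filtering on rank 1 gives the same answer as computing the minimum travel time per origin.

```python
# Hypothetical OD lines rows; field names and values are illustrative assumptions.
od_rows = [
    {"OriginID": 1, "DestinationID": 10, "DestinationRank": 2, "Total_Minutes": 22.5},
    {"OriginID": 1, "DestinationID": 11, "DestinationRank": 1, "Total_Minutes": 8.0},
    {"OriginID": 2, "DestinationID": 10, "DestinationRank": 1, "Total_Minutes": 41.2},
    {"OriginID": 2, "DestinationID": 12, "DestinationRank": 2, "Total_Minutes": 55.0},
]


def closest_by_rank(rows):
    """Keep the rank-1 row for each origin (a simple filter, no aggregation)."""
    return {r["OriginID"]: r for r in rows if r["DestinationRank"] == 1}


def closest_by_min(rows):
    """Equivalent of summary statistics: minimum travel time per origin."""
    best = {}
    for r in rows:
        current = best.get(r["OriginID"])
        if current is None or r["Total_Minutes"] < current["Total_Minutes"]:
            best[r["OriginID"]] = r
    return best


by_rank = closest_by_rank(od_rows)
by_min = closest_by_min(od_rows)
```

Both dictionaries map each origin to the same row here, so the rank filter can stand in for the per-origin minimum.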

As for performance, it depends on how far apart the points are and how many fall within the 60-minute cutoff, since it takes a while to write the results out to the lines feature class. For example, if there are 100 destinations within 60 minutes of each of the 350K origins, it will create 35 million rows.
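The row-count arithmetic can be sketched as a small helper. The function name and its parameters are hypothetical, and the average of 100 destinations per origin is only the example figure from the paragraph above, not a measurement.

```python
def estimate_od_rows(num_origins, avg_destinations_within_cutoff, destinations_to_find=None):
    """Rough upper bound on the number of OD lines rows written out.

    destinations_to_find=None means "return every destination within the cutoff".
    """
    per_origin = avg_destinations_within_cutoff
    if destinations_to_find is not None:
        # The solver returns at most this many destinations per origin.
        per_origin = min(per_origin, destinations_to_find)
    return num_origins * per_origin


# All destinations within the cutoff: 350,000 origins x 100 destinations each.
all_within_cutoff = estimate_od_rows(350_000, 100)

# Closest destination only: at most one row per origin.
closest_only = estimate_od_rows(350_000, 100, destinations_to_find=1)
```

Under these assumptions, limiting the solve to 1 destination cuts the output from 35 million rows to at most 350,000.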

Jay Sandhu

ColeAndrews · 12-21-2018

Jay Sandhu, thanks for the reply. It sounds like the fastest method to find the single nearest facility is an OD matrix with a cutoff of 60 minutes AND destinations to find = 1? The documentation does not explicitly state that the 'number of destinations' setting is directly tied to facility proximity, so I was not sure that limiting it to 1 would guarantee that the closest facility is found. That is why I thought I needed to run for all destinations and then find the minimum-cost record for each origin.

If an origin does not end up having 1 facility within the cutoff, is the origin still included in the OD output with a null destination, or does the output exclude any origin without a destination found?

ColeAndrews · 12-21-2018

Also, I do not need any sort of geometry output, so I presume setting the output shape to "none" will improve the speed even more so.

JaySandhu · 12-21-2018

OD with 1 destination to find will return the closest. Specifying a cutoff limits how far it will search out from the origin before giving up. It is always a good idea to have a cutoff in case you have a large network with unreachable portions (due to connectivity issues or barriers), which can cause a complete search of the network trying to reach destinations that are not reachable!

The output lines feature class only has what was found. So if some origins do not have a destination within the cutoff, they will not be included.
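Because origins with nothing inside the cutoff are simply absent from the output, they can be recovered by comparing the full set of origin IDs against the IDs that appear in the results. A minimal plain-Python sketch, with made-up IDs and a hypothetical `(origin_id, destination_id, minutes)` row layout:

```python
# Every origin that was loaded into the analysis (illustrative IDs).
all_origin_ids = {1, 2, 3, 4}

# Hypothetical solve output: (origin_id, destination_id, minutes), closest only.
od_rows = [
    (1, 11, 8.0),
    (2, 10, 41.2),
    (4, 12, 59.9),
]

# Origins that appear in the output have a facility within the cutoff.
found_origin_ids = {origin_id for origin_id, _, _ in od_rows}

# Origins missing from the output have no facility within the cutoff.
beyond_cutoff = sorted(all_origin_ids - found_origin_ids)

# One record per origin; None marks "no facility within 60 minutes".
summary = {origin_id: None for origin_id in all_origin_ids}
for origin_id, destination_id, minutes in od_rows:
    summary[origin_id] = (destination_id, minutes)
```

In a geodatabase workflow the same idea would amount to joining the output back to the origins and treating unmatched origins as "beyond 60 minutes".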

Setting the output lines to none will be slightly faster but may not be measurable for small output.

Jay Sandhu

ColeAndrews · 03-01-2019

Jay Sandhu, another question related to this (and to ArcGIS routing problems in general). If your input origin points are rooftop geocoded, the solver begins the routing at the street network segment and not the rooftop itself. Makes sense. So does it find the nearest starting vertex of a street line segment, or can it start midway along the line (which may be closer to the rooftop), not at an existing start vertex?

If it starts only at the first vertex of a street, then it would make sense that it includes the whole segment as a line to traverse. But if the solver can start the route midway along a line segment, at the point closest to the rooftop, then how does it allocate the travel time along the partial segment? It seems it would either include the full street segment travel time regardless of where on the line it started, or programmatically allocate portions of street segments as traversed?

Same concept would apply to destination locations relative to proximity to street network.

JaySandhu · 03-01-2019

When a location is snapped along an edge (not at its end points), only a part of the edge becomes part of the solved route (depending on which way the route goes). When the shortest path time and/or distance is computed, only that traversed part of the edge counts. The travel time or distance is pro-rated.

If you are solving on a local network dataset (not online), you can run the Copy Traversed Source Features tool to write out the individual segments that make up the whole route. You will see that the first and last edges do not use the whole edge feature; the actual percentage traversed is written out.

The locations can snap anywhere (percentage) along the edge not just at the closest vertex.
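As a rough illustration of pro-rating, the sketch below assumes the cost is spread evenly along an edge, so the cost of a partial edge is the full edge cost times the fraction traversed. That linear assumption, the function name, and the numbers are mine, not a description of the solver's internals.

```python
def prorated_cost(edge_cost, from_pos, to_pos):
    """Cost of traversing part of an edge.

    from_pos and to_pos are fractions (0.0 to 1.0) along the edge.
    Assumes the edge cost is distributed evenly along its length.
    """
    for pos in (from_pos, to_pos):
        if not 0.0 <= pos <= 1.0:
            raise ValueError("positions must be between 0.0 and 1.0")
    return edge_cost * abs(to_pos - from_pos)


# Origin snaps 25% of the way along a 2.0-minute edge and heads toward its end.
first_leg = prorated_cost(2.0, 0.25, 1.0)

# Two whole edges in the middle of the route.
middle_legs = [3.0, 1.5]

# Destination snaps halfway along a 4.0-minute edge, entered from its start.
last_leg = prorated_cost(4.0, 0.0, 0.5)

total_minutes = first_leg + sum(middle_legs) + last_leg
```

Here the first edge contributes 1.5 of its 2.0 minutes and the last edge 2.0 of its 4.0 minutes, rather than either edge counting in full.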

Jay Sandhu