I'm working on producing a large OD Matrix (55,000 x 55,000) using ArcGIS 10. I have written a script in Python so that I don't run into memory issues. The script that makes the OD layer, adds all 55,000 origins, then enters a loop, adds 100 destinations, solves, and outputs the data. The only problem is that each iteration of the loop takes a long time to complete (mostly to solve).
1) defined a hierarchy 2) hard coded my cost attributes into the network as the smallest useful data types (floats) 3) dissolved the network 4) calculated my origin and destination locations
Is there anything I can do to speed up the process?
Also, I've noticed that solve times increase with each iteration (at an increasing rate). I believe that this may be because ArcGIS is writing to an ever increasing temp file. Is there any way I can prevent this from happening? I clear the previous destinations in each iteration of the loop, so I don't think that's part of the problem.
Here is one possible suggestion. Instead of iteratively solving multiple 55,000 x 100 instances, try iteratively solving it as 100 x 55,000 instances. The reason for this is that, for hierarchical searches, the process has two phases. The first phase searches backwards from all destinations until they reach the highest level hierarchy. This phase is generally very fast. The second phase searches forward from all origins until they find the points of entry into the highest level hierarchy for each destination computed in the first phase. This second phase tends to be the bottleneck in the solve as it searches much farther than the first phase.
Therefore, when you solve multiple 55,000 x 100 cases, the solver is probably doing much more work per iteration in the second phase than if you instead tried solving it as multiple 100 x 55,000 cases. Put differently: in your current workflow (55,000 x 100), you are recalculating those same (more exhaustive) origin searches over and over again at each iteration, but looking for different destinations each time. It should be more efficient to only have to calculate the same (short) destination searches each time, while searching from only a few different origins each time. Does this make sense? Let me know if this helps any.
Hey Michael - thanks for the suggestion. It does make sense. I was able to update my script to iterate over origins instead of destinations. It works fine for smaller, test samples (100 x 1,000), but won't work on the full run (100 x 55,000). I've tried various destination sizes, and I can get runs of up to 100 x 54,000 to work without issue. Anything more than that, and I get the error:
[INDENT]ERROR 030024: Solve returned a failure. Failed to execute (Solve).[/INDENT]
The message doesn't specifically say that the error is because I'm running out of memory, but I suspect that may be what's going on. I can get 1 x 55,000 and 50 x 55,000 to run.
For the test sample runs, there doesn't seem to be any difference in speed, but those run so fast that the difference might not be perceptible. Although I can't get an apples-to-apples comparison, the 100 x 54,000 run seems to take about the same amount of time as the 55,000 x 100 run, so I'm not sure if iterating over origins instead of destinations will have a big effect.
Given the explanation of how the solver works in your previous post, one thing I wonder about is whether my hierarchy is accurate. I built my own network (for the city I'm analyzing, the Washington, DC PMSA) from the StreetMaps_NA data and defined my hierarchy based on the ACC field as:
Yes, it is likely that my suggestion would require more memory than your original approach, since the solve must store all of the backward entry points for each destination. Since your recent experiments don't appear to see much difference, then perhaps your original approach is best.
As for whether or not your hierarchy is accurate, the only online resource that I know of regarding ACC classifications is an ESRI whitepaper on defining hierarchies. Here is the link to the paper. Appendix B of this paper should have some useful information.
Out of curiosity, is there a reason you cannot rely on the original Streetmap dataset (which already has hierarchies defined) for your analysis purposes?
Michael - I built my own network because I'm just looking at the DC PMSA, so I figured processing would be more efficient with a smaller street network. I can try running my analysis on the Streetmap network to see if my assumption was false.
As far as memory, I have the ability to add more RAM to my computer. I've currently got 8GB, but I can go up to 16GB, and I'm running a 64-bit system. I know that ArcGIS is a 32-bit program, so I wasn't sure whether the additional RAM would be utilized. Do you have any sense as to whether it would help?
Jay - My 55,000 points are the Census block centroids in the DC PMSA. I do need data based on trips between all points.
Are you running on a 32 bit OS or 64 bit OS? ArcGIS 10.0 can access up to 4 GB only if running on a 64 bit OS else it is limited to 2 GB. Also you may want to run the solve as a background GP process as that will have less overhead on RAM instead of running the OD solve in a foreground process.
If all your 55,000 points are in the DC area, say within one hour of each other, then put a cutoff of 90 minutes and turn off the hierarchy and then solve. It should not run out of memory for 100 by 55000 as it will not have to compute the backward trees which really help in long distance routing with hierarchy. You need to add the cutoff in case you have some unreachable locaions (due to restructions) and then it will keep searching the entire network before giving up and can run out of memory.
I wanted to update you on my experiences trying the suggestions you made. I updated my hierarchy based on the algorithm given in Appendix B of the ESRI whitepaper. It increased computation times slightly. It does not appear to drastically change the resulting OD matrix, although I have only done some preliminary comparisons.
I also tried running my OD Matrix using the the original Streetmap dataset. Computation times are slightly longer, and results are comparable to those from the network I built (with the hierarchy based on the algorithm given in Appendix B).
Finally, I tried adding the cutoff as Jay suggested. When I turn off the hierarchy and use a cutoff, solve times increase exponentially. Using the hierarchy and a cutoff that's about the 90th percentile of travel times does not affect solve times.
I'm working on decreasing the number of origins and destinations I compute and interpolating my results to all blocks.