Hello. I'm struggling to run the Calculate Accessibility Matrix tool for a metro region, following this workflow: https://www.youtube.com/watch?v=FAmaK1fVpyY.
After running for several days, I get "OSError: [Errno 24] Too many open files". I presume I'm hitting the OS open file limit because I'm attempting to solve the matrix at the block level for a large metro region comprising 9 counties and various transit agencies. I've looked into increasing the max number of open files (ulimit), but I don't believe I can increase the limit on my machine.
I'm able to get the tool to run if I scale down to a single jurisdiction, but our job market is regional and several of the transit agencies included in the GTFS data operate regionally, thus the need to solve across all administrative boundaries within the region.
Can anyone provide any guidance for effectively using this tool on large areas like regions? Thanks in advance.
Yes, sorry about that. Another user ran into a similar problem a month or two ago, and I updated the tools to fix it. Please download the latest from ArcGIS Online or GitHub and try again.
Great, thanks so much for the quick response. I'll download again, re-run, and follow up when it finishes.
Hi Joseph.
First off, WOW! This appears to be a HUGE problem. Frankly, I'm amazed that Pro and your machine managed to survive for 630 hours 27 minutes 29 seconds (over 26 days!) of processing. I can honestly say I've never witnessed any tool run that long, let alone one of mine.
This, of course, helps you not at all since the tool died before finishing. The traceback unfortunately doesn't tell me much. Basically the OD Cost Matrix calculation must have crashed or died for some reason (reason not apparent from the log), and the parallel process caught the crash and stopped the tool. Given the size of the problem and the lengthy run time, I would guess some kind of resource limit (your computer ran out of space, ran out of CPU or memory, got tired, etc.) or some kind of process interruption (your computer tried to update, it lost a connection to an output folder if it was on a network, your virus scan did something, etc.). Unfortunately, I really just don't know what happened.
I was actually just talking with someone yesterday about having some kind of retry logic for processes that fail like this, but that's not something currently implemented in the tool's logic. Right now the entire tool will stop and fail as soon as one of the processes errors out. Possibly I can consider enhancing this at some point.
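To give a rough idea of what such retry logic could look like, here is a minimal sketch in plain Python. It is not code from the tool: `run_with_retries` and the `task` callable are hypothetical names standing in for one unit of work, and a real version would probably catch narrower error types.

```python
import time


def run_with_retries(task, max_attempts=3, wait_seconds=0.0):
    """Call task() until it succeeds or max_attempts is reached.

    `task` is a hypothetical zero-argument callable standing in for one
    chunk of work. It is an assumption for illustration, not part of the tool.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except Exception as err:  # a real version would catch narrower errors
            last_error = err
            if attempt < max_attempts:
                time.sleep(wait_seconds)  # pause before the next attempt
    raise RuntimeError(f"Task failed after {max_attempts} attempts") from last_error
```

The point of the design is that a single transient failure costs one repeated chunk instead of the whole run.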
Regardless, I also note that the OD Cost Matrix calculation was only a quarter of the way through. That means that even if the process had not failed, you would be looking at running this tool for 3-4 months before you get a solution, and that really just doesn't seem tractable. Also, it has to do some post-processing in the end once the OD Cost Matrix calculations are done, and if the problem is truly this gigantic, you may run out of memory at that point as well.
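For anyone who wants to check that estimate, here is the arithmetic as a small Python sketch. The elapsed time comes from the log; the 0.25 progress fraction is only the approximate "quarter of the way through" figure, and the calculation assumes the remaining work proceeds at the same rate, which may not hold.

```python
# Elapsed run time reported in the log: 630 h 27 min 29 s
elapsed_hours = 630 + 27 / 60 + 29 / 3600

# Approximate progress when the tool died ("a quarter of the way through")
fraction_done = 0.25


def estimate_total_hours(elapsed, fraction):
    """Linearly extrapolate total run time from elapsed time and progress."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    return elapsed / fraction


total_hours = estimate_total_hours(elapsed_hours, fraction_done)
total_days = total_hours / 24
print(f"Estimated total: {total_hours:.0f} hours, about {total_days:.0f} days")
```

That works out to roughly 105 days, which is where the 3-4 month figure comes from, before any post-processing.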
I think you will have to break this problem down into smaller parts or it just isn't going to be tractable. Here are some ideas, data points, and suggestions to help you consider how to approach it:
How many origins and destinations do you actually have?
Hi Melinda,
Thanks a ton for sending your thoughts, really appreciated.
Yeah, that number of origins and destinations definitely seems intractably large for one solve, even with the cutoff time.
This documentation page explains all about network locations in case you need some background. And this page explains specifically about precalculating network locations and how to do it.
In summary, snapping the input locations to the network takes time. If you're going to reuse the same inputs for multiple analyses, it's faster to calculate the network locations once up front and reuse them than to have each solve operation calculate them again.
The Calculate Accessibility Matrix tool will precalculate the network locations for you so it is internally efficient. However, if you're going to distribute the process across multiple machines or run it in multiple chunks, it would be better to precalculate the locations once in advance and then turn OFF the option to precalculate them when the tool is run (because it's already been done in advance).
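If you do go the chunking route, a small helper like the following could be used to plan the chunks. This is only a sketch in plain Python: `chunk_ranges` is a hypothetical name, the counts in the usage lines are made-up numbers, and it does not call the tool itself. The idea is that each index range would become one run of the tool against the already-precalculated inputs, with the precalculate option turned off.

```python
def chunk_ranges(total_count, chunk_size):
    """Yield (start, end) index pairs, end exclusive, covering 0..total_count.

    Each pair describes one chunk of origins to solve in a separate run.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, total_count, chunk_size):
        yield start, min(start + chunk_size, total_count)


# Made-up numbers for illustration only: 10 origins in chunks of 4
for start, end in chunk_ranges(10, 4):
    print(f"Solve origins {start} through {end - 1}")
```

Splitting by origins keeps every chunk's results independent, so a chunk that fails can be rerun on its own.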
Got it, makes sense. I'll give this some thought and retry.
Thanks so much for the guidance!