Calculate Accessibility Matrix for large metro region

01-03-2023 05:09 PM
JosephAhrenholtz1
New Contributor II

Hello.  I'm struggling to run the Calculate Accessibility Matrix tool for a metro region, following this workflow: https://www.youtube.com/watch?v=FAmaK1fVpyY.  

After running for several days, I get "OSError: [Errno 24] Too many open files".  I presume I'm hitting the OS open-file limit because I'm attempting to solve the matrix at the block level for a large metro region comprising 9 counties and various transit agencies.  I've looked into increasing the maximum number of open files (ulimit), but I don't believe I can raise the limit on my machine.
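(For reference, if the tool is being run in a Linux/macOS environment, the open-file limit can be inspected and raised from Python up to the hard limit with something like the sketch below; the resource module isn't available on Windows, so this doesn't apply to a standard ArcGIS Pro setup.)

```python
# Rough sketch: check and (if possible) raise the per-process open-file limit.
# Unix-only; the resource module does not exist on Windows.
import resource

soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"open-file limits: soft={soft}, hard={hard}")

try:
    # The soft limit can be raised up to the hard limit without elevated privileges.
    resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
    print(f"soft limit raised to {hard}")
except (ValueError, OSError) as err:
    print(f"could not raise the limit: {err}")
```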

I'm able to get the tool to run if I scale down to a single jurisdiction, but our job market is regional and several of the transit agencies included in the GTFS data operate regionally, thus the need to solve across all administrative boundaries within the region.

Can anyone provide any guidance for effectively using this tool on large areas like regions?  Thanks in advance.  

7 Replies
MelindaMorang
Esri Regular Contributor

Yes, sorry about that.  Another user ran into a similar problem a month or two ago, and I updated the tools to fix it.  Please download the latest from ArcGIS Online or GitHub and try again.

JosephAhrenholtz1
New Contributor II

Great, thanks so much for the quick response.  I'll download again, re-run, and follow up when it finishes.

JosephAhrenholtz1
New Contributor II

Hello again.  Unfortunately, the tool "failed to get OD Cost Matrix result from parallel processing" (see attached for full details).  Any thoughts on why this might be?  Again, I was able to complete the accessibility matrix when solving for a smaller geography.

MelindaMorang
Esri Regular Contributor

Hi Joseph.

First off, WOW!  This appears to be a HUGE problem.  Frankly, I'm amazed that Pro and your machine managed to survive for 630 hours 27 minutes 29 seconds (over 26 days!) of processing.  I can honestly say I've never witnessed any tool run that long, let alone one of mine.

This, of course, helps you not at all since the tool died before finishing.  The traceback unfortunately doesn't tell me much.  Basically the OD Cost Matrix calculation must have crashed or died for some reason (reason not apparent from the log), and the parallel process caught the crash and stopped the tool.  Given the size of the problem and the lengthy run time, I would guess some kind of resource limit (your computer ran out of space, ran out of CPU or memory, got tired, etc.) or some kind of process interruption (your computer tried to update, it lost a connection to an output folder if it was on a network, your virus scan did something, etc.).  Unfortunately, I really just don't know what happened.

I was actually just talking with someone yesterday about having some kind of retry logic for processes that fail like this, but that's not something currently implemented in the tool's logic.  Right now the entire tool will stop and fail as soon as one of the processes errors out.  Possibly I can consider enhancing this at some point.

Regardless, I also note that the OD Cost Matrix calculation was only a quarter of the way through.  That means that even if the process had not failed, you would be looking at running this tool for 3-4 months before you get a solution, and that really just doesn't seem tractable.  Also, it has to do some post-processing in the end once the OD Cost Matrix calculations are done, and if the problem is truly this gigantic, you may run out of memory at that point as well.

I think you will have to break this problem down into smaller parts or it just isn't going to be tractable.  Here are some ideas, data points, and suggestions to help you consider how to approach it:

  • The size of the network dataset doesn't really matter.  You can include all your transit agencies in one network and use it for each subset of inputs without substantially impacting the performance.
  • You're right that you probably need to include all or most of the destinations since your job market is regional.
  • Your best bet is to break up the origins by county or geographic area so you have manageable chunks that solve in a reasonable amount of time (see the chunking sketch after this list).
  • Precalculate the network locations in advance to save some time.
  • I noticed from your log that you're using only 5 parallel processes.  If you have the ability to use more, this will definitely help a lot.  Do you have access to a more powerful machine with more logical cores?
  • Do you have access to more than one machine?  If you could distribute the problem onto a couple of machines that can each run a chunk of origins, this would help you get a result in a reasonable amount of time.
  • Consider spinning up some cloud machines.  This problem honestly might just be intractably large for solving on one machine, and if you don't have the resources in house, maybe it would be worth the cost to rent some temporary processing capacity from the cloud.
  • What version of ArcGIS Pro are you using?  We made some substantial performance improvements to the OD Cost Matrix in the 2.9 release, so if you're using older software, updating will definitely help a lot.  ODs may be 20-70% faster, depending on the analysis settings.  If my math is right, it seems like each 1000x1000 OD chunk is taking about 14 seconds to run, which honestly seems rather slow to me.
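Here's a rough sketch of one way to chunk the origins by county before running the tool once per chunk.  The dataset paths and the "COUNTYFP" field name are placeholders for whatever your data actually uses:

```python
# Sketch: split the block-level origins into one feature class per county so each
# Calculate Accessibility Matrix run handles a manageable subset of origins.
# Paths and the county field name are hypothetical.
import arcpy

blocks = r"C:\data\region.gdb\census_blocks"   # block centroids used as origins
out_gdb = r"C:\data\origin_chunks.gdb"         # workspace to hold the per-county chunks

# Split By Attributes writes one output feature class per unique value of the
# split field, so each county's blocks become a separate origins layer.
arcpy.analysis.SplitByAttributes(blocks, out_gdb, ["COUNTYFP"])

# Each output can then be used as the origins input for its own run of the tool,
# all against the same region-wide destinations and network dataset.
```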

How many origins and destinations do you actually have?

JosephAhrenholtz1
New Contributor II

Hi Melinda,

 

Thanks a ton for sending your thoughts, really appreciated.

  • I'm running ArcGIS Pro 3.0.3, so it sounds like the OD Cost Matrix solver I used would have included the performance improvements.
  • The region I'm running the cost matrix for consists of 102,938 census blocks that I'm using as both the origins and destinations (quick sizing sketch after this list).  I think breaking up the origins by county makes a lot of sense.  I was unsure whether I could use the region-wide network along with subsets of inputs.  Thanks for confirming this, very helpful.
  • I'll definitely be looking into a cloud computing option.  I have another machine with 8 cores but is typically busy with other tasks.  As you mentioned, the size of the problem is too large to solve (at least in a reasonable amount of time) without a significant increase in processing capacity.  Thanks for the tip.
  • Can you elaborate on your suggestion to precalculate the network locations in advance to save time?
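For scale, here's a quick back-of-the-envelope on the size of the full matrix and of a per-county run, using the block count above (the nine-way split is just an illustrative even division):

```python
# Back-of-the-envelope sizing using the numbers from this thread.
origins = destinations = 102_938                 # census blocks used as both origins and destinations
od_pairs = origins * destinations                # ~10.6 billion origin-destination pairs

chunk = 1000                                     # the tool's 1000 x 1000 OD chunks
chunks = -(-origins // chunk) * -(-destinations // chunk)   # ceiling division: ~10,609 chunks

# Hypothetical even split of origins across the 9 counties; each batch of origins
# still pairs with every destination in the region.
per_county_pairs = (origins // 9) * destinations  # ~1.2 billion pairs per county run

print(f"{od_pairs:,} total pairs, {chunks:,} OD chunks, ~{per_county_pairs:,} pairs per county run")
```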

 

MelindaMorang
Esri Regular Contributor

Yeah, that number of origins and destinations definitely seems intractably large for one solve, even with the cutoff time.

This documentation page explains all about network locations in case you need some background.  And this page explains specifically how to precalculate network locations.

In short, the process of snapping the input locations to the network takes time.  If you're going to reuse the same inputs for multiple analyses, it's faster to calculate the network locations once up front and reuse them rather than having each solve operation do it over again.

The Calculate Accessibility Matrix tool will precalculate the network locations for you so it is internally efficient.  However, if you're going to distribute the process across multiple machines or run it in multiple chunks, it would be better to precalculate the locations once in advance and then turn OFF the option to precalculate them when the tool is run (because it's already been done in advance).
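If it helps, here's a minimal sketch of precalculating the network locations with the Calculate Locations geoprocessing tool; the paths, search tolerance, and travel mode name are placeholders for your own data:

```python
# Sketch: store network location fields on the blocks once so every chunked run
# can reuse them.  Paths, tolerance, and travel mode name below are hypothetical.
import arcpy

arcpy.CheckOutExtension("network")

network = r"C:\data\TransitNetwork.gdb\TransitNetwork\TransitNetwork_ND"  # your network dataset
blocks = r"C:\data\region.gdb\census_blocks"                              # origins/destinations

# Calculate Locations writes the location fields (SourceID, SourceOID, PosAlong,
# SideOfEdge) onto the input features so later solves can skip the snapping step.
arcpy.na.CalculateLocations(
    blocks,
    network,
    "5000 Meters",                      # example search tolerance
    travel_mode="Public transit time",  # example travel mode; use the one your analysis uses
)
```

Then, when you run Calculate Accessibility Matrix on each chunk, turn off its option to precalculate network locations, since the fields already exist on the inputs.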

 

JosephAhrenholtz1
New Contributor II

Got it, that makes sense.  I'll give this some thought and retry.

Thanks so much for the guidance!  
