Hi folks,
Just found the Transit Network Analysis tools. Fantastic job, they are awesome.
I'm really hoping to use the Calculate Travel Time Statistics tool on a city-wide transit network. Any chance you have a version that can use parallel computation hanging around I can test?
I want to use the average transit time as an input to a huff gravity model, and I fear many hours of waiting around for runs to complete are ahead of me.
Cheers,
Michael
Hello Michael. Thanks for your request.
Although I have parallelized the other tools in that toolbox, I haven't done this one yet. I honestly wasn't sure if anybody was using this tool, so thank you for confirming that it has at least some value.
I can add it to my to-do list to create a parallel version of this tool, although I can't commit to doing it in the immediate future (lots of other stuff on my plate right now).
Have you tested the current tool to determine if performance is really a problem? It definitely CAN be faster if I parallelize it, but it's possible that it isn't...completely terrible...as it is...maybe...
Hi Melinda,
Thanks for the quick response!
Honestly I'm at the crunch point of my honors thesis, so I just went ahead and ran with the tools I had available. I haven't done a lot of stuff at this kind of scale before, so I'm not really in a position to judge whether the performance was as expected. For some context, I'm doing an accessibility analysis for access to healthcare in my city, with:
With this setup, I found that the tool was taking about 10-15 minutes to solve each run, so about 6 hours of compute time to solve each 'day' on the transit network. I repeated this for four different days (to cover all the variation in the transit schedule), so it required about 24 hours of compute time in total.
I would have loved to have simulated a much smaller interval (down to 10 or even 5 minutes perhaps), but that clearly wasn't an option here. If I'd done 5-minute intervals, it would have required (10 min run time * 16.5 transit hours * 12 runs/hr * 4 days / 60 min/hr) 132 hours of compute time?
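That arithmetic can be sketched as a quick helper. (The per-run solve time is assumed constant here for simplicity; my actual runs varied between about 10 and 15 minutes.)

```python
# Back-of-the-envelope compute-time estimate for repeated time-sliced analyses.
# All the numbers below are just the ones from this thread, not benchmarks.

def compute_hours(minutes_per_run, service_hours, runs_per_hour, days):
    """Total compute time (in hours) for one full set of time-sliced runs."""
    total_runs = service_hours * runs_per_hour * days
    return minutes_per_run * total_runs / 60

# 30-minute interval (2 runs/hour), ~11 min per run, 16.5 service hours, 4 days
print(compute_hours(11, 16.5, 2, 4))   # roughly the ~24 hours I saw

# 5-minute interval (12 runs/hour) at ~10 min per run
print(compute_hours(10, 16.5, 12, 4))  # 132 hours
```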
So I don't think that is doable. However, I'm still very grateful that I didn't have to write the code from scratch. No urgency on further development, I've got the data I needed for this project. I'm hopeful I'll get the opportunity to continue with this line of inquiry after my thesis, so if you produce a parallel version down the track I'll certainly make good use of it.
Cheers,
Michael
Okay, I'm glad you got what you needed. My one caution about a 30-minute interval is that some transit service runs on a 30-minute interval, so you may get some weird effects where the system is effectively in the same state every 30 minutes, which is not representative of the state(s) it's in during the times between those 30 minutes. Like if person A lives right next to a bus stop with a bus coming every 30 minutes, and they've just missed that bus by 1 minute, then it shows them as perpetually unable to get to their doctor's appointment, even though there are some times of day (like 1 minute before each analysis time slice) when they actually can.
Yeah absolutely - thanks for the advice. I'll flag that as a limitation of my analysis for now, and re-run the models sometime down the track.
Just to add to the conversation, in a study I did recently I also had to deal with this same problem of balancing processing time with temporal coverage. I chose a time interval offset of 23 minutes, so that each iteration would start at a different minute of the hour over the course of the service day. I was trying to avoid the situation where the iteration start times and the transit trip times would sync up due to regular frequencies. If I remember correctly, I did 4 separate batches, each with about 1,000 origins and between 30-80 destinations (they were different facility types, e.g., healthcare, education, etc.), and each run on my machine took between 4 and 10 hours and processed overnight. Also, on each run, the start times were offset by one minute (e.g., 5:00 am, 5:01 am) so that over all the runs I would have even more diversity of trip start times. That may have been unsolicited information, but I thought it might help a little.
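A minimal sketch of that 23-minute offset idea, in case it helps anyone else (the service-day bounds and date below are made up for illustration, and this is not the actual tool code):

```python
# Generate analysis start times stepped by 23 minutes across a service day.
# Because 23 is coprime with 60, successive start times land on different
# minutes of the hour instead of syncing up with regular transit headways.
from datetime import datetime, timedelta

def start_times(first, last, step_minutes):
    """Return all start times from first to last at the given step."""
    times = []
    t = first
    while t <= last:
        times.append(t)
        t += timedelta(minutes=step_minutes)
    return times

day_start = datetime(2024, 1, 1, 5, 0)    # 5:00 am, example service start
day_end = datetime(2024, 1, 1, 21, 30)    # 9:30 pm, example service end
times = start_times(day_start, day_end, 23)

# Every start time falls on a distinct minute of the hour
print(len(times), len({t.minute for t in times}))  # 44 44
```

With a 30- or 60-minute step, the same check would show only one or two distinct minutes of the hour, which is exactly the sync-up effect being avoided.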
Not at all, thanks Philip.
A couple of follow-up questions for when I parallelize and update the Calculate Travel Time Statistics tool.
Hi Melinda,
Thanks! Michael
Hello Michael. It might be too late for you now, but I have implemented a parallelized version of Calculate Travel Time Statistics for OD Cost Matrix. You can get it on GitHub or download it here. The new version outputs the results to a CSV file, which is generally more efficient to write to than a file geodatabase table. However, I could add more output types if necessary.