Measuring distance between one point and multiple other points along a polyline of a complex dendritic network

07-23-2020 06:25 PM
EricWalther
New Contributor II

Hi, I am new to GeoNet, so apologies if this question has been answered elsewhere and I have not found it yet. I have searched previous posts and have not found a solution (from what I can tell) to my problem.

I have a point shapefile with ~60,000 points. For each point in this shapefile, I need to obtain the distance between that point and every other point, measured along a polyline stored in a separate line shapefile. The line shapefile represents a complex dendritic river network. For example, for point xi, I need to calculate the distance between point xi and points y1, y2, ..., yn, where n is the number of points in the point shapefile. I do not have access to the Network Analyst extension. I am a Python novice but competent at coding in R, which has been somewhat helpful in trying to learn Python. I have also used ModelBuilder, if there is a solution using that approach. Thanks for the help!

8 Replies
DanPatterson
MVP Esteemed Contributor

What final information do you need to get out of the distances?

I don't think that you want 60,000 to X combinations of distances without a final purpose


... sort of retired...
EricWalther
New Contributor II

Hi Dan,

I am developing a spatially explicit regression model and need the distances between each pair of events (points in the shapefile) to build the covariance matrix for modeling. Near analysis gives the Euclidean distance between points, correct? I need the actual river distance (distance traced along the polyline) between each pair of points. Using Euclidean distance would make some pairs that are actually relatively far apart within the river network seem closer to each other than they really are. Thanks!
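To illustrate the difference, here is a minimal sketch on a hypothetical three-node network (two tributaries meeting at a confluence). The node names, coordinates, and the helper functions `build_graph` and `network_distances` are made up for the example, and the shortest-path step uses a plain Dijkstra search from the standard library rather than any ArcGIS tool.

```python
import heapq
import math

# Hypothetical toy network: two tributaries joining at a confluence.
# Coordinates are (x, y) in arbitrary map units.
coords = {
    "A": (0.0, 10.0),  # event on the left tributary
    "B": (2.0, 10.0),  # event on the right tributary
    "C": (1.0, 0.0),   # confluence
}
# Undirected channel segments; each is a straight line in this toy case.
edges = [("A", "C"), ("B", "C")]


def build_graph(coords, edges):
    """Adjacency list weighted by segment length."""
    graph = {node: [] for node in coords}
    for u, v in edges:
        length = math.dist(coords[u], coords[v])
        graph[u].append((v, length))
        graph[v].append((u, length))
    return graph


def network_distances(graph, source):
    """Dijkstra search: shortest along-network distance from source to each node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, math.inf):
            continue  # stale heap entry
        for neighbor, length in graph[node]:
            nd = d + length
            if nd < dist.get(neighbor, math.inf):
                dist[neighbor] = nd
                heapq.heappush(heap, (nd, neighbor))
    return dist


graph = build_graph(coords, edges)
euclidean_ab = math.dist(coords["A"], coords["B"])
river_ab = network_distances(graph, "A")["B"]
print(f"Euclidean A-B: {euclidean_ab:.2f}")  # 2.00
print(f"River A-B:     {river_ab:.2f}")      # 20.10
```

Points A and B are 2 units apart in a straight line but about 20 units apart along the channel, which is the distortion described above.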

Eric

DavidPike
MVP Frequent Contributor

This could pretty easily be accomplished using Near analysis, iterating through each point, but as Dan suggests - what do you want to do with the million/billion values you get and for what purpose?

DanPatterson
MVP Esteemed Contributor

Near will croak

DavidPike
MVP Frequent Contributor

I'm thinking of comparing just 1 point against the line vertices each time, rather than throwing all of them in. That way you would at least be able to resume iterating from the last failure when it runs out of memory.

EricWalther
New Contributor II

Ideally the output would be a table with the distance between each pair of points, which I can use as my distance matrix for modelling. Note: I have looked into the STARS toolkit using a spatial stream network (SSN) approach. However, my stream network shapefile is not appropriately structured as an SSN, and it would be a heavy lift to go in and manually modify the network. As a result, I have been pursuing alternative approaches, which led me to the exercise I am trying to tackle here. Let me know if this makes sense and if any additional information would be helpful!

DanPatterson
MVP Esteemed Contributor

I would go for that toolset, if you have ruled out working with a raster approach using something like

Flow Distance function—ArcGIS Pro | Documentation 

if you have the Spatial Analyst extension

DuncanHornby
MVP Notable Contributor

Eric,

Extracting the distances you want is a computationally massive problem. Even if you had "just" 10,000 points in your second shapefile and the network was a single catchment, you would need to generate 600,000,000 routes!

Since you do not describe the extent of your network or your second set of points, 600 million is not an unrealistic number.
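The arithmetic behind that figure can be checked quickly. This is only a back-of-the-envelope sketch: the 10,000 is the hypothetical size used above, the all-pairs count assumes every one of the ~60,000 points is compared with every other, and the memory estimate assumes a dense matrix of 8-byte floats.

```python
n_points = 60_000   # points in the original shapefile (approximate)
n_second = 10_000   # hypothetical size of a second point set

# One route per (point, second-set point) combination.
routes_two_sets = n_points * n_second

# If instead every point is paired with every other point in the same set,
# each unordered pair needs one route.
unordered_pairs = n_points * (n_points - 1) // 2

# Assumed storage for a full n x n matrix of 8-byte floats, in gigabytes.
full_matrix_gb = n_points * n_points * 8 / 1e9

print(f"{routes_two_sets:,} routes between the two sets")
print(f"{unordered_pairs:,} unordered pairs within one set")
print(f"{full_matrix_gb:.1f} GB for a dense float64 matrix")
```

Either way the count runs into the hundreds of millions or more, so any approach has to be planned around that volume.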

I was recently approached by someone because I'm the author of RivEX, a river network processing tool capable of generating the values you want. Their requirements were much like yours: hundreds of millions of distances needed to be generated, well beyond the design specifications of RivEX. They decided to take it in-house and develop code themselves to answer this question.

The paper is here: https://doi.org/10.3390/data5010008

The code is here: https://gitlab.com/njacadieux/upstream_downstream_shortests_path_dijkstra 

I've not used it, so I can't comment on how easy it is to install and use, but I get the impression they want you to run it on a multi-core machine, presumably because the code runs in some parallel form.
