Dan_Patterson

Distance Explorations... Trees are cool

Blog Post created by Dan_Patterson Champion on Dec 12, 2018

KD trees

 

Table of contents:  (use browser back to return here)

Distance stuff, lots of tree types.  Your distribution of Python comes with scipy which has a couple of implementations.

This is just a quick demo of its potential use.  And an homage to Canadian university students, for which KD will always have a special meaning as a primary food group. Kraft Dinner for the non-student

 

Start with some points and you then want to calculate the closest 2 points to form point origin-destination pairs... because it can be done.

Steps.

  • Just deal with the coordinates first, leave the attribute baggage to the side for now.
  • Decide on the number of points you want to find the 'closest' of.  Don't get ridiculous and ask for an origin-destination matrix with a couple of thousand points.  Go back to the project design stage or look at scipy.distance.cdist and a new computer.
  • Sorting the points by X, then Y coordinate is useful in some situations.  An option to do so is provided.
  • Building the KDTree is fairly straightforward using scipy.
    • decide on the number of points to find
    • the returned list of indices will include the origin point itself, so if you want the closest 2 points, then set your query to N = 3.  This can be exploited to suck up the x,y values to form origin-destination pairs if you want to form lines, and/or polygons.
  • Decide if you want to just pull out the indices of the closest pairs with their distance.
  • Optionally, you can produce a structured array, which you can then bring into ArcGIS Pro as a table for use with a couple of ArcToolbox tools to create geometry
  • You are done.  Do the join thing if you really need the attributes.

 

The picture:

 

The code:

So this function just requires a point array of x,y pairs, the number of closest points (N), whether you want to do an x,y sort first and finally, whether you want an output table suitable for use in ArcGIS Pro.

From there, you simply use arcpy.NumPyArrayToTable to produce a gdb featureclass table. 

You can them use... XY to Line … to produce line segments, connecting the various origins and destinations as you see fit, or just bask in the creation of an... XY event layer.

 

Note:  lines 32 and 41 can use... cKDTree ...in place of... KDTree ..., if you just need speed for Euclidean calculations.

def nn_kdtree(a, N=3, sorted=True, to_tbl=True):
    """Produce the N closest neighbours array with their distances using
    scipy.spatial.KDTree as an alternative to einsum.

    Parameters:
    -----------
    a : array
        Assumed to be an array of point objects for which `nearest` is needed.
    N : integer
        Number of neighbors to return.  Note: the point counts as 1, so N=3
        returns the closest 2 points, plus itself.
    sorted : boolean
        A nice option to facilitate things.  See `xy_sort`.  Its mini-version
        is included in this function.

    References:
    -----------
    `<https://stackoverflow.com/questions/52366421/how-to-do-n-d-distance-
    and-nearest-neighbor-calculations-on-numpy-arrays/52366706#52366706>`_.
   
    `<https://stackoverflow.com/questions/6931209/difference-between-scipy-
    spatial-kdtree-and-scipy-spatial-ckdtree/6931317#6931317>`_.
    """
    def _xy_sort_(a):
        """
mini xy_sort"""
        a_view = a.view(a.dtype.descr * a.shape[1])
        idx =np.argsort(a_view, axis=0, order=(a_view.dtype.names)).ravel()
        a = np.ascontiguousarray(a[idx])
        return a, idx
    #
    def xy_dist_headers(N):
        """Construct headers for the optional table output"""
        vals = np.repeat(np.arange(N), 2)
        names = ['X_{}', 'Y_{}']*N + ['d_{}']*(N-1)
        vals = (np.repeat(np.arange(N), 2)).tolist() + [i for i in range(1, N)]
        n = [names[i].format(vals[i]) for i in range(len(vals))]
        f = ['<f8']*N*2 + ['<f8']*(N-1)
        return list(zip(n,f))
    #   
    from scipy.spatial import cKDTree
    #
    idx_orig = []
    if sorted:
        a, idx_orig = _xy_sort_(a)
    # ---- query the tree for the N nearest neighbors and their distance
    t = cKDTree(a)
    dists, indices = t.query(a, N)
    if to_tbl:
        dt = xy_dist_headers(N)  # --- Format a structured array header
        xys = a[indices]
        new_shp = (xys.shape[0], np.prod(xys.shape[1:]))
        xys = xys.reshape(new_shp)
        ds = dists[:, 1:]  #[d[1:] for d in dists]
        arr = np.concatenate((xys, ds), axis=1)
        arr = arr.view(dtype=dt).squeeze()
        return arr
    else:
        return np.array(indices), idx_orig

 

The output

Just a slightly better formatting that you can get with one of my numpy functions... obviating the need for Pandas for table niceness.

 id  X_0    Y_0    X_1    Y_1    X_2    Y_2    d_1     d_2    
--------------------------------------------------------------
000   3.00  98.00  10.00  94.00  23.00  94.00    8.06   20.40
001  10.00  94.00   3.00  98.00  23.00  94.00    8.06   13.00
002  13.00  18.00  19.00  22.00  34.00  16.00    7.21   21.10
003  19.00  22.00  13.00  18.00  34.00  16.00    7.21   16.16
004  23.00  94.00  10.00  94.00   3.00  98.00   13.00   20.40
005  34.00  16.00  19.00  22.00  43.00   1.00   16.16   17.49
006  37.00  64.00  43.00  89.00  56.00  84.00   25.71   27.59
007  43.00   1.00  34.00  16.00  66.00   6.00   17.49   23.54
008  43.00  89.00  56.00  84.00  61.00  87.00   13.93   18.11
009  56.00  84.00  61.00  87.00  43.00  89.00    5.83   13.93
010  61.00  87.00  56.00  84.00  43.00  89.00    5.83   18.11
011  66.00   6.00  76.00  20.00  43.00   1.00   17.20   23.54
012  67.00  41.00  78.00  50.00  76.00  20.00   14.21   22.85
013  76.00  20.00  66.00   6.00  67.00  41.00   17.20   22.85
014  78.00  50.00  67.00  41.00  80.00  67.00   14.21   17.12
015  80.00  67.00  91.00  66.00  78.00  50.00   11.05   17.12
016  82.00  91.00  94.00  95.00  61.00  87.00   12.65   21.38
017  91.00  66.00  80.00  67.00  78.00  50.00   11.05   20.62
018  94.00  95.00  82.00  91.00  91.00  66.00   12.65   29.15
019  96.00  40.00  78.00  50.00  91.00  66.00   20.59   26.48

 

Summary

Behind the scenes, there should be some 'spatial cleanup'.  Specifically, if you look at the image you have point pairs connected by a single segment, that is because they are the closest to one another.  Rather than duplicating the segment with opposing directions, you can 'prune' the indices and remove those prior to producing the geometry.

 

There are lots of tools that you can produce/show geometric relationships.  Use them to provide answers to your questions.  This implementation will appear soon on the code sharing site.  I will provide a link soon.

Outcomes