Spanning trees

2606
8
01-31-2017 09:01 AM
Labels (1)
DanPatterson_Retired
MVP Emeritus
4 8 2,606

Connecting the dots.  Spanning trees have a wide variety of applications in GIS... or did.  I have been finishing writings on distance and geometry and I almost forgot this one.  The principle is simple.  Connect the dots, so that no line (edge) overlaps, but only connects at a point.  In practice, you want the tree to produce a minimal length/distance output.  There are a variety of names forms, each with their subtle nuance and application, but my favorite is Prim's... only because I sortof understand it.  I am not sure if this his implementation exactly, but it is close enough.  So I post it here for those that want to try it.  Besides, Prim's can be used to produce mazes, a far more useful application of dot connection.

My favorite code header to cover imports and array printing and some bare-bones graphing

import sys
import numpy as np
import matplotlib.pyplot as plt
from textwrap import dedent, indent

ft = {'bool': lambda x: repr(x.astype('int32')),
      'float': '{: 0.1f}'.format}
np.set_printoptions(edgeitems=10, linewidth=100, precision=2,
                    suppress=True, threshold=120,
                    formatter=ft)
np.ma.masked_print_option.set_display('-')

script = sys.argv[0]

Now the ending of the script where the actual calls are performed and an explanation of what is happening occurs.

# ---------------------------------------------------------------------
if __name__ == "__main__":
    """Main section...   """
    #print("Script... {}".format(script))
    # ---- Take a few points to get you started ----
    a = np.array([[0, 0], [0,8], [10, 8],  [10,0], [3, 4], [7,4]])
    #
    idx= np.lexsort((a[:,1], a[:,0]))  # sort X, then Y
    a_srt = a[idx,:]                   # slice the sorted array
    d = _e_dist(a_srt)                 # determine the square form distances
    pairs = mst(d)                     # get the orig-dest pairs for the mst
    plot_mst(a_srt, pairs)             # a little plot
    o_d = connect(a_srt, d, pairs)     # produce an o-d structured array

Now the rest is just filler.  The code defs are given below.

def _e_dist(a):
    """Return a 2D square-form euclidean distance matrix.  For other 
    :  dimensions, use e_dist in ein_geom.py"""
    b = a.reshape(np.prod(a.shape[:-1]), 1, a.shape[-1])
    diff = a - b
    d = np.sqrt(np.einsum('ijk,ijk->ij', diff, diff)).squeeze()
    #d = np.triu(d)
    return d 


def mst(W, copy_W=True):
    """Determine the minimum spanning tree for a set of points represented
    :  by their inter-point distances... ie their 'W'eights
    :Requires:
    :--------
    :  W - edge weights (distance, time) for a set of points. W needs to be
    :      a square array or a np.triu perhaps
    :Returns:
    :-------
    :  pairs - the pair of nodes that form the edges
    """
    if copy_W:
        W = W.copy() 
    if W.shape[0] != W.shape[1]:
        raise ValueError("W needs to be square matrix of edge weights")
    Np = W.shape[0]
    pairs = []
    pnts_seen = [0]  # Add the first point                    
    n_seen = 1
    # exclude self connections by assigning inf to the diagonal
    diag = np.arange(Np)
    W[diag, diag] = np.inf
    # 
    while n_seen != Np:                                     
        new_edge = np.argmin(W[pnts_seen], axis=None)
        new_edge = divmod(new_edge, Np)
        new_edge = [pnts_seen[new_edge[0]], new_edge[1]]
        pairs.append(new_edge)
        pnts_seen.append(new_edge[1])
        W[pnts_seen, new_edge[1]] = np.inf
        W[new_edge[1], pnts_seen] = np.inf
        n_seen += 1
    return np.vstack(pairs)
 
 
def plot_mst(a, pairs):
    """plot minimum spanning tree test """
    plt.scatter(a[:, 0], a[:, 1])
    ax = plt.axes()
    ax.set_aspect('equal')
    for pair in pairs:
        i, j = pair
        plt.plot([a[i, 0], a[j, 0]], [a[i, 1], a[j, 1]], c='r')
    lbl = np.arange(len(a))
    for label, xpt, ypt in zip(lbl, a[:,0], a[:,1]):
        plt.annotate(label, xy=(xpt, ypt), xytext=(2,2), size=8,
                     textcoords='offset points',
                     ha='left', va='bottom') 
    plt.show()
    plt.close()

def connect(a, dist_arr, edges):
    """Return the full spanning tree, with points, connections and distance
    : a - point array
    : dist - distance array, from _e_dist
    : edge - edges, from mst
    """
    p_f = edges[:, 0]
    p_t = edges[:, 1]
    d = dist_arr[p_f, p_t]
    n = p_f.shape[0]
    dt = [('Orig', '<i4'), ('Dest', 'i4'), ('Dist', '<f8')]
    out = np.zeros((n,), dtype=dt)
    out['Orig'] = p_f
    out['Dest'] = p_t
    out['Dist'] = d
    return out

The output from the sample points is hardly exciting.  But you can see the possibilities for the other set.

This one is for a 100 points, with a minimum spacing of 3 within a 100x100 unit square.  Sprightly solution even on an iThingy using python 3.5 

Now on to maze creation .....

8 Comments
DarrenWiens2
MVP Alum

Thanks for sharing, Dan. It probably doesn't have the performance of numpy, but here's my try at simplifying it in terms of arcpy:

def addToTree(F,Q,sr):
    min_dist = float("inf") # set to infinity
    for fk,fv in F.iteritems(): # loop through tree vertices
        for qk,qv in Q.iteritems(): # loop through non-tree vertices
            dist = fv['G'].distanceTo(qv['G']) # calculate distance
            if dist < min_dist: # if distance is less than current minimum, remember
                fk_fv_qk_qv = [fk,fv['G'],qk,qv['G']]
                min_dist = dist
    F[fk_fv_qk_qv[2]] = {'G': fk_fv_qk_qv[3]} # add to tree vertices
    del Q[fk_fv_qk_qv[2]] # delete from non-tree vertices
    return arcpy.Polyline(arcpy.Array([fk_fv_qk_qv[1].centroid,fk_fv_qk_qv[3].centroid]),sr) # return new line

fc = 'points' # points feature class
sr = arcpy.Describe(fc).spatialReference # spatial ref
Q = {str(i[0]):{'G':i[1]} for i in arcpy.da.SearchCursor(fc,['OID@','SHAPE@'],spatial_reference=sr)} # non-tree vertices
F = {} # empty tree vertices
lines = [] # placeholder for lines
q_cur = Q.keys()[0] # get first non-tree vertex
F[q_cur] = Q[q_cur] # transfer to tree vertices
del Q[q_cur] # delete from non-tree vertices
while len(Q)>0: # do until all non-tree vertices assigned to tree
    output = addToTree(F,Q,sr) # add a vertex via function
    lines.append(output) # remember line
arcpy.CopyFeatures_management(lines,r'in_memory\lines') # write lines
DanPatterson_Retired
MVP Emeritus

great to have options Darren!  Don't forget to check out maze creation

DuncanHornby
MVP Notable Contributor

Hi Darren,

Just had a quick play with your code as one can simply drop it into the python window in ArcMap. I tried it on a dataset with approximate 100 points and it worked a treat. I then chucked a settlement point dataset of the whole of India at it... 

Arcmap went into overdrive and I eventually had to crashed out. In an attempt to understand your code am I right in saying that as it tries to build the tree (I think F in your code?) F gets bigger and Q gets smaller and at some point it's comparing 1000's of nodes with 1000's of nodes which is millions of combinations? So this algorithm is not well suited for big datasets?

I've never actually tried to create a minimum spanning tree, hence very interested in what you had done, but I guess what I did was an unrealistic demand of the algorithm?

Duncan

DarrenWiens2
MVP Alum

Yes, I think that's about right. I tried to use Prim's algorithm, but not 100% sure it's right. At each step, you need to find the next nearest non-tree point, which as you noticed, causes the calculations increase exponentially. The nice thing about numpy is that some smart person came up with a much faster way to do these types of huge combinational calculations. I think Dan made his own version (_e_dist function), but you can also look into the scipy pdist function to make your distance matrices.

DarrenWiens2
MVP Alum

Couldn't resist. Here's where I got to with an approximation of the randomized Prim's algorithm for maze generation. It's very slow, and my walls_list never actually gets to zero, so I'm pretty sure I've done something wrong. But, it does create a maze, after all.

import random

def check_walls(cur_wall, walls_list, cells):
    for k,v in cells.iteritems():
        intersect = cur_wall.intersect(v[0],2)
        if intersect.length > 0 and v[1]==0:
            v[1] = 1
            arcpy.SelectLayerByLocation_management('walls',"SHARE_A_LINE_SEGMENT_WITH",cur_wall,selection_type="NEW_SELECTION")
            arcpy.DeleteFeatures_management('walls')
            arcpy.SelectLayerByLocation_management('walls',"SHARE_A_LINE_SEGMENT_WITH",v[0],selection_type="NEW_SELECTION")
            walls_list.extend([i[0] for i in arcpy.da.SearchCursor('walls','SHAPE@',spatial_reference=sr)])
        for wall in walls_list:
            if wall.equals(cur_wall):
                walls_list.remove(wall)
    return walls_list

fc = 'hex_grid'
sr = arcpy.Describe(fc).spatialReference
cells = {str(i[1]):[i[0],0] for i in arcpy.da.SearchCursor(fc,['SHAPE@','OID@'],spatial_reference=sr)}
cur_cell = cells.keys()[0]
cells[cur_cell][1] = 1
walls = arcpy.Intersect_analysis("hex_grid",r'in_memory\walls',output_type='LINE')
wc = "mod({},2)=0".format(arcpy.AddFieldDelimiters('walls','FID'))
arcpy.SelectLayerByAttribute_management('walls',"NEW_SELECTION",wc)
arcpy.DeleteFeatures_management('walls')
arcpy.SelectLayerByLocation_management('walls',"SHARE_A_LINE_SEGMENT_WITH",cells[cur_cell][0],selection_type="NEW_SELECTION")
walls_list = [i[0] for i in arcpy.da.SearchCursor('walls','SHAPE@',spatial_reference=sr)]
counter = 0
while len(walls_list)>0 and counter < 2000:
    random.shuffle(walls_list)
    cur_wall = walls_list[0]
    counter += 1
    walls_list = check_walls(cur_wall, walls_list, cells)‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍

DanPatterson_Retired
MVP Emeritus

Ooooo trying to distract me from my courses... I should make this a lab assignment

DanPatterson_Retired
MVP Emeritus

e_dist is homage to Einstein distance since it uses einsum in its core.  I have several blog posts about it.  It allows one to calculate

  • one to many (ie an origin to many destinations)    - scipy's  pdist
  • many to many (within or between point sets )        -scipy's  cdist
  • all to itself (all points within a set to all other pnts) - scipy's square form

Scipy has a habit of requiring a limited number of dimensions, but the e_dist works with 1d, 2d and 3d arrays so you can do 2d or 3d distances within those dimensions.  I compared pdist and e_dist in one of my blogs with a less than optimized version of e_dist and 50,000,000 points was done in under 1 sec ... however pdist and cdist were 40%-50% faster... my conclusion... really? if a half a second makes that much of a difference, you can call many functions or use one.  Plus the core of e_dist can be used for statistical calculations etc.  numpy and scipy are fast and optimized.  Run your own tests

NeilAyres
MVP Alum

Darren,

Have borrowed and been messing with your code to do the mst.

Needed to modify it to include weighted edges to try and produce an optimal fibre network in a suburb.

Not perfect yet, but thanks very much for this method.

About the Author
Retired Geomatics Instructor at Carleton University. I am a forum MVP and Moderator. Current interests focus on python-based integration in GIS. See... Py... blog, my GeoNet blog...
Labels