Creating Clusters/Groups of Neighbors?

GabeKarns · ‎10-11-2013

I'm working with a set of 88 counties in Ohio, and I need to cluster them into groups of 3 based on minimum aggregate distance between each possible cluster of 3 (I created polygon centroids for each county). This will be done without replacement. Once a county is assigned to one cluster, it is not available for others.

Put another way, if there is A, B, C, and D in close proximity to one another, does A/B/C or B/C/D create "tightest" group based off sum of the distances between A-B, B-C, and A-C versus B-C, C-D, and B-D.

I am aware of the Grouping Analysis tool in version 10.1 (not positive that would do the trick...), but unfortunately I do not currently have access to that platform.

Help please! Thanks in advance.

XanderBakker · ‎10-16-2013

I'm working with a set of 88 counties in Ohio, and I need to cluster them into groups of 3 based on minimum aggregate distance between each possible cluster of 3 (I created polygon centroids for each county). This will be done without replacement. Once a county is assigned to one cluster, it is not available for others.

Put another way, if there is A, B, C, and D in close proximity to one another, does A/B/C or B/C/D create "tightest" group based off sum of the distances between A-B, B-C, and A-C versus B-C, C-D, and B-D.

I am aware of the Grouping Analysis tool in version 10.1 (not positive that would do the trick...), but unfortunately I do not currently have access to that platform.

Help please! Thanks in advance.

Hi Gabe,

You are right that Grouping Analysis will not work for your case. Although you can specify the number of groups, the size of the groups will vary.

Since this is a nice challenge I tried something which is far from perfect. It evaluates for the first point which other 2 points are closest. Those 3 points are stored in a dictionary with points (counties) that are no longer available. It takes the next point and so on. So the result below:

[ATTACH=CONFIG]28364[/ATTACH]

There are a few downsides to this method:

This will not result in the best solution since not all possibilities are evaluated. The higher the group number the less counties will still be available and the less optimal the result will be. Just check the result for groups 27 and 29.

Since there are 88 counties, county 88 is not group (group = 99)

As you can see there's not much intelligence in this process. One could test all possibilities (88 * 87 * 86 = 658416) or make it more intelligent to come up with the best result.

Here's the code (stand-alone 10.2) I wrote:

#-------------------------------------------------------------------------------
# Name:        OHIO.py
# Purpose:     group point data OHIO
# Author:      Xander Bakker
# Created:     16-10-2013
#-------------------------------------------------------------------------------

def main():
    print "Loading libs (arcpy)"
    import arcpy
    import math
    print "Libs (arcpy) loaded..."

    fc = r'C:\Project\_Grouping\fgdb\Grouping.gdb\OHpntSP1'

    # I assume that field exists and is of type integer
    fldOut = 'group01'

    maxOID = 0
    dict1 = {}
    fields = ['OID@','SHAPE@X','SHAPE@Y']
    with arcpy.da.SearchCursor(fc, fields) as cursor:
        for row in cursor:
            OID = row[0]
            if OID > maxOID:
                maxOID = OID
            X = row[1]
            Y = row[2]
            XY = "{0}#{1}".format(X,Y)
            if dict1.has_key(OID) == False:
                dict1.update({OID:XY})

    del cursor
    del row
    print "Loop ready, dict filled..."

    # loops
    group = 0
    dictSol = {}
    dictOIDused = {}
    for id1 in range(1,maxOID-2):

        if dictOIDused.has_key(id1) == False:
            minDist = 999999
            minSol = ''

            for id2 in range(id1+1,maxOID-1):
                if dictOIDused.has_key(id2) == False:

                    for id3 in range(id2+1,maxOID):
                        if dictOIDused.has_key(id3) == False:

                            XY1 = dict1[id1]
                            XY2 = dict1[id2]
                            XY3 = dict1[id3]

                            X1 = float(XY1.split('#')[0])
                            Y1 = float(XY1.split('#')[1])
                            X2 = float(XY2.split('#')[0])
                            Y2 = float(XY2.split('#')[1])
                            X3 = float(XY3.split('#')[0])
                            Y3 = float(XY3.split('#')[1])

                            sumDist = getSumDist(X1,Y1,X2,Y2,X3,Y3)
                            if sumDist < minDist:
                                minSol = '{0}#{1}#{2}'.format(id1,id2,id3)
                                minDist = sumDist

                        else:
                            # skip, id3 already in use
                            pass

                else:
                    # skip, id2 already in use
                    pass

            # end of loop through id2, so solution is available
            group += 1
            dictOIDused.update({int(minSol.split('#')[0]): group})
            dictOIDused.update({int(minSol.split('#')[1]): group})
            dictOIDused.update({int(minSol.split('#')[2]): group})
            # print "minSol={0} and minDist={1}".format(minSol,minDist)


        else:
            # skip, id1 already in use
            pass

    # update cursor
    fields = ['OID@',fldOut]
    with arcpy.da.UpdateCursor(fc, fields) as cursor:
        for row in cursor:
            id = row[0]
            if dictOIDused.has_key(id):
                group = dictOIDused[id]
            else:
                group = 99
            row[1] = int(group)
            cursor.updateRow(row)

    del row
    del cursor
    print "ready..."



def getSumDist(X1,Y1,X2,Y2,X3,Y3):
    import math
    d1 = math.hypot(X2 - X1, Y2 - Y1)
    d2 = math.hypot(X3 - X1, Y3 - Y1)
    d3 = math.hypot(X3 - X2, Y3 - Y2)
    return d1 + d2 + d3



if __name__ == '__main__':
    main()