# Creating Clusters/Groups of Neighbors?

2373
1
10-11-2013 11:17 AM
New Contributor
I'm working with a set of 88 counties in Ohio, and I need to cluster them into groups of 3 based on minimum aggregate distance between each possible cluster of 3 (I created polygon centroids for each county).  This will be done without replacement.  Once a county is assigned to one cluster, it is not available for others.

Put another way, if there is A, B, C, and D in close proximity to one another, does A/B/C or B/C/D create "tightest" group based off sum of the distances between A-B, B-C, and A-C versus B-C, C-D, and B-D.

I am aware of the Grouping Analysis tool in version 10.1 (not positive that would do the trick...), but unfortunately I do not currently have access to that platform.

Tags (2)
Esri Esteemed Contributor
I'm working with a set of 88 counties in Ohio, and I need to cluster them into groups of 3 based on minimum aggregate distance between each possible cluster of 3 (I created polygon centroids for each county).  This will be done without replacement.  Once a county is assigned to one cluster, it is not available for others.

Put another way, if there is A, B, C, and D in close proximity to one another, does A/B/C or B/C/D create "tightest" group based off sum of the distances between A-B, B-C, and A-C versus B-C, C-D, and B-D.

I am aware of the Grouping Analysis tool in version 10.1 (not positive that would do the trick...), but unfortunately I do not currently have access to that platform.

Hi Gabe,

You are right that Grouping Analysis will not work for your case. Although you can specify the number of groups, the size of the groups will vary.

Since this is a nice challenge I tried something which is far from perfect. It evaluates for the first point which other 2 points are closest. Those 3 points are stored in a dictionary with points (counties) that are no longer available. It takes the next point and so on. So the result below:

[ATTACH=CONFIG]28364[/ATTACH]

There are a few downsides to this method:

• This will not result in the best solution since not all possibilities are evaluated. The higher the group number the less counties will still be available and the less optimal the result will be. Just check the result for groups 27 and 29.

• Since there are 88 counties, county 88 is not group (group = 99)

As you can see there's not much intelligence in this process. One could test all possibilities (88 * 87 * 86 = 658416) or make it more intelligent to come up with the best result.

Here's the code (stand-alone 10.2) I wrote:

```#-------------------------------------------------------------------------------
# Name:        OHIO.py
# Purpose:     group point data OHIO
# Author:      Xander Bakker
# Created:     16-10-2013
#-------------------------------------------------------------------------------

def main():
import arcpy
import math

fc = r'C:\Project\_Grouping\fgdb\Grouping.gdb\OHpntSP1'

# I assume that field exists and is of type integer
fldOut = 'group01'

maxOID = 0
dict1 = {}
fields = ['OID@','SHAPE@X','SHAPE@Y']
with arcpy.da.SearchCursor(fc, fields) as cursor:
for row in cursor:
OID = row[0]
if OID > maxOID:
maxOID = OID
X = row[1]
Y = row[2]
XY = "{0}#{1}".format(X,Y)
if dict1.has_key(OID) == False:
dict1.update({OID:XY})

del cursor
del row

# loops
group = 0
dictSol = {}
dictOIDused = {}
for id1 in range(1,maxOID-2):

if dictOIDused.has_key(id1) == False:
minDist = 999999
minSol = ''

for id2 in range(id1+1,maxOID-1):
if dictOIDused.has_key(id2) == False:

for id3 in range(id2+1,maxOID):
if dictOIDused.has_key(id3) == False:

XY1 = dict1[id1]
XY2 = dict1[id2]
XY3 = dict1[id3]

X1 = float(XY1.split('#')[0])
Y1 = float(XY1.split('#')[1])
X2 = float(XY2.split('#')[0])
Y2 = float(XY2.split('#')[1])
X3 = float(XY3.split('#')[0])
Y3 = float(XY3.split('#')[1])

sumDist = getSumDist(X1,Y1,X2,Y2,X3,Y3)
if sumDist < minDist:
minSol = '{0}#{1}#{2}'.format(id1,id2,id3)
minDist = sumDist

else:
# skip, id3 already in use
pass

else:
# skip, id2 already in use
pass

# end of loop through id2, so solution is available
group += 1
dictOIDused.update({int(minSol.split('#')[0]): group})
dictOIDused.update({int(minSol.split('#')[1]): group})
dictOIDused.update({int(minSol.split('#')[2]): group})
# print "minSol={0} and minDist={1}".format(minSol,minDist)

else:
# skip, id1 already in use
pass

# update cursor
fields = ['OID@',fldOut]
with arcpy.da.UpdateCursor(fc, fields) as cursor:
for row in cursor:
id = row[0]
if dictOIDused.has_key(id):
group = dictOIDused[id]
else:
group = 99
row[1] = int(group)
cursor.updateRow(row)

del row
del cursor

def getSumDist(X1,Y1,X2,Y2,X3,Y3):
import math
d1 = math.hypot(X2 - X1, Y2 - Y1)
d2 = math.hypot(X3 - X1, Y3 - Y1)
d3 = math.hypot(X3 - X2, Y3 - Y2)
return d1 + d2 + d3

if __name__ == '__main__':
main()```