Nested means classification

1136
2
05-01-2019 09:40 AM
PaulLohr
Occasional Contributor III

Hello,

Does anyone know of a way of establishing nested means classification using geoprocessing? I wouldn't care if it were implemented to work against points, polygons or a raster - any of the three would be fine. I can work with ArcGIS Desktop, ArcGIS Pro or QGIS. Of course the classification groups would be written to a field.

Thank you for any help.

0 Kudos
2 Replies
DanPatterson_Retired
MVP Emeritus

An interesting question finally

Load up your favorite IDE (like Spyder )

Load, edit inputs and run

Lines 24, 25 and 26....

  • your space-less path to your table in your file geodatabase 
  • the field containing the data your want to determine the nested means on
  • The output field, which will be magically created... for the results.  DO NOT create it ahead of time
  • shut down ArcGIS Pro, on the off chance that you have 'touched' something to create file locks

import numpy as np
import arcpy

def mean_split(a, minSize=3):
    """split at means"""
    def slice_array(a):
        m = np.mean(a)
        yes = a <= m  # check for a less than the overal mean
        a_left, a_rght = a[~yes], a[yes]  # slice the arrays
        return m, a_left, a_rght
    # ----
    m, L, R = slice_array(a)
    m0, L, _ = slice_array(L)
    m1, _, R = slice_array(R)
    means = [m0, m, m1]
    while (len(L) > minSize) and (len(R) > minSize):
        m0, L, _ = slice_array(L)
        m1, _, R = slice_array(R)
        means.extend([m0, m1])
    return sorted(means)

# ----- Inputs here -----------------------------------------------

tbl = r"C:\Arc_projects\Table_tools\Table_tools.gdb\pnts_2K_normal"
in_fld = "Norm"
out_fld = "New_class"

# ---- Do some work -----------------------------------------------

# (1)  Get the field from the table and make it a simple array
arr = arcpy.da.TableToNumPyArray(tbl, ['OID@', in_fld], "", True, None)
a = arr[in_fld]

# (2)  Set up for the results
out_arr = np.copy(arr)
out_arr.dtype.names = ['OID@', out_fld]

# (3)  Run the mean_split script... note minSize!!! set it smartly or...
means = mean_split(a, minSize=100)
classed = np.digitize(a, bins=means)

# (4)  Send the results to the output array and add it back to arcgis pro
out_arr[out_fld] = classed
arcpy.da.ExtendTable(tbl, 'OID@', out_arr, 'OID@')
‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍

The work

  • Lines 31, 32,  arcpy/numpy play nice and suck in the table using only the objectid and your desired field (see help for more info)
  • little 'a' is just the full array without the object id
  • Lines 35, 36, set up for the outputs
  • Lines 39, 40 ... run mean_split .... note!!!! minSize is the approximate size of each class.  I used 100 because the field had 2000 records.  

NOTE

no error checking provided... 

Finally

Through arcpy/numpy magic, 'JOIN' the table back to the original

Open Pro

Results example

np.mean(a) # ==> 10.56
np.std(a)  # ==>  1.04
means = np.array(means)

means
array([ 8.583,  9.117,  9.736, 10.557, 11.383, 11.978, 12.439])

a.min(), a.max()  # ==>  (6.97, 13.97)

np.histogram(a, bins=means)

(array([100, 256, 587, 577, 249,  92], dtype=int64), array([ 8.583,  9.117,  9.736, 10.557, 11.383, 11.978, 12.439]))‍‍‍‍‍‍‍‍‍‍

Samples drawn from a normal distribution. 

Variants of my suggestion can obviously modified to suit  

0 Kudos
DanPatterson_Retired
MVP Emeritus

Added this to  Table tools for ArcGIS Pro 

for those that prefer a tool interface

0 Kudos