Hello,
Does anyone know of a way of establishing nested means classification using geoprocessing? I wouldn't care if it were implemented to work against points, polygons or a raster - any of the three would be fine. I can work with ArcGIS Desktop, ArcGIS Pro or QGIS. Of course the classification groups would be written to a field.
Thank you for any help.
An interesting question finally
Load up your favorite IDE (like Spyder  )
 )
Load, edit inputs and run
Lines 24, 25 and 26....
import numpy as np
import arcpy
def mean_split(a, minSize=3):
    """split at means"""
    def slice_array(a):
        m = np.mean(a)
        yes = a <= m  # check for a less than the overal mean
        a_left, a_rght = a[~yes], a[yes]  # slice the arrays
        return m, a_left, a_rght
    # ----
    m, L, R = slice_array(a)
    m0, L, _ = slice_array(L)
    m1, _, R = slice_array(R)
    means = [m0, m, m1]
    while (len(L) > minSize) and (len(R) > minSize):
        m0, L, _ = slice_array(L)
        m1, _, R = slice_array(R)
        means.extend([m0, m1])
    return sorted(means)
# ----- Inputs here -----------------------------------------------
tbl = r"C:\Arc_projects\Table_tools\Table_tools.gdb\pnts_2K_normal"
in_fld = "Norm"
out_fld = "New_class"
# ---- Do some work -----------------------------------------------
# (1)  Get the field from the table and make it a simple array
arr = arcpy.da.TableToNumPyArray(tbl, ['OID@', in_fld], "", True, None)
a = arr[in_fld]
# (2)  Set up for the results
out_arr = np.copy(arr)
out_arr.dtype.names = ['OID@', out_fld]
# (3)  Run the mean_split script... note minSize!!! set it smartly or...
means = mean_split(a, minSize=100)
classed = np.digitize(a, bins=means)
# (4)  Send the results to the output array and add it back to arcgis pro
out_arr[out_fld] = classed
arcpy.da.ExtendTable(tbl, 'OID@', out_arr, 'OID@')
The work
NOTE
no error checking provided...
Finally
Through arcpy/numpy magic, 'JOIN' the table back to the original
Open Pro
Results example
np.mean(a) # ==> 10.56
np.std(a)  # ==>  1.04
means = np.array(means)
means
array([ 8.583,  9.117,  9.736, 10.557, 11.383, 11.978, 12.439])
a.min(), a.max()  # ==>  (6.97, 13.97)
np.histogram(a, bins=means)
(array([100, 256, 587, 577, 249,  92], dtype=int64), array([ 8.583,  9.117,  9.736, 10.557, 11.383, 11.978, 12.439]))Samples drawn from a normal distribution.
Variants of my suggestion can obviously modified to suit
Added this to Table tools for ArcGIS Pro
for those that prefer a tool interface
