Hello,
Does anyone know of a way of establishing nested means classification using geoprocessing? I wouldn't care if it were implemented to work against points, polygons or a raster - any of the three would be fine. I can work with ArcGIS Desktop, ArcGIS Pro or QGIS. Of course the classification groups would be written to a field.
Thank you for any help.
An interesting question finally
Load up your favorite IDE (like Spyder )
Load, edit inputs and run
Lines 24, 25 and 26....
import numpy as np
import arcpy
def mean_split(a, minSize=3):
"""split at means"""
def slice_array(a):
m = np.mean(a)
yes = a <= m # check for a less than the overal mean
a_left, a_rght = a[~yes], a[yes] # slice the arrays
return m, a_left, a_rght
# ----
m, L, R = slice_array(a)
m0, L, _ = slice_array(L)
m1, _, R = slice_array(R)
means = [m0, m, m1]
while (len(L) > minSize) and (len(R) > minSize):
m0, L, _ = slice_array(L)
m1, _, R = slice_array(R)
means.extend([m0, m1])
return sorted(means)
# ----- Inputs here -----------------------------------------------
tbl = r"C:\Arc_projects\Table_tools\Table_tools.gdb\pnts_2K_normal"
in_fld = "Norm"
out_fld = "New_class"
# ---- Do some work -----------------------------------------------
# (1) Get the field from the table and make it a simple array
arr = arcpy.da.TableToNumPyArray(tbl, ['OID@', in_fld], "", True, None)
a = arr[in_fld]
# (2) Set up for the results
out_arr = np.copy(arr)
out_arr.dtype.names = ['OID@', out_fld]
# (3) Run the mean_split script... note minSize!!! set it smartly or...
means = mean_split(a, minSize=100)
classed = np.digitize(a, bins=means)
# (4) Send the results to the output array and add it back to arcgis pro
out_arr[out_fld] = classed
arcpy.da.ExtendTable(tbl, 'OID@', out_arr, 'OID@')
The work
NOTE
no error checking provided...
Finally
Through arcpy/numpy magic, 'JOIN' the table back to the original
Open Pro
Results example
np.mean(a) # ==> 10.56
np.std(a) # ==> 1.04
means = np.array(means)
means
array([ 8.583, 9.117, 9.736, 10.557, 11.383, 11.978, 12.439])
a.min(), a.max() # ==> (6.97, 13.97)
np.histogram(a, bins=means)
(array([100, 256, 587, 577, 249, 92], dtype=int64), array([ 8.583, 9.117, 9.736, 10.557, 11.383, 11.978, 12.439]))
Samples drawn from a normal distribution.
Variants of my suggestion can obviously modified to suit
Added this to Table tools for ArcGIS Pro
for those that prefer a tool interface