Nested means classification

PaulLohr · ‎05-01-2019

Hello,

Does anyone know of a way of establishing nested means classification using geoprocessing? I wouldn't care if it were implemented to work against points, polygons or a raster - any of the three would be fine. I can work with ArcGIS Desktop, ArcGIS Pro or QGIS. Of course the classification groups would be written to a field.

Thank you for any help.

Paul Lohr
Lohr Geospatial Services, Inc.

DanPatterson_Retired · ‎05-01-2019

An interesting question finally

Load up your favorite IDE (like Spyder )

Load, edit inputs and run

Lines 24, 25 and 26....

your space-less path to your table in your file geodatabase
the field containing the data your want to determine the nested means on
The output field, which will be magically created... for the results. DO NOT create it ahead of time
shut down ArcGIS Pro, on the off chance that you have 'touched' something to create file locks

import numpy as np
import arcpy

def mean_split(a, minSize=3):
    """split at means"""
    def slice_array(a):
        m = np.mean(a)
        yes = a <= m  # check for a less than the overal mean
        a_left, a_rght = a[~yes], a[yes]  # slice the arrays
        return m, a_left, a_rght
    # ----
    m, L, R = slice_array(a)
    m0, L, _ = slice_array(L)
    m1, _, R = slice_array(R)
    means = [m0, m, m1]
    while (len(L) > minSize) and (len(R) > minSize):
        m0, L, _ = slice_array(L)
        m1, _, R = slice_array(R)
        means.extend([m0, m1])
    return sorted(means)

# ----- Inputs here -----------------------------------------------

tbl = r"C:\Arc_projects\Table_tools\Table_tools.gdb\pnts_2K_normal"
in_fld = "Norm"
out_fld = "New_class"

# ---- Do some work -----------------------------------------------

# (1)  Get the field from the table and make it a simple array
arr = arcpy.da.TableToNumPyArray(tbl, ['OID@', in_fld], "", True, None)
a = arr[in_fld]

# (2)  Set up for the results
out_arr = np.copy(arr)
out_arr.dtype.names = ['OID@', out_fld]

# (3)  Run the mean_split script... note minSize!!! set it smartly or...
means = mean_split(a, minSize=100)
classed = np.digitize(a, bins=means)

# (4)  Send the results to the output array and add it back to arcgis pro
out_arr[out_fld] = classed
arcpy.da.ExtendTable(tbl, 'OID@', out_arr, 'OID@')
‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍

The work

Lines 31, 32, arcpy/numpy play nice and suck in the table using only the objectid and your desired field (see help for more info)
little 'a' is just the full array without the object id
Lines 35, 36, set up for the outputs
Lines 39, 40 ... run mean_split .... note!!!! minSize is the approximate size of each class. I used 100 because the field had 2000 records.

NOTE

no error checking provided...

Finally

Through arcpy/numpy magic, 'JOIN' the table back to the original

Open Pro

Results example

np.mean(a) # ==> 10.56
np.std(a)  # ==>  1.04
means = np.array(means)

means
array([ 8.583,  9.117,  9.736, 10.557, 11.383, 11.978, 12.439])

a.min(), a.max()  # ==>  (6.97, 13.97)

np.histogram(a, bins=means)

(array([100, 256, 587, 577, 249,  92], dtype=int64), array([ 8.583,  9.117,  9.736, 10.557, 11.383, 11.978, 12.439]))‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍

Samples drawn from a normal distribution.

Variants of my suggestion can obviously modified to suit

DanPatterson_Retired · ‎05-04-2019

Added this to Table tools for ArcGIS Pro

for those that prefer a tool interface