Logarithmic Averaging / Geometric Mean

12254
12
10-04-2010 10:33 AM
KoreenMillard
New Contributor II
Hello, I want to calculate the logarithmic average of the values of points as a raster. 

Has anyone ever tried to do this? 

-k

FYI:

Geometric mean explained by wikipedia:  "The geometric mean, in mathematics, is a type of mean or average, which indicates the central tendency or typical value of a set of numbers. It is similar to the arithmetic mean, which is what most people think of with the word "average", except that the numbers are multiplied and then the nth root (where n is the count of numbers in the set) of the resulting product is taken."

http://en.wikipedia.org/wiki/Geometric_mean
12 Replies
MichaelAugust
Occasional Contributor III

Great - thanks Shaun!

0 Kudos
curtvprice
MVP Esteemed Contributor

Once upon a time there was a tool in ArcInfo Workstation GRID called DOCELL that would operate through all the cells of a raster. It was cool:

Grid: g = scalar(1)

Grid: n = scalar(0)

Grid: DOCELL

:: g *= ingrid

:: n += 1

::  END

Grid: g ^ (1 / float(n))

Talk about a blast from the past.

Shaun Walbridge‌ 's suggestion like a good way to do it nowadays. But if you can't get scipy installed (a bit of a lift actually) you could do something similar by converting the raster to a numpy array with RasterToNumpyArray() and doing the calculation there. That's what Dan did.

I just had a another thought - if the input is an integer grid with a VAT, something like this would work:

sum = 0

ncells = 0

with arcpy.da.SearchCursor("ingrid", ["VALUE", "COUNT"]) as rows:

    for row in rows:

        for cc in range(row[1]):

            sum += sum * row[0] # running geometric sum of all cells in zone

        ncells += row[1]        # running cell count

gmean = sum ** (1 / float(ncells))

It's possible you may run into nasty numerical problems if the grid is very big as the sum gets huge, but this avoids converting the raster to numpy.

0 Kudos
DanPatterson_Retired
MVP Emeritus

This is an alternate example of Curtis's example, not specifically what the OP wants. ...In the example below I created 3 numpy arrays representing your example, assuming everything under the root sign is a grid.


>>> import numpy as np


>>> base_form = np.arange(0,9,dtype='float64')


>>> base_form.reshape((3,3))


array([[ 0.,  1.,  2.],


      [ 3.,  4.,  5.],


      [ 6.,  7.,  8.]])


>>> a = ((np.ones_like(base_form))*2).reshape((3,3))


>>> b = ((np.ones_like(base_form))*4).reshape((3,3))


>>> c = ((np.ones_like(base_form))*8).reshape((3,3))


>>>


>>> d = a*b*c


>>> d


array([[ 64.,  64.,  64.],


      [ 64.,  64.,  64.],


      [ 64.,  64.,  64.]])


>>> root = ((np.ones_like(base_form))*(1.0/3.0)).reshape((3,3))


>>> root


array([[ 0.33333333,  0.33333333,  0.33333333],


      [ 0.33333333,  0.33333333,  0.33333333],


      [ 0.33333333,  0.33333333,  0.33333333]])


>>> gm = d**root


>>> gm


array([[ 4.,  4.,  4.],


      [ 4.,  4.,  4.],


      [ 4.,  4.,  4.]])


>>>




it might be worth considering since you can use Raster to Numpy Array and the reverse to go between numpy and Arc

Now on to the simpler question for the gm of a single raster.  You need the cumulative product of each cell, getting the last value in the row column which you do with a np.flatten function if it is a n*m array.  For a simple array. the syntax would be


>>> # different example
>>> a = (np.arange(1,10)).reshape((3,3))
>>> a
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
>>> cp = np.cumproduct(a)
>>> cp
array([     1,      2,      6,     24,    120,    720,   5040,  40320,
       362880])
>>> N = np.size(cp)
>>> N
9
>>> gm = (cp[-1])**(1.0/N)
>>> gm
4.1471662743969127
>>> 



Which is the same as excel...for your example,


on just those numbers....


>>> a = np.array([2.,4.,8.])


>>> cp = np.cumprod(a)


>>> cp


array([  2.,   8.,  64.])


>>> gm = cp[-1]**(1.0/np.size(cp))


>>> gm


3.9999999999999996


>>>

You can also use the alternate formulation to calculate the geometric mean for large numbers and/or arrays to prevent numeric over/under flows.  For example, a case of the 3x3 array as before and 1,000,000 int32 integers between 1 and 255


>>> import numpy as np


>>> import random


>>> #************************ Simple example using log10  **************************


>>> a = (np.arange(1,10)).reshape((3,3))


>>> loga = np.log10(a)


>>> loga


array([[ 0.        ,  0.30103   ,  0.47712125],


       [ 0.60205999,  0.69897   ,  0.77815125],


       [ 0.84509804,  0.90308999,  0.95424251]])


>>> cum_sum = np.cumsum(loga)


>>> N = np.size(a)


>>> N


9


>>> gm = 10.0**((cum_sum[-1])/N)


>>> gm


4.1471662743969135


>>> import numpy as np


>>> import random


>>> #***************************  large array * **************


>>> big = [ random.randint(1,255) for i in xrange(1000000)]


>>> a = np.array(big,dtype='int64')


>>> loga = np.log10(a)


>>> loga


array([ 2.32221929,  1.95904139,  2.38201704, ...,  2.19865709,


        1.68124124,  2.25285303])


>>> N = len(big)


>>> N


1000000


>>> cum_sum = np.cumsum(loga)


>>> cum_sum


array([  2.32221929e+00,   4.28126069e+00,   6.66327773e+00, ...,


         1.97970102e+06,   1.97970270e+06,   1.97970495e+06])


>>> gm = 10.0**((cum_sum[-1])/N)


>>> gm


95.434401572885349


>>>

An no other packages need to be installed

0 Kudos