Pool of raster values to calculate percentile

anTonialcaraz · ‎11-06-2015

Hi there,

I've been calculating percentiles out of monthly raster values, that is one month/raster at a time. I've been using the RasterToNumPyArray function and all good. Like this:

array = arcpy.RasterToNumPyArray((raster),nodata_to_value=10000000000)

marray = numpy.ma.masked_values(array,10000000000)

per = 20

percentile_val = numpy.percentile(marray.compressed(),(per))

print percentile_val

Now, I need to do the same but taking all 12 months into the percentile calculation. That is, using all cell values from all 12 rasters to calculate the percentile. Like putting all values from the 12 raster months into a single pool of values. I tried this:

array = arcpy.RasterToNumPyArray([(raster1), (raster2), (raster3), (raster4), (raster5), (raster6), (raster7),
         (raster8), (raster9), (raster10), (raster11), (raster12)],nodata_to_value=10000000000)

marray = numpy.ma.masked_values(array,10000000000)

per = 20

percentile_val = numpy.percentile(marray.compressed(),(per))

print percentile_val

But I think the RasterToNumPyArray function does not allow multiple rasters.

I wonder if someone could give me some tips to overcome this issue.

Many thanks in advance

Tony

LukeSturtevant · ‎11-06-2015

This is a very helpful code thanks Dan and Toni. Not that it is needed Toni, but if all your raster are in a single workspace and you would like to compress your code you could have something like this:

import arcpy,numpy
from arcpy import env

env.workspace = "C:\\RasterFolder"


rasters = [arcpy.RasterToNumPyArray(arcpy.Describe(ras).catalogPath,nodata_to_value=10000000000) for ras in arcpy.ListRasters()]
  
marray = numpy.ma.masked_values(rasters,10000000000)  
  
per = 20  
  
percentile_val = numpy.percentile(marray.compressed(),(per))  
  
print percentile_val

You could use a wild card to filter raster names and folders as well for the ListRasters function.

View solution in original post

DanPatterson_Retired · ‎11-06-2015

This way is easiest...just get your arrays in, I have created some for demo purposes

>>> a = np.zeros(9,dtype=int).reshape((3,3))
>>> b = np.ones_like(a)          # all ones/True
>>> c = b*2
>>> d = b*3
>>> stack = np.array([a,b,c,d])
>>> stack
array([[[0, 0, 0],
        [0, 0, 0],
        [0, 0, 0]],
      [[1, 1, 1],
        [1, 1, 1],
        [1, 1, 1]],
      [[2, 2, 2],
        [2, 2, 2],
        [2, 2, 2]],
      [[3, 3, 3],
        [3, 3, 3],
        [3, 3, 3]]])
>>> per = 20
>>> np.percentile(stack,per,axis=0)
array([[ 0.6,  0.6,  0.6],
      [ 0.6,  0.6,  0.6],
      [ 0.6,  0.6,  0.6]])
>>> per = 50
>>> np.percentile(stack,per,axis=0)
array([[ 1.5,  1.5,  1.5],
      [ 1.5,  1.5,  1.5],
      [ 1.5,  1.5,  1.5]])
>>>

You will only be limited by memory as to whether you can read all the rasters in as arrays all at once.

anTonialcaraz · ‎11-06-2015

Thanks Dan. I guess when it comes to rasters it should be something like this? It seems to be working.

raster1 = Raster(env.workspace + "\\" + "Maas_hvel_jan")
raster2 = Raster(env.workspace + "\\" + "Maas_hvel_feb")
raster3 = Raster(env.workspace + "\\" + "Maas_hvel_mar")
raster4 = Raster(env.workspace + "\\" + "Maas_hvel_apr")
raster5 = Raster(env.workspace + "\\" + "Maas_hvel_may")
raster6 = Raster(env.workspace + "\\" + "Maas_hvel_jun")
raster7 = Raster(env.workspace + "\\" + "Maas_hvel_jul")
raster8 = Raster(env.workspace + "\\" + "Maas_hvel_aug")
raster9 = Raster(env.workspace + "\\" + "Maas_hvel_sep")
raster10 = Raster(env.workspace + "\\" + "Maas_hvel_oct")
raster11 = Raster(env.workspace + "\\" + "Maas_hvel_nov")
raster12 = Raster(env.workspace + "\\" + "Maas_hvel_dec")

array1 = arcpy.RasterToNumPyArray(raster1,nodata_to_value=10000000000)
array2 = arcpy.RasterToNumPyArray(raster2,nodata_to_value=10000000000)
array3 = arcpy.RasterToNumPyArray(raster3,nodata_to_value=10000000000)
array4 = arcpy.RasterToNumPyArray(raster4,nodata_to_value=10000000000)
array5 = arcpy.RasterToNumPyArray(raster5,nodata_to_value=10000000000)
array6 = arcpy.RasterToNumPyArray(raster6,nodata_to_value=10000000000)
array7 = arcpy.RasterToNumPyArray(raster7,nodata_to_value=10000000000)
array8 = arcpy.RasterToNumPyArray(raster8,nodata_to_value=10000000000)
array9 = arcpy.RasterToNumPyArray(raster9,nodata_to_value=10000000000)
array10 = arcpy.RasterToNumPyArray(raster10,nodata_to_value=10000000000)
array11 = arcpy.RasterToNumPyArray(raster11,nodata_to_value=10000000000)
array12 = arcpy.RasterToNumPyArray(raster12,nodata_to_value=10000000000)

marray = numpy.ma.masked_values([(array1), (array2), (array3), (array4), (array5), (array6), (array7),(array8), (array9), (array10), (array11), (array12)],10000000000)

per = 20

percentile_val = numpy.percentile(marray.compressed(),(per))

print percentile_val

LukeSturtevant · ‎11-06-2015

This is a very helpful code thanks Dan and Toni. Not that it is needed Toni, but if all your raster are in a single workspace and you would like to compress your code you could have something like this:

import arcpy,numpy
from arcpy import env

env.workspace = "C:\\RasterFolder"


rasters = [arcpy.RasterToNumPyArray(arcpy.Describe(ras).catalogPath,nodata_to_value=10000000000) for ras in arcpy.ListRasters()]
  
marray = numpy.ma.masked_values(rasters,10000000000)  
  
per = 20  
  
percentile_val = numpy.percentile(marray.compressed(),(per))  
  
print percentile_val

You could use a wild card to filter raster names and folders as well for the ListRasters function.

anTonialcaraz · ‎11-06-2015

That was very useful Luke. Thanks a lot.

DanPatterson_Retired · ‎11-06-2015

Ooopps

Toni... before you get to far you are using the wrong percentile if you have nodata areas

use.,...
>>> help(np.nanpercentile)
Help on function nanpercentile in module numpy.lib.nanfunctions:

nanpercentile(a, q, axis=None, out=None, overwrite_input=False, interpolation='linear', keepdims=False)
    Compute the qth percentile of the data along the specified axis, while
    ignoring nan values.

    Returns the qth percentile of the array elements.

anTonialcaraz · ‎11-09-2015

Thanks Dan. It actually gives the same results within the script so I think it should be fine.

Cheers

DanPatterson_Retired · ‎11-09-2015

Ok good. You can skip the masked part as well if there are no, nodata areas then. I does make a difference in processing time should you be working with large arrays and/or many arrays. Should others be interested, there are nan equivalents for most of the statistical functions ( eg. np.mean --> np.nanmean)