Pool of raster values to calculate percentile

4026
7
Jump to solution
11-06-2015 03:36 AM
anTonialcaraz
Occasional Contributor II

Hi there,

I've been calculating percentiles out of monthly raster values, that is one month/raster at a time. I've been using the RasterToNumPyArray function and all good. Like this:

array = arcpy.RasterToNumPyArray((raster),nodata_to_value=10000000000)

marray = numpy.ma.masked_values(array,10000000000)

per = 20

percentile_val = numpy.percentile(marray.compressed(),(per))

print percentile_val

Now, I need to do the same but taking all 12 months into the percentile calculation. That is, using all cell values from all 12 rasters to calculate the percentile. Like putting all values from the 12 raster months into a single pool of values. I tried this:

array = arcpy.RasterToNumPyArray([(raster1), (raster2), (raster3), (raster4), (raster5), (raster6), (raster7),
         (raster8), (raster9), (raster10), (raster11), (raster12)],nodata_to_value=10000000000)

marray = numpy.ma.masked_values(array,10000000000)

per = 20

percentile_val = numpy.percentile(marray.compressed(),(per))

print percentile_val

But I think the RasterToNumPyArray function does not allow multiple rasters.

I wonder if someone could give me some tips to overcome this issue.

Many thanks in advance

Tony

0 Kudos
1 Solution

Accepted Solutions
LukeSturtevant
Occasional Contributor III

This is a very helpful code thanks Dan and Toni. Not that it is needed Toni, but if all your raster are in a single workspace and you would like to compress your code you could have something like this:

import arcpy,numpy
from arcpy import env

env.workspace = "C:\\RasterFolder"


rasters = [arcpy.RasterToNumPyArray(arcpy.Describe(ras).catalogPath,nodata_to_value=10000000000) for ras in arcpy.ListRasters()]
  
marray = numpy.ma.masked_values(rasters,10000000000)  
  
per = 20  
  
percentile_val = numpy.percentile(marray.compressed(),(per))  
  
print percentile_val 

You could use a wild card to filter raster names and folders as well for the ListRasters function.

View solution in original post

0 Kudos
7 Replies
DanPatterson_Retired
MVP Emeritus

This way is easiest...just get your arrays in, I have created some for demo purposes

>>> a = np.zeros(9,dtype=int).reshape((3,3))
>>> b = np.ones_like(a)          # all ones/True
>>> c = b*2
>>> d = b*3
>>> stack = np.array([a,b,c,d])
>>> stack
array([[[0, 0, 0],
        [0, 0, 0],
        [0, 0, 0]],
      [[1, 1, 1],
        [1, 1, 1],
        [1, 1, 1]],
      [[2, 2, 2],
        [2, 2, 2],
        [2, 2, 2]],
      [[3, 3, 3],
        [3, 3, 3],
        [3, 3, 3]]])
>>> per = 20
>>> np.percentile(stack,per,axis=0)
array([[ 0.6,  0.6,  0.6],
      [ 0.6,  0.6,  0.6],
      [ 0.6,  0.6,  0.6]])
>>> per = 50
>>> np.percentile(stack,per,axis=0)
array([[ 1.5,  1.5,  1.5],
      [ 1.5,  1.5,  1.5],
      [ 1.5,  1.5,  1.5]])
>>>

You will only be limited by memory as to whether you can read all the rasters in as arrays all at once.

anTonialcaraz
Occasional Contributor II

Thanks Dan. I guess when it comes to rasters it should be something like this? It seems to be working.

raster1 = Raster(env.workspace + "\\" + "Maas_hvel_jan")
raster2 = Raster(env.workspace + "\\" + "Maas_hvel_feb")
raster3 = Raster(env.workspace + "\\" + "Maas_hvel_mar")
raster4 = Raster(env.workspace + "\\" + "Maas_hvel_apr")
raster5 = Raster(env.workspace + "\\" + "Maas_hvel_may")
raster6 = Raster(env.workspace + "\\" + "Maas_hvel_jun")
raster7 = Raster(env.workspace + "\\" + "Maas_hvel_jul")
raster8 = Raster(env.workspace + "\\" + "Maas_hvel_aug")
raster9 = Raster(env.workspace + "\\" + "Maas_hvel_sep")
raster10 = Raster(env.workspace + "\\" + "Maas_hvel_oct")
raster11 = Raster(env.workspace + "\\" + "Maas_hvel_nov")
raster12 = Raster(env.workspace + "\\" + "Maas_hvel_dec")

array1 = arcpy.RasterToNumPyArray(raster1,nodata_to_value=10000000000)
array2 = arcpy.RasterToNumPyArray(raster2,nodata_to_value=10000000000)
array3 = arcpy.RasterToNumPyArray(raster3,nodata_to_value=10000000000)
array4 = arcpy.RasterToNumPyArray(raster4,nodata_to_value=10000000000)
array5 = arcpy.RasterToNumPyArray(raster5,nodata_to_value=10000000000)
array6 = arcpy.RasterToNumPyArray(raster6,nodata_to_value=10000000000)
array7 = arcpy.RasterToNumPyArray(raster7,nodata_to_value=10000000000)
array8 = arcpy.RasterToNumPyArray(raster8,nodata_to_value=10000000000)
array9 = arcpy.RasterToNumPyArray(raster9,nodata_to_value=10000000000)
array10 = arcpy.RasterToNumPyArray(raster10,nodata_to_value=10000000000)
array11 = arcpy.RasterToNumPyArray(raster11,nodata_to_value=10000000000)
array12 = arcpy.RasterToNumPyArray(raster12,nodata_to_value=10000000000)

marray = numpy.ma.masked_values([(array1), (array2), (array3), (array4), (array5), (array6), (array7),(array8), (array9), (array10), (array11), (array12)],10000000000)

per = 20

percentile_val = numpy.percentile(marray.compressed(),(per))

print percentile_val
LukeSturtevant
Occasional Contributor III

This is a very helpful code thanks Dan and Toni. Not that it is needed Toni, but if all your raster are in a single workspace and you would like to compress your code you could have something like this:

import arcpy,numpy
from arcpy import env

env.workspace = "C:\\RasterFolder"


rasters = [arcpy.RasterToNumPyArray(arcpy.Describe(ras).catalogPath,nodata_to_value=10000000000) for ras in arcpy.ListRasters()]
  
marray = numpy.ma.masked_values(rasters,10000000000)  
  
per = 20  
  
percentile_val = numpy.percentile(marray.compressed(),(per))  
  
print percentile_val 

You could use a wild card to filter raster names and folders as well for the ListRasters function.

0 Kudos
anTonialcaraz
Occasional Contributor II

That was very useful Luke. Thanks a lot.

0 Kudos
DanPatterson_Retired
MVP Emeritus

Ooopps

Toni... before you get to far you are using the wrong percentile if you have nodata areas

use.,...
>>> help(np.nanpercentile)
Help on function nanpercentile in module numpy.lib.nanfunctions:

nanpercentile(a, q, axis=None, out=None, overwrite_input=False, interpolation='linear', keepdims=False)
    Compute the qth percentile of the data along the specified axis, while
    ignoring nan values.
   
    Returns the qth percentile of the array elements.

anTonialcaraz
Occasional Contributor II

Thanks Dan. It actually gives the same results within the script so I think it should be fine.

Cheers

DanPatterson_Retired
MVP Emeritus

Ok good.  You can skip the masked part as well if there are no, nodata areas then.  I does make a difference in processing time should you be working with large arrays and/or many arrays.  Should others be interested, there are nan equivalents for most of the statistical functions ( eg. np.mean --> np.nanmean)