Hi there,
I've been calculating percentiles out of monthly raster values, that is one month/raster at a time. I've been using the RasterToNumPyArray function and all good. Like this:
array = arcpy.RasterToNumPyArray((raster),nodata_to_value=10000000000) marray = numpy.ma.masked_values(array,10000000000) per = 20 percentile_val = numpy.percentile(marray.compressed(),(per)) print percentile_val
Now, I need to do the same but taking all 12 months into the percentile calculation. That is, using all cell values from all 12 rasters to calculate the percentile. Like putting all values from the 12 raster months into a single pool of values. I tried this:
array = arcpy.RasterToNumPyArray([(raster1), (raster2), (raster3), (raster4), (raster5), (raster6), (raster7), (raster8), (raster9), (raster10), (raster11), (raster12)],nodata_to_value=10000000000) marray = numpy.ma.masked_values(array,10000000000) per = 20 percentile_val = numpy.percentile(marray.compressed(),(per)) print percentile_val
But I think the RasterToNumPyArray function does not allow multiple rasters.
I wonder if someone could give me some tips to overcome this issue.
Many thanks in advance
Tony
Solved! Go to Solution.
This is a very helpful code thanks Dan and Toni. Not that it is needed Toni, but if all your raster are in a single workspace and you would like to compress your code you could have something like this:
import arcpy,numpy from arcpy import env env.workspace = "C:\\RasterFolder" rasters = [arcpy.RasterToNumPyArray(arcpy.Describe(ras).catalogPath,nodata_to_value=10000000000) for ras in arcpy.ListRasters()] marray = numpy.ma.masked_values(rasters,10000000000) per = 20 percentile_val = numpy.percentile(marray.compressed(),(per)) print percentile_val
You could use a wild card to filter raster names and folders as well for the ListRasters function.
This way is easiest...just get your arrays in, I have created some for demo purposes
>>> a = np.zeros(9,dtype=int).reshape((3,3)) >>> b = np.ones_like(a) # all ones/True >>> c = b*2 >>> d = b*3 >>> stack = np.array([a,b,c,d]) >>> stack array([[[0, 0, 0], [0, 0, 0], [0, 0, 0]], [[1, 1, 1], [1, 1, 1], [1, 1, 1]], [[2, 2, 2], [2, 2, 2], [2, 2, 2]], [[3, 3, 3], [3, 3, 3], [3, 3, 3]]]) >>> per = 20 >>> np.percentile(stack,per,axis=0) array([[ 0.6, 0.6, 0.6], [ 0.6, 0.6, 0.6], [ 0.6, 0.6, 0.6]]) >>> per = 50 >>> np.percentile(stack,per,axis=0) array([[ 1.5, 1.5, 1.5], [ 1.5, 1.5, 1.5], [ 1.5, 1.5, 1.5]]) >>>
You will only be limited by memory as to whether you can read all the rasters in as arrays all at once.
Thanks Dan. I guess when it comes to rasters it should be something like this? It seems to be working.
raster1 = Raster(env.workspace + "\\" + "Maas_hvel_jan") raster2 = Raster(env.workspace + "\\" + "Maas_hvel_feb") raster3 = Raster(env.workspace + "\\" + "Maas_hvel_mar") raster4 = Raster(env.workspace + "\\" + "Maas_hvel_apr") raster5 = Raster(env.workspace + "\\" + "Maas_hvel_may") raster6 = Raster(env.workspace + "\\" + "Maas_hvel_jun") raster7 = Raster(env.workspace + "\\" + "Maas_hvel_jul") raster8 = Raster(env.workspace + "\\" + "Maas_hvel_aug") raster9 = Raster(env.workspace + "\\" + "Maas_hvel_sep") raster10 = Raster(env.workspace + "\\" + "Maas_hvel_oct") raster11 = Raster(env.workspace + "\\" + "Maas_hvel_nov") raster12 = Raster(env.workspace + "\\" + "Maas_hvel_dec") array1 = arcpy.RasterToNumPyArray(raster1,nodata_to_value=10000000000) array2 = arcpy.RasterToNumPyArray(raster2,nodata_to_value=10000000000) array3 = arcpy.RasterToNumPyArray(raster3,nodata_to_value=10000000000) array4 = arcpy.RasterToNumPyArray(raster4,nodata_to_value=10000000000) array5 = arcpy.RasterToNumPyArray(raster5,nodata_to_value=10000000000) array6 = arcpy.RasterToNumPyArray(raster6,nodata_to_value=10000000000) array7 = arcpy.RasterToNumPyArray(raster7,nodata_to_value=10000000000) array8 = arcpy.RasterToNumPyArray(raster8,nodata_to_value=10000000000) array9 = arcpy.RasterToNumPyArray(raster9,nodata_to_value=10000000000) array10 = arcpy.RasterToNumPyArray(raster10,nodata_to_value=10000000000) array11 = arcpy.RasterToNumPyArray(raster11,nodata_to_value=10000000000) array12 = arcpy.RasterToNumPyArray(raster12,nodata_to_value=10000000000) marray = numpy.ma.masked_values([(array1), (array2), (array3), (array4), (array5), (array6), (array7),(array8), (array9), (array10), (array11), (array12)],10000000000) per = 20 percentile_val = numpy.percentile(marray.compressed(),(per)) print percentile_val
This is a very helpful code thanks Dan and Toni. Not that it is needed Toni, but if all your raster are in a single workspace and you would like to compress your code you could have something like this:
import arcpy,numpy from arcpy import env env.workspace = "C:\\RasterFolder" rasters = [arcpy.RasterToNumPyArray(arcpy.Describe(ras).catalogPath,nodata_to_value=10000000000) for ras in arcpy.ListRasters()] marray = numpy.ma.masked_values(rasters,10000000000) per = 20 percentile_val = numpy.percentile(marray.compressed(),(per)) print percentile_val
You could use a wild card to filter raster names and folders as well for the ListRasters function.
That was very useful Luke. Thanks a lot.
Ooopps
Toni... before you get to far you are using the wrong percentile if you have nodata areas
use.,...
>>> help(np.nanpercentile)
Help on function nanpercentile in module numpy.lib.nanfunctions:
nanpercentile(a, q, axis=None, out=None, overwrite_input=False, interpolation='linear', keepdims=False)
Compute the qth percentile of the data along the specified axis, while
ignoring nan values.
Returns the qth percentile of the array elements.
Thanks Dan. It actually gives the same results within the script so I think it should be fine.
Cheers
Ok good. You can skip the masked part as well if there are no, nodata areas then. I does make a difference in processing time should you be working with large arrays and/or many arrays. Should others be interested, there are nan equivalents for most of the statistical functions ( eg. np.mean --> np.nanmean)