I have a previously reclassified raster file which I run a simple process on and then need to generate a very basic report about the percentages of each class.
I'm able to calculate the total count for each raster value/class using summary stats and was wondering if there's a way to skip summary stats and accomplish this with NumPy instead?
Here's a sample code:
import arcpy
import pandas as pd
InRaster = "SomeSingleBandRaster" ##This raster was reclassified to have 4 classes##
OutGDB = arcpy.env.scratchGDB
SlopeReport = OutGDB + '/' + "SlopeReport"
StatsTable = OutGDB + '/' + "StatsTable"
#Generate Summary Statistics#
arcpy.analysis.Statistics(InRaster, StatsTable, "Value SUM", "Count")
#Create array and calculate percentate of each class
array = arcpy.da.TableToNumPyArray(StatsTable, ['Count','SUM_Value'])
df = pd.DataFrame(array)
df['perc'] = df["Count"] / df["Count"].sum() * 100
print(df)
This will generate the following output:
Count SUM_Value perc
0 5.0 4.0 0.198255
1 274.0 3.0 10.864393
2 1057.0 1.0 41.911182
3 1186.0 2.0 47.026170
This works but I'm always eager to cut out unnecessary steps (while hopefully learning something in the process!)
Thanks!
Solved! Go to Solution.
Lesson 2... *tif file in … info out.
First the verbose code that produces the outcome, followed by explanations
fn = "c:/Temp/InRaster/InRaster.tif"
import arcpy
a = arcpy.RasterToNumPyArray(fn)
uniq, cnts = np.unique(a, return_counts=True)
per = cnts[:-1]/np.sum(cnts[:-1]) *100.
dt = [('Value', '<i4'), ('Count', '<i4'), ('Percent', '<f8')]
sze = cnts[:-1].size
out = np.zeros(sze, dtype=dt)
out['Value'] = uniq[:-1]
out['Count'] = cnts[:-1]
out['Percent'] = per
We don't need to know the coordinates of the lower left, the number of rows and columns or the nodata value, since we can get that when we read in the raster.
RasterToNumPyArray—ArcPy Functions | ArcGIS Desktop
RasterToNumPyArray (in_raster, {lower_left_corner}, {ncols}, {nrows}, {nodata_to_value})
Basic information on the array shape and data type
a.shape # ---- number of rows and columns
(84, 55)
a.dtype # ---- data type of the array, unsigned 16bit integer
dtype('uint16')
# ---- the nodata value is 65535, which is the maximum integer value for uint16
# ---- get this infor from `uniq`
uniq
array([ 1, 2, 3, 4, 65535], dtype=uint16)
# ---- counts return a 64bit integer, the 2098 is the nodata cell count
cnts
array([1057, 1186, 274, 5, 2098], dtype=int64)
# ---- At this stage we want to remove our nodata values from the counts and % calculations
Line 5 in the first code block
per = cnts[:-1]/np.sum(cnts[:-1]) *100.
basically means " use all the cnts up to but no including the last value, divide it by their sum and multiply by 100%.
cnts[:-1] ... is a slice, the : means everything up to but excluding... Read up on array slicing... This is only a 2D array, but it can be head melting for arrays beyond 4D.
You will notice that I did slicing in lines 10 and 11 to exclude the no data values, but I didn't need to do if for the 'per' line because I sliced already.
Of course, the above can be put into a 'def' (aka function) if you need to do it all the time, and the code can be simplified greatly... Just to show you how working with basic arrays is easy.
When you want to work with things like area... then it becomes important to know the cell size of each array element. That is where 'human-ware' comes into play (ie, take notes) and sometimes exporting rasters to old-school ASCII rasters is useful since the cell size information goes out with it. RasterToNumPyArray, doesn't... you have to remember..
So you want to save stuff? I use numpy as well.
out_fn = "C:/temp/InRaster/in_raster.npy"
np.save(out_fn, a)
Hmmm maybe I should look at it again, just in case
a_gain = np.load("C:/temp/InRaster/in_raster.npy")
np.all(a == a_gain)
True
All is good!
Love a lesson... this is going to be verbose and I will leave out all the shortcuts and helper functions.
Assume you have your 'array'... I will generate a small array, so you can ignore the first two lines and begin with
array 'a'
np.random.RandomState(123)
a = np.random.randint(0, 5, size=(10,10))
uniq, cnts = np.unique(a, return_counts=True) # ---- get the unique values and their count
per = cnts/cnts.sum() *100. # ---- the percentage, old school
dt = [('Value', '<i4'), ('Count', '<i4'), ('Percent', '<f8')] # --- data type for output
out = np.zeros(uniq.size, dtype=dt) # ---- make a 'container' to store the results
out['Value'] = uniq # ---- fill the container
out['Count'] = cnts # ----
out['Percent'] = per # ----
# ---- Result
out
array([(0, 22, 22.), (1, 16, 16.), (2, 19, 19.), (3, 23, 23.), (4, 20, 20.)],
dtype=[('Value', '<i4'), ('Count', '<i4'), ('Percent', '<f8')])
# ---- Oooooo I don't like that look ---- time for a 'helper' function (written by me)
Array info...
shape... (5,) ndim... 1 dtype... [('Value', '<i4'), ('Count', '<i4'), ('Percent', '<f8')]
╔═
║ ╔═
5
id Value Count Percent
------------------------------
000 0 22 22.00
001 1 16 16.00
002 2 19 19.00
003 3 23 23.00
004 4 20 20.00
# ----- digest, I have lots of examples on my blog
Dan, this is great! And now i'm learning about NumPy Data type objects.
One thing: I'm getting an error which I'm guessing is a result of something specific with the raster I'm using.
out['Value'] = uniq
Traceback (most recent call last):
File "<string>", line 1, in <module>
ValueError: Can't cast from structure to non-structure, except if the structure only has a single field.
Conceptually, I understand what you're doing with your example. Thanks!
EDIT: I uploaded some sample raster data to my original post.
I can check when I get back, there are a few tricks when moving from structured to non-structured not covered in my example.... lesson 2 coming soon
yay!
Zachary... I also have loads of quasi-organized code on my GitHub site
Dan-Patterson (Dan Patterson) · GitHub
arraytools and tools for pro might have some useful stuff
PS, if you have nodata areas, then all the stats stuff has 'nan' equivalents... ie np.sum there is np.nansum which just sums all the nodata values.
numpy and scipy are both very powerful tools and you would be amazed at how much their functionality is used with the esri code base. Many of the spatial analyst tools have a direct numpy equivalent/replacement
what a treasure trove!
Lesson 2... *tif file in … info out.
First the verbose code that produces the outcome, followed by explanations
fn = "c:/Temp/InRaster/InRaster.tif"
import arcpy
a = arcpy.RasterToNumPyArray(fn)
uniq, cnts = np.unique(a, return_counts=True)
per = cnts[:-1]/np.sum(cnts[:-1]) *100.
dt = [('Value', '<i4'), ('Count', '<i4'), ('Percent', '<f8')]
sze = cnts[:-1].size
out = np.zeros(sze, dtype=dt)
out['Value'] = uniq[:-1]
out['Count'] = cnts[:-1]
out['Percent'] = per
We don't need to know the coordinates of the lower left, the number of rows and columns or the nodata value, since we can get that when we read in the raster.
RasterToNumPyArray—ArcPy Functions | ArcGIS Desktop
RasterToNumPyArray (in_raster, {lower_left_corner}, {ncols}, {nrows}, {nodata_to_value})
Basic information on the array shape and data type
a.shape # ---- number of rows and columns
(84, 55)
a.dtype # ---- data type of the array, unsigned 16bit integer
dtype('uint16')
# ---- the nodata value is 65535, which is the maximum integer value for uint16
# ---- get this infor from `uniq`
uniq
array([ 1, 2, 3, 4, 65535], dtype=uint16)
# ---- counts return a 64bit integer, the 2098 is the nodata cell count
cnts
array([1057, 1186, 274, 5, 2098], dtype=int64)
# ---- At this stage we want to remove our nodata values from the counts and % calculations
Line 5 in the first code block
per = cnts[:-1]/np.sum(cnts[:-1]) *100.
basically means " use all the cnts up to but no including the last value, divide it by their sum and multiply by 100%.
cnts[:-1] ... is a slice, the : means everything up to but excluding... Read up on array slicing... This is only a 2D array, but it can be head melting for arrays beyond 4D.
You will notice that I did slicing in lines 10 and 11 to exclude the no data values, but I didn't need to do if for the 'per' line because I sliced already.
Of course, the above can be put into a 'def' (aka function) if you need to do it all the time, and the code can be simplified greatly... Just to show you how working with basic arrays is easy.
When you want to work with things like area... then it becomes important to know the cell size of each array element. That is where 'human-ware' comes into play (ie, take notes) and sometimes exporting rasters to old-school ASCII rasters is useful since the cell size information goes out with it. RasterToNumPyArray, doesn't... you have to remember..
So you want to save stuff? I use numpy as well.
out_fn = "C:/temp/InRaster/in_raster.npy"
np.save(out_fn, a)
Hmmm maybe I should look at it again, just in case
a_gain = np.load("C:/temp/InRaster/in_raster.npy")
np.all(a == a_gain)
True
All is good!
Dan,
I can't thank you enough for taking the time to do this! I marked this as correct and am looking forward to using NumPy even more.
Thanks so much!
Let me know if you are having any troubles or if there is something you use often in your workflow...there is always lesson 3