Use NumPy to calculate summary statistics?

2688
9
Jump to solution
01-30-2019 11:51 AM
ZacharyHart
Occasional Contributor III

I have a previously reclassified raster file which I run a simple process on and then need to generate a very basic report about the percentages of each class.

I'm able to calculate the total count for each raster value/class using summary stats and was wondering if there's a way to skip summary stats and accomplish this with NumPy instead?

Here's a sample code:

import arcpy
import pandas as pd

InRaster = "SomeSingleBandRaster" ##This raster was reclassified to have 4 classes##
OutGDB = arcpy.env.scratchGDB
SlopeReport = OutGDB + '/' + "SlopeReport"
StatsTable = OutGDB + '/' + "StatsTable"


#Generate Summary Statistics#
arcpy.analysis.Statistics(InRaster, StatsTable, "Value SUM", "Count")

#Create array and calculate percentate of each class
array = arcpy.da.TableToNumPyArray(StatsTable, ['Count','SUM_Value'])
df = pd.DataFrame(array)
df['perc'] = df["Count"] / df["Count"].sum() * 100
print(df)‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍

This will generate the following output:

    Count  SUM_Value       perc
0     5.0        4.0   0.198255
1   274.0        3.0  10.864393
2  1057.0        1.0  41.911182
3  1186.0        2.0  47.026170‍‍‍‍‍

This works but I'm always eager to cut out unnecessary steps (while hopefully learning something in the process!)

Thanks!

Tags (1)
0 Kudos
1 Solution

Accepted Solutions
DanPatterson_Retired
MVP Emeritus

Lesson 2... *tif file in … info out.

First the verbose code that produces the outcome, followed by explanations

fn = "c:/Temp/InRaster/InRaster.tif"
import arcpy
a = arcpy.RasterToNumPyArray(fn)
uniq, cnts = np.unique(a, return_counts=True)
per = cnts[:-1]/np.sum(cnts[:-1]) *100.

dt = [('Value', '<i4'), ('Count', '<i4'), ('Percent', '<f8')]
sze = cnts[:-1].size
out = np.zeros(sze, dtype=dt)
out['Value'] = uniq[:-1]
out['Count'] = cnts[:-1]
out['Percent'] = per

We don't need to know the coordinates of the lower left, the number of rows and columns or the nodata value, since we can get that when we read in the raster.

RasterToNumPyArray—ArcPy Functions | ArcGIS Desktop 

RasterToNumPyArray (in_raster, {lower_left_corner}, {ncols}, {nrows}, {nodata_to_value})

Basic information on the array shape and data type

a.shape  # ---- number of rows and columns
(84, 55)

a.dtype  # ---- data type of the array, unsigned 16bit integer
dtype('uint16')

# ---- the nodata value is 65535, which is the maximum integer value for uint16
# ---- get this infor from `uniq`
uniq
array([    1,     2,     3,     4, 65535], dtype=uint16)

# ---- counts return a 64bit integer, the 2098 is the nodata cell count
cnts
array([1057, 1186,  274,    5, 2098], dtype=int64)

# ---- At this stage we want to remove our nodata values from the counts and % calculations‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍

Line 5 in the first code block

per = cnts[:-1]/np.sum(cnts[:-1]) *100.

basically means " use all the cnts up to but no including the last value, divide it by their sum and multiply by 100%.

cnts[:-1] ... is a slice, the : means everything up to but excluding...  Read up on array slicing... This is only a 2D array, but it can be head melting for arrays beyond 4D.

You will notice that I did slicing in lines 10 and 11 to exclude the no data values, but I didn't need to do if for the 'per' line because I sliced already.

Of course, the above can be put into a 'def' (aka function) if you need to do it all the time, and the code can be simplified greatly... Just to show you how working with basic arrays is easy.

When you want to work with things like area... then it becomes important to know the cell size of each array element.  That is where 'human-ware' comes into play (ie, take notes) and sometimes exporting rasters to old-school ASCII rasters is useful since the cell size information goes out with it.   RasterToNumPyArray, doesn't... you have to remember..

So you want to save stuff?  I use numpy as well.

out_fn = "C:/temp/InRaster/in_raster.npy"
np.save(out_fn, a)

Hmmm maybe I should look at it again, just in case

a_gain = np.load("C:/temp/InRaster/in_raster.npy")

np.all(a == a_gain)
True

All is good!

View solution in original post

9 Replies
DanPatterson_Retired
MVP Emeritus

Love a lesson... this is going to be verbose and I will leave out all the shortcuts and helper functions.

Assume you have your 'array'... I will generate a small array, so you can ignore the first two lines and begin with 

array 'a'

np.random.RandomState(123)
a = np.random.randint(0, 5, size=(10,10))

uniq, cnts = np.unique(a, return_counts=True)  # ---- get the unique values and their count
per = cnts/cnts.sum() *100.   # ---- the percentage, old school
dt = [('Value', '<i4'), ('Count', '<i4'), ('Percent', '<f8')]  # --- data type for output
out = np.zeros(uniq.size, dtype=dt)  # ---- make a 'container' to store the results
out['Value'] = uniq  # ---- fill the container
out['Count'] = cnts  # ----
out['Percent'] = per # ----

# ---- Result
out

array([(0, 22, 22.), (1, 16, 16.), (2, 19, 19.), (3, 23, 23.), (4, 20, 20.)],
      dtype=[('Value', '<i4'), ('Count', '<i4'), ('Percent', '<f8')])

# ---- Oooooo I don't like that look ---- time for a 'helper' function (written by me)

Array info...
shape... (5,)  ndim... 1  dtype... [('Value', '<i4'), ('Count', '<i4'), ('Percent', '<f8')] 
╔═ 
║ ╔═ 
5  

 id  Value   Count   Percent  
------------------------------
 000       0      22     22.00
 001       1      16     16.00
 002       2      19     19.00
 003       3      23     23.00
 004       4      20     20.00

# ----- digest, I have lots of examples on my blog
ZacharyHart
Occasional Contributor III

Dan, this is great! And now i'm learning about NumPy Data type objects.

One thing: I'm getting an error which I'm guessing is a result of something specific with the raster I'm using.

out['Value'] = uniq
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ValueError: Can't cast from structure to non-structure, except if the structure only has a single field.‍‍‍‍

Conceptually, I understand what you're doing with your example. Thanks!

EDIT: I uploaded some sample raster data to my original post.

0 Kudos
DanPatterson_Retired
MVP Emeritus

I can check when I get back, there are a few tricks when moving from structured to non-structured not covered in my example.... lesson 2 coming soon

ZacharyHart
Occasional Contributor III

yay!

0 Kudos
DanPatterson_Retired
MVP Emeritus

Zachary... I also have loads of quasi-organized code on my GitHub site

Dan-Patterson (Dan Patterson) · GitHub 

arraytools and tools for pro might have some useful stuff

PS, if you have nodata areas, then all the stats stuff has 'nan' equivalents... ie np.sum there is np.nansum  which just sums all the nodata values.

numpy and scipy are both very powerful tools and you would be amazed at how much their functionality is used with the esri code base.  Many of the spatial analyst tools have a direct numpy equivalent/replacement

ZacharyHart
Occasional Contributor III

what a treasure trove!

0 Kudos
DanPatterson_Retired
MVP Emeritus

Lesson 2... *tif file in … info out.

First the verbose code that produces the outcome, followed by explanations

fn = "c:/Temp/InRaster/InRaster.tif"
import arcpy
a = arcpy.RasterToNumPyArray(fn)
uniq, cnts = np.unique(a, return_counts=True)
per = cnts[:-1]/np.sum(cnts[:-1]) *100.

dt = [('Value', '<i4'), ('Count', '<i4'), ('Percent', '<f8')]
sze = cnts[:-1].size
out = np.zeros(sze, dtype=dt)
out['Value'] = uniq[:-1]
out['Count'] = cnts[:-1]
out['Percent'] = per

We don't need to know the coordinates of the lower left, the number of rows and columns or the nodata value, since we can get that when we read in the raster.

RasterToNumPyArray—ArcPy Functions | ArcGIS Desktop 

RasterToNumPyArray (in_raster, {lower_left_corner}, {ncols}, {nrows}, {nodata_to_value})

Basic information on the array shape and data type

a.shape  # ---- number of rows and columns
(84, 55)

a.dtype  # ---- data type of the array, unsigned 16bit integer
dtype('uint16')

# ---- the nodata value is 65535, which is the maximum integer value for uint16
# ---- get this infor from `uniq`
uniq
array([    1,     2,     3,     4, 65535], dtype=uint16)

# ---- counts return a 64bit integer, the 2098 is the nodata cell count
cnts
array([1057, 1186,  274,    5, 2098], dtype=int64)

# ---- At this stage we want to remove our nodata values from the counts and % calculations‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍

Line 5 in the first code block

per = cnts[:-1]/np.sum(cnts[:-1]) *100.

basically means " use all the cnts up to but no including the last value, divide it by their sum and multiply by 100%.

cnts[:-1] ... is a slice, the : means everything up to but excluding...  Read up on array slicing... This is only a 2D array, but it can be head melting for arrays beyond 4D.

You will notice that I did slicing in lines 10 and 11 to exclude the no data values, but I didn't need to do if for the 'per' line because I sliced already.

Of course, the above can be put into a 'def' (aka function) if you need to do it all the time, and the code can be simplified greatly... Just to show you how working with basic arrays is easy.

When you want to work with things like area... then it becomes important to know the cell size of each array element.  That is where 'human-ware' comes into play (ie, take notes) and sometimes exporting rasters to old-school ASCII rasters is useful since the cell size information goes out with it.   RasterToNumPyArray, doesn't... you have to remember..

So you want to save stuff?  I use numpy as well.

out_fn = "C:/temp/InRaster/in_raster.npy"
np.save(out_fn, a)

Hmmm maybe I should look at it again, just in case

a_gain = np.load("C:/temp/InRaster/in_raster.npy")

np.all(a == a_gain)
True

All is good!

ZacharyHart
Occasional Contributor III

Dan,

I can't thank you enough for taking the time to do this! I marked this as correct and am looking forward to using NumPy even more.

Thanks so much!

0 Kudos
DanPatterson_Retired
MVP Emeritus

Let me know if you are having any troubles or if there is something you use often in your workflow...there is always lesson 3

0 Kudos