
Two problems with the Summarize Within Output in ArcGIS Pro

2 weeks ago
Seyed-MahdySadraddini
Regular Contributor

I was checking the outputs of the standard deviation from the tool as they did not make sense to me.

I realized there are two problems with the outputs:

1. This is the lesser problem, and it may or may not matter depending on your analysis. The calculated mean treats "0" as a valid number. In my case this is not desirable, so I should have excluded the features with value 0 from my dataset before running the tool on my feature class.

2. This is the bigger problem, and whether it matters depends on whether your data points are a sample or the entire population. I calculated the standard deviation by hand using both the sample and population formulas, and found that the tool calculates the sample standard deviation, whereas I want the population standard deviation. So in this case I will have to extend the tool in Python to calculate the correct mean and standard deviation for my data points.
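As a rough sketch of the hand check in plain Python (the values below are made up for illustration and are not my data; treating 0 as "no measurement" is an assumption about the dataset), the standard library's statistics module exposes both formulas:

```python
import statistics

# Hypothetical attribute values for the points inside one polygon.
# Zeros stand in for "no measurement" here (an assumption).
values = [0, 4.0, 0, 6.0, 8.0, 10.0]

# Drop zeros before summarizing, if 0 means "missing" in your data.
valid = [v for v in values if v != 0]

mean_all = statistics.mean(values)    # zeros included
mean_valid = statistics.mean(valid)   # zeros excluded

sample_sd = statistics.stdev(valid)       # divides by n - 1
population_sd = statistics.pstdev(valid)  # divides by n

print(mean_all, mean_valid, sample_sd, population_sd)
```

For the same set of values, the sample figure is always the larger of the two, and the gap shrinks as the count grows.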

2 Replies
DanPatterson
MVP Esteemed Contributor

regarding your observations

  1. yes, 0 is a valid observation in any measurement scale; convert 0 to nodata (aka null). This isn't unique to GIS.
  2. to confirm the standard deviation calculations, simply add the other statistics needed to derive it (sum, count, mean) and use the standard formula to do a field calculation ( Standard deviation - Wikipedia )
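
A minimal sketch of such a field calculation, assuming the tool's output already holds a sample standard deviation and a point count for each polygon. The function name is hypothetical; the rescaling itself is the standard relation between the n - 1 and n divisors:

```python
import math

def population_sd_from_sample(sample_sd, count):
    """Rescale a sample SD (n - 1 divisor) to a population SD (n divisor)."""
    # Null inputs or empty polygons cannot be converted.
    if sample_sd is None or count is None or count < 1:
        return None
    return sample_sd * math.sqrt((count - 1) / count)
```

In a Calculate Field code block the call might look like population_sd_from_sample(!std_field!, !count_field!), where both field names are placeholders for whatever your output table actually uses.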

... sort of retired...
DanielFox1
Esri Regular Contributor

Hi @Seyed-MahdySadraddini 

Just wanted to add to Dan's post 

1. Zonal Statistics and Summarize Within treat 0 as a valid numeric value unless it is explicitly excluded. This can skew your mean and standard deviation if 0 represents missing or irrelevant data.

How to Exclude Zeros

Use the SetNull function in Python or Raster Calculator to convert zero values to NoData before running your analysis:

SetNull("your_raster" == 0, "your_raster")

Solved: Zonal Statistics as Table - exclude values? - Esri Community

2.  Standard Deviation Defaults to Sample Formula
ArcGIS tools like Summarize Within and Calculate Field use the sample standard deviation formula by default:

• Sample SD divides by n - 1
• Population SD divides by N


This is confirmed by users who manually recalculated both and found that ArcGIS matched the sample formula. The rationale is that most spatial analyses assume you're working with a sample of a larger population, not the full population.
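
For reference, the two formulas written out (standard textbook definitions, not taken from the ArcGIS documentation), where x̄ is the sample mean and μ is the population mean:

```latex
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i-\bar{x}\right)^2}
\qquad
\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i-\mu\right)^2}
```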

I hope this helps you understand this further.
