
Focal Statistics Tool Weighted Mean Erroneous Calculation

09-06-2011 10:08 AM
JohnWhitman
Emerging Contributor
When the Spatial Analyst focal statistics tool is used with the Weight neighborhood type and an arbitrary weight kernel file to compute a mean, the result computed for any processing cell is NOT the expected "weighted mean" of the values of the cells within its neighborhood.

That this is the case may be seen from the following example which uses an input processing raster filled with constant values and the kernel file shown below.

Input processing raster:

1000 1000 1000 1000 1000 1000 1000 1000 1000 1000
1000 1000 1000 1000 1000 1000 1000 1000 1000 1000
1000 1000 1000 1000 1000 1000 1000 1000 1000 1000
1000 1000 1000 1000 1000 1000 1000 1000 1000 1000
1000 1000 1000 1000 1000 1000 1000 1000 1000 1000
1000 1000 1000 1000 1000 1000 1000 1000 1000 1000
1000 1000 1000 1000 1000 1000 1000 1000 1000 1000
1000 1000 1000 1000 1000 1000 1000 1000 1000 1000
1000 1000 1000 1000 1000 1000 1000 1000 1000 1000
1000 1000 1000 1000 1000 1000 1000 1000 1000 1000

Kernel file:

5 5
1 2 3 2 1
2 4 6 4 2
3 6 9 6 3
2 4 6 4 2
1 2 3 2 1

Output raster (NoData cells ignored):

4000 4000 3600 3600 3600 3600 3600 3600 4000 4000
4000 4000 3600 3600 3600 3600 3600 3600 4000 4000
3600 3600 3240 3240 3240 3240 3240 3240 3600 3600
3600 3600 3240 3240 3240 3240 3240 3240 3600 3600
3600 3600 3240 3240 3240 3240 3240 3240 3600 3600
3600 3600 3240 3240 3240 3240 3240 3240 3600 3600
3600 3600 3240 3240 3240 3240 3240 3240 3600 3600
3600 3600 3240 3240 3240 3240 3240 3240 3600 3600
4000 4000 3600 3600 3600 3600 3600 3600 4000 4000
4000 4000 3600 3600 3600 3600 3600 3600 4000 4000

The value of 3240 seen for the cells in the central part of the output raster is 81000 / 25. (The other values near the array edges are different because the processing has encountered NoData cells. No further discussion will be given here of any cells in the output array near the edges.)

The value 81000 is the weighted sum for the cells in the central part of the raster, that is, the sum of the products of the input raster cell values and the kernel weights (as it should be). This can be confirmed by computing the sum statistic, instead of the mean statistic, using the focal statistics tool on the same input raster and with the same kernel.

The problem, then, is the tool's division of the weighted sum values by 25 (the number of cells in the kernel neighborhood) rather than by 81 (the sum of the weights used in that kernel neighborhood). Conventionally, and intuitively, the divisor used to turn a weighted sum into a weighted mean should be the sum of the weights. That is not what this tool does.
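The arithmetic above can be checked outside ArcGIS. The following is a minimal pure-Python sketch, not the tool's actual implementation: the function name `focal_mean_as_observed` is hypothetical, and it assumes that cells beyond the raster extent are NoData and are ignored. Under that assumption, dividing the weighted sum by the number of valid cells reproduces the values in the output raster shown earlier.

```python
# Kernel weights from the kernel file above (the "5 5" header line is omitted).
KERNEL = [
    [1, 2, 3, 2, 1],
    [2, 4, 6, 4, 2],
    [3, 6, 9, 6, 3],
    [2, 4, 6, 4, 2],
    [1, 2, 3, 2, 1],
]


def focal_mean_as_observed(raster, kernel):
    """Weighted sum divided by the COUNT of valid neighborhood cells.

    Assumption: cells outside the raster extent are NoData and are ignored.
    This mimics the behavior reported in the post; it is not Esri code.
    """
    rows, cols = len(raster), len(raster[0])
    half_r, half_c = len(kernel) // 2, len(kernel[0]) // 2
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            weighted_sum = 0
            valid_cells = 0
            for i, kernel_row in enumerate(kernel):
                for j, w in enumerate(kernel_row):
                    rr, cc = r + i - half_r, c + j - half_c
                    if 0 <= rr < rows and 0 <= cc < cols:
                        weighted_sum += w * raster[rr][cc]
                        valid_cells += 1
            out_row.append(weighted_sum / valid_cells)
        out.append(out_row)
    return out


raster = [[1000] * 10 for _ in range(10)]
observed = focal_mean_as_observed(raster, KERNEL)

weight_total = sum(sum(row) for row in KERNEL)  # sum of all kernel weights
central_weighted_sum = 1000 * weight_total      # weighted sum for an interior cell

print(observed[0][:3])  # [4000.0, 4000.0, 3600.0]
print(observed[2][:3])  # [3600.0, 3600.0, 3240.0]
print(central_weighted_sum / 25)            # 3240.0 (divide by cell count)
print(central_weighted_sum / weight_total)  # 1000.0 (divide by sum of weights)
```

The last two lines show the two candidate divisors side by side for an interior cell: 25 gives the reported 3240, while 81 gives the expected weighted mean of 1000.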

A workaround for this error is easily contrived. Since the weighted sum statistic is computed correctly for a non-uniform kernel, the focal statistics tool's sum statistic can be used to compute the weighted mean for any kernel by proportionally reducing each of the kernel values so that the sum of all weights in the kernel is unity. For my example above, since the kernel weights add to 81, each should be divided by 81 to form a new kernel file. With that change, using the focal statistics tool to compute a weighted sum statistic does produce values of 1000 throughout the central portion of the output raster.
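The normalization step can be scripted. This is a minimal sketch with hypothetical helper names (`normalize_kernel`, `kernel_file_text`); it assumes the kernel file layout shown above, that is, a header line with the kernel dimensions followed by one line of space-separated weights per row. The choice of ten decimal places is arbitrary.

```python
KERNEL = [
    [1, 2, 3, 2, 1],
    [2, 4, 6, 4, 2],
    [3, 6, 9, 6, 3],
    [2, 4, 6, 4, 2],
    [1, 2, 3, 2, 1],
]


def normalize_kernel(kernel):
    """Divide every weight by the sum of all weights so the weights add to 1."""
    total = sum(sum(row) for row in kernel)
    return [[w / total for w in row] for row in kernel]


def kernel_file_text(kernel):
    """Format a kernel as text: a dimensions header, then one line per row.

    Assumption: the header is "<width> <height>", matching the "5 5" line
    in the kernel file shown in the post.
    """
    header = f"{len(kernel[0])} {len(kernel)}"
    body = [" ".join(f"{w:.10f}" for w in row) for row in kernel]
    return "\n".join([header] + body) + "\n"


normalized = normalize_kernel(KERNEL)
text = kernel_file_text(normalized)
print(text)

# Weighted SUM for an interior cell of the constant-1000 raster,
# using the normalized weights.
central_sum = sum(w * 1000 for row in normalized for w in row)
print(round(central_sum, 6))  # 1000.0
```

The printed text could be saved as the new kernel file and used with the sum statistic.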
4 Replies
EricRice
Esri Regular Contributor
Hi John,

I think the confusion originates from our use of the term "weighted neighborhood" in connection with the Mean statistic. The documentation could be enhanced, and I've spoken to the tool owner. As you must have noticed when choosing the statistic you want from the Statistic Type parameter, there isn't an entry called "Weighted Mean"; there is only "Mean". Based on your result set, the tool is functioning as designed. We returned the mean value of your weighted raster based on the kernel file you provided.

I submitted an enhancement request on your behalf to incorporate weighted mean as a new statistic type. You can track it with NIM072396. It shouldn't be that hard to do. After all, we just need to sum the weights and use the sum in the denominator.

Thanks for pointing this out!

Best Regards,
Eric
JohnWhitman
Emerging Contributor
Thanks to Eric Rice for his response to my initial post, noting that the tool is operating as designed, but acknowledging the desirability of an enhancement that would compute a weighted mean correctly, and for submitting a request for such an enhancement.

My use of the focal statistics tool was an attempt to circumvent limitations of the Spatial Analyst low pass filter tool. That tool specifies a 3x3 kernel with uniform weighting, but I needed to do weighted filtering. In a response to an earlier post that I had made about that low pass filter tool, ESRI had suggested that I might overcome its limitations by using the focal statistics tool with a weighted neighborhood to compute the mean statistic. As that tool now operates, this is not possible.

In Eric's response to my post, he states "after all, we just need to sum the weights and use the sum in the denominator". This statement is correct and it is sufficient in the case where the input array includes no NoData cells. In the general case, NoData cells may be present and the design of an enhancement and the documentation of that enhancement must treat these appropriately. For a weighted mean calculation on an input array containing NoData elements, the divisor must be the sum of the weights associated with the valid data cells. When the (default) "Ignore NoData" option is selected, the NoData cells must be ignored both in the numerator computation of the weighted sum and in the denominator summation of weights.
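The NoData rule described above can be sketched in pure Python. This is an illustration of the proposed behavior only, not an existing tool option: the function name `weighted_focal_mean` is hypothetical, `None` stands in for NoData, and cells beyond the raster extent are assumed to be NoData as well.

```python
KERNEL = [
    [1, 2, 3, 2, 1],
    [2, 4, 6, 4, 2],
    [3, 6, 9, 6, 3],
    [2, 4, 6, 4, 2],
    [1, 2, 3, 2, 1],
]


def weighted_focal_mean(raster, kernel):
    """Weighted mean that ignores NoData in numerator AND denominator.

    None marks NoData; cells outside the raster extent are treated the same
    way. The output is None when no valid cell carries a nonzero weight.
    """
    rows, cols = len(raster), len(raster[0])
    half_r, half_c = len(kernel) // 2, len(kernel[0]) // 2
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            numerator = 0.0    # sum of weight * value over valid cells
            denominator = 0.0  # sum of weights over the same valid cells
            for i, kernel_row in enumerate(kernel):
                for j, w in enumerate(kernel_row):
                    rr, cc = r + i - half_r, c + j - half_c
                    if 0 <= rr < rows and 0 <= cc < cols:
                        value = raster[rr][cc]
                        if value is not None:
                            numerator += w * value
                            denominator += w
            out_row.append(numerator / denominator if denominator else None)
        out.append(out_row)
    return out


# Constant raster with two interior NoData cells.
raster = [[1000] * 10 for _ in range(10)]
raster[4][4] = None
raster[4][5] = None

result = weighted_focal_mean(raster, KERNEL)
print(result[0][0], result[4][4], result[7][7])
```

For a constant input, every output cell should come back as the constant, at the edges and around the NoData hole alike, because the missing weights are dropped from both sums.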

The way that the current Spatial Analyst low pass filter tool handles NoData cells (both internal to the input array and beyond its extents) is an appropriate model for the weighted mean calculation in an enhanced focal statistics tool as well. It needs only to be generalized to allow different kernel sizes/shapes and non-uniform weights.

Caution: My original post in this thread included an example in which there were no NoData cells in the input array. When there are no NoData cells, the workaround that I described in that post allows the current version of the focal statistics tool to correctly compute a weighted mean in the central portion of the array, but my workaround does not handle NoData cells properly.

JGWjr
curtvprice
MVP Alum

John Whitman:

When there are no NoData cells, the workaround that I described in that post allows the current version of the focal statistics tool to correctly compute a weighted mean in the central portion of the array, but my workaround does not handle NoData cells properly.

I'm wondering if you could create a grid of ones and sum that up using your kernel with a mask environment set to your value raster to calculate the needed denominator for the calculation.
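The idea can be tried out on plain lists before attempting it in ArcGIS. This is a rough sketch under stated assumptions: `focal_sum` is a hypothetical stand-in for a weighted focal SUM that ignores NoData (`None`), and the "grid of ones" is given NoData wherever the value raster has NoData, which is what the mask is meant to achieve. Whether the mask environment behaves this way in the tool is not confirmed here.

```python
KERNEL = [
    [1, 2, 3, 2, 1],
    [2, 4, 6, 4, 2],
    [3, 6, 9, 6, 3],
    [2, 4, 6, 4, 2],
    [1, 2, 3, 2, 1],
]


def focal_sum(raster, kernel):
    """Weighted focal SUM ignoring NoData (None) and out-of-extent cells.

    Returns None for a cell whose neighborhood contains no valid cells.
    """
    rows, cols = len(raster), len(raster[0])
    half_r, half_c = len(kernel) // 2, len(kernel[0]) // 2
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            total, found = 0, False
            for i, kernel_row in enumerate(kernel):
                for j, w in enumerate(kernel_row):
                    rr, cc = r + i - half_r, c + j - half_c
                    if 0 <= rr < rows and 0 <= cc < cols:
                        value = raster[rr][cc]
                        if value is not None:
                            total += w * value
                            found = True
            out_row.append(total if found else None)
        out.append(out_row)
    return out


# Value raster with one interior NoData cell.
values = [[1000] * 10 for _ in range(10)]
values[4][4] = None

# Grid of ones, set to NoData wherever the value raster is NoData.
ones = [[None if v is None else 1 for v in row] for row in values]

numerator = focal_sum(values, KERNEL)  # weighted sum of valid values
denominator = focal_sum(ones, KERNEL)  # sum of weights over valid cells
weighted_mean = [
    [n / d if d else None for n, d in zip(n_row, d_row)]
    for n_row, d_row in zip(numerator, denominator)
]

print(denominator[0][0], denominator[4][4], denominator[7][7])  # 36 72 81
print(weighted_mean[0][0], weighted_mean[4][4], weighted_mean[7][7])
```

The denominator grid varies from cell to cell (36 at the corner, 72 next to the hole, 81 in the interior), which is exactly the per-cell divisor that a fixed normalized kernel cannot supply.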

By the way, here's an issue I found with kernel files using arcpy; readers of this thread may be interested:

Working with kernel files in arcpy: NbrIrregular bug

EricRice
Esri Regular Contributor
John,

I have linked this thread directly into the enhancement request, so all concerns are highlighted and known to all internal staff who would do the implementation. We'll be sure to handle your areas of concern appropriately.

Thanks again!

Eric