Rasters can have holes (also called voids, gaps, or NoData) in them. These areas can be large and very visible, or they can be individual cells or small groups of cells scattered throughout and not easily seen. Suppose you want to eliminate the holes. How can you replace them with meaningful values while preserving the existing values?
There is a variety of ways to do that. Here we will give some background, a few things to consider, and then show four common solutions.
A raster is a data structure that records information about a phenomenon, such as category, magnitude, height, or image reflectance, organized into a regular matrix of equally sized cells arranged into rows and columns. However, sometimes there is not enough information available to give a value to a cell, either from the data source or the output from an analytical operation. The concept of NoData is used to represent these cells.
When displaying rasters, the renderers allow you to set the NoData cells to appear either with a color of your choosing, or to not display at all (transparent).
For analysis, how input NoData cells are handled can vary based on the tool being used. For some tools, those locations will remain NoData in the output. Other tools calculate an output value based on other available values. The reference documentation commonly addresses the NoData behavior for a particular tool. However, the behavior may not be suitable for your analysis. You may want to identify these NoData cells so you can replace them with appropriate cell values. How can you do that?
There is a tool that explicitly identifies NoData cells, the Is Null tool. It checks each cell and returns a value of 1 if a cell is NoData, and a value of 0 (zero) if a cell is any other value. If the resulting raster has values of both 0 and 1, then the input raster has NoData cells present. In the attribute table of the raster, the Count field tells you how many cells there are of each value.
In the following example, the input raster is on the left with the NoData cells rendered in white. This raster has 9 rows of 10 columns, and thus has a total of 90 cells. There are four unique values (3, 9, 15, and 22). Summing the cell counts of each value (11, 22, 25, and 10, respectively) gives a total of 68 cells. Subtracting 68 from 90 tells you there are 22 NoData cells in the input. The output from Is Null and its attribute table are on the right. The cell value of 1 represents NoData, and its count matches the expected 22 cells.
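The bookkeeping in this example can be checked with a short plain-Python sketch. It is not the Is Null tool itself: None stands in for NoData, the small grid is made up for illustration, and is_null and count_values are hypothetical helper names.

```python
from collections import Counter


def is_null(raster):
    """Return a grid with 1 where a cell is NoData (None) and 0 elsewhere."""
    return [[1 if cell is None else 0 for cell in row] for row in raster]


def count_values(raster):
    """Count how many cells hold each value, like the Count field of an attribute table."""
    return Counter(cell for row in raster for cell in row)


# Made-up 3 x 4 raster; None marks NoData.
raster = [
    [3, 3, None, 9],
    [3, None, None, 9],
    [15, 15, 22, 9],
]

mask = is_null(raster)
mask_counts = count_values(mask)  # mask_counts[1] is the number of NoData cells

# Cross-check: total cells minus cells that hold a value equals the NoData count.
total_cells = len(raster) * len(raster[0])
valued_cells = sum(n for value, n in count_values(raster).items() if value is not None)
nodata_cells = total_cells - valued_cells

# The same arithmetic for the article's example: 9 rows x 10 columns,
# with counts of 11, 22, 25, and 10 for the four values.
article_nodata = 9 * 10 - (11 + 22 + 25 + 10)
```

If the count of 1s in the mask and the subtraction disagree, the value counts or the raster dimensions were probably read incorrectly.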
Before starting any analysis, first look closely at the problem you are trying to solve. To make sure you follow the right analytical path to get the appropriate answer, carefully consider the types of analysis you will perform.
When it comes to picking the right solution for filling NoData areas, some factors to consider are:
To choose the proper solution, first determine where the replacement values are coming from. Common sources of replacement values include:
Once you decide the source of the replacement values, then you just need to follow a specific workflow to achieve the result.
There are of course many types of raster analysis that can be done, and different ways to go about doing it. This article does not cover all the scenarios, but will focus on some of the typical ones.
The following graphic illustrates these four workflows at a general level. The column on the left lists the scenarios according to where the replacement values will come from. The center column shows the basic workflow to use for each scenario. The rightmost column provides some information on the general applicability of the scenarios, considering the size of the NoData area and the type of data.
The easiest way to fill NoData is to replace those cells with a specific value.
By using the same value for all instances, you can apply it to all the NoData cells without having to consider their size or distribution. This method is most suited for discrete data.
As shown earlier, you can use the Is Null tool to create a raster that uses a value of 1 to indicate where a cell in an input raster is NoData, and a value of 0 for locations of any other value in the input raster.
The Con tool evaluates each cell of an input raster based on a logical condition. You could take the output from the Is Null tool and use it in the Con tool as the input that identifies which of the input cells are NoData and thus will be replaced with the value specified in the true parameter. However, the Expression in the Con tool also has the capability to do an is null operation, as shown here:
In the Con tool, do the following:
The following illustration shows the NoData locations replaced with the new value of 2, while the other locations retained their original value.
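The cell-by-cell logic of this workflow can be sketched in plain Python. This is an illustration of the idea only, not the Con or Is Null tools: None stands in for NoData, the grid is made up, and con and is_null are hypothetical helpers that mimic the pattern of nesting an is null test inside a conditional.

```python
def is_null(raster):
    """Return a grid with 1 where a cell is NoData (None) and 0 elsewhere."""
    return [[1 if cell is None else 0 for cell in row] for row in raster]


def con(condition, true_value, false_raster):
    """Cell by cell: use true_value where condition is 1, else keep the false_raster cell."""
    return [
        [true_value if cond else cell for cond, cell in zip(cond_row, row)]
        for cond_row, row in zip(condition, false_raster)
    ]


# Made-up raster; None marks NoData.
raster = [
    [3, None, 9],
    [None, 15, 9],
    [22, 22, None],
]

# Replace every NoData cell with the constant 2 and keep all other cells.
filled = con(is_null(raster), 2, raster)
```

Every cell that already had a value passes through unchanged, which is the property that makes this approach safe for discrete data.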
Instead of using a constant value, another raster can provide the values to replace NoData cells with.
If it makes logical sense to replace them all with cell values from another raster, there is no need to consider the size and distribution of the NoData areas. You can apply this method to both discrete and continuous data, but it is best to match the type of the replacement raster to the type of the raster you are updating.
In the Con tool, do the following:
Note:
If the raster providing the replacement values has different properties than the raster containing the NoData cells, such as extent, cell size, cell alignment, or coordinate system, remember to account for these differences. By default, the Con tool will use the union of the extents of the two input rasters, and the maximum cell size. To preserve the existing cell values that are not NoData and avoid them being resampled, be sure to set the Extent, Cell Size, and Snap raster environments to your original input raster.
The following illustration shows how values from the other raster replaced the NoData locations, while the other locations retained their original value.
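As a rough plain-Python sketch of the same idea, the function below takes each replacement from the cell at the same position in a second grid. It assumes both grids are already aligned cell for cell, which is the situation the environment settings in the note above are meant to guarantee; here a simple shape check stands in for that. None represents NoData, and fill_from_raster is a hypothetical helper name.

```python
def fill_from_raster(raster, replacement):
    """Replace None cells in raster with the cell at the same position in replacement."""
    same_shape = len(raster) == len(replacement) and all(
        len(row) == len(other_row) for row, other_row in zip(raster, replacement)
    )
    if not same_shape:
        # Misaligned grids would pair up the wrong cells, so refuse to continue.
        raise ValueError("rasters must have the same number of rows and columns")
    return [
        [other if cell is None else cell for cell, other in zip(row, other_row)]
        for row, other_row in zip(raster, replacement)
    ]


# Made-up grids; None marks NoData in the raster being updated.
raster = [
    [3, None, 9],
    [None, 15, 9],
]
replacement = [
    [7, 7, 8],
    [6, 6, 8],
]

filled = fill_from_raster(raster, replacement)
```

Only the positions that were NoData take on values from the second grid; the replacement values at all other positions are ignored.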
You may want to replace NoData cells with the value of the nearest (closest) cell. There is a tool that can do this: Nibble. It will replace the input values for a defined area with the nearest value that is outside that area.
While this method can fill in large areas, replacement values that come from farther away may be less meaningful. Since the replacement values come from the same set of values as the input, this method is most suited to discrete data.
The Nibble tool has two required inputs. The first input is the raster for which the values at selected locations will be replaced with the nearest value. The second input is a mask that identifies what those locations are. For this input, NoData cells represent locations that are within the mask, and cells with any other value are outside the mask. There are two additional parameters that give specific control over how NoData cells are handled. By setting them a particular way, NoData cells in the input raster can also define the mask area. This means that you can use the same raster for both required inputs.
In the Nibble tool, do the following:
The following illustration shows how the value of the nearest input cell replaces the NoData locations. To make comparison easier, a dark red outline on the output raster identifies the NoData locations of the input raster.
In the case of ties, where there are two or more input cells that are nearest to a NoData cell, the output will be the lowest of the tied values. In the part of the figure below the dashed line, the numbers show the distance (in cell units) from each NoData cell to the nearest cell outside the mask. For a portion of the NoData cells, small arrows identify the specific input cell that provides the replacement value.
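The nearest-value rule and the tie-breaking behavior described above can be imitated with a brute-force plain-Python sketch. This is not the Nibble algorithm itself: it assumes straight-line distance between cell centers, uses None for NoData, works on a made-up grid, and is only practical for small examples. The name nearest_fill is hypothetical.

```python
def nearest_fill(raster):
    """Fill each None cell with the value of the nearest valued cell.

    Distance is straight-line between cell centers (an assumption of this
    sketch). Ties go to the lowest value, as described in the text.
    """
    # Candidate source cells come from the original input only.
    sources = [
        (r, c, value)
        for r, row in enumerate(raster)
        for c, value in enumerate(row)
        if value is not None
    ]
    output = [row[:] for row in raster]
    if not sources:
        return output  # nothing to fill from
    for r, row in enumerate(raster):
        for c, value in enumerate(row):
            if value is None:
                # Tuples compare by squared distance first, then by value,
                # so the lowest value wins when distances are equal.
                _, output[r][c] = min(
                    ((r - sr) ** 2 + (c - sc) ** 2, sv) for sr, sc, sv in sources
                )
    return output


# Made-up raster; None marks NoData.
raster = [
    [5, None, 3],
    [5, None, None],
    [8, 8, None],
]

filled = nearest_fill(raster)
```

In this grid the NoData cell in the top row is one cell away from both a 5 and a 3, so it receives 3, the lower of the tied values.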
A statistic calculated from the surrounding cells can replace a NoData cell.
This can be done by incorporating the neighborhood tool Focal Statistics into the analysis. For each input cell location, this tool calculates a statistic of the values within a specified neighborhood around it. You can specify a variety of neighborhood shapes, such as a rectangle, a circle, or a pie-shaped wedge, in whatever size you need. There are a variety of statistics to calculate, such as the average or minimum value.
The size of the NoData areas is a consideration for this tool. For individual or small groups of NoData cells, the default 3 by 3 cell neighborhood is large enough to calculate the replacement value. To fill larger areas of NoData, either expand the size of the neighborhood or run the process several times. For discrete data, the statistics that are most appropriate to use with this method are the maximum, minimum, most common, and least common. For continuous data, the mean statistic is typically the best one to use.
If run by itself, the Focal Statistics tool will calculate a statistic value for every cell in the input raster. To perform the calculations only on the NoData cells, we will apply the technique of using the Is Null tool within the Con tool to identify those locations. Then we will use the Focal Statistics tool to calculate a new value for those locations only.
To embed this tool in Con, it is necessary to create a complex expression in map algebra. In ArcGIS Pro, this can be done in the Raster Calculator tool or in the Python window.
Apply the following workflow to run the Focal Statistics tool in the Python window:
The following graphic shows an example of the syntax used to create a rectangular 3 by 3 cell focal neighborhood:
The following illustration shows how the NoData locations in the input raster were replaced with the maximum value in a 3 by 3 cell neighborhood around them. In this case, the NoData area is larger than the focal processing window. This causes some of the NoData locations to remain NoData in the output from the focal operation, since their neighborhoods contain no input values to calculate from. To resolve this, you can either run the operation again to replace the remaining NoData values, or use a larger neighborhood. This example ran the focal operation two times, with the output from the first pass used as input to the second.
The following is the syntax used in the Python command window to create the final output raster for this example:
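import arcpy
from arcpy import env
from arcpy.sa import Con, IsNull, FocalStatistics, NbrRectangle
# Spatial Analyst tools need the extension license checked out
arcpy.CheckOutExtension("Spatial")
env.workspace = r"C:\project1"
# First pass: fill NoData cells with the maximum of their 3 by 3 neighborhood
OutFS1 = Con(IsNull('inRas.tif'), FocalStatistics('inRas.tif', NbrRectangle(3, 3, 'CELL'), 'MAXIMUM'), 'inRas.tif')
# Second pass: pass the first result as a Raster object to fill the cells that are still NoData
OutFS2 = Con(IsNull(OutFS1), FocalStatistics(OutFS1, NbrRectangle(3, 3, 'CELL'), 'MAXIMUM'), OutFS1)
OutFS2.save(r"C:\Project1\outFmax.tif")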
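To see why a second pass is needed when the hole is larger than the window, the two-pass logic can be traced in plain Python without ArcGIS. This is a sketch of the idea, not the Focal Statistics tool: None stands in for NoData, the 5 by 5 grid is made up, the window is fixed at 3 by 3, and focal_fill is a hypothetical helper name.

```python
def focal_fill(raster, statistic=max):
    """Fill None cells with a statistic of the valued cells in their 3 x 3 neighborhood."""
    rows, cols = len(raster), len(raster[0])
    output = [row[:] for row in raster]
    for r in range(rows):
        for c in range(cols):
            if raster[r][c] is not None:
                continue  # existing values are kept unchanged
            # Collect valued cells in the window, clipped at the raster edge.
            neighbors = [
                raster[rr][cc]
                for rr in range(max(r - 1, 0), min(r + 2, rows))
                for cc in range(max(c - 1, 0), min(c + 2, cols))
                if raster[rr][cc] is not None
            ]
            if neighbors:  # the cell stays None when the whole window is NoData
                output[r][c] = statistic(neighbors)
    return output


# Made-up raster with a 3 x 3 hole, larger than a single window can close.
raster = [
    [1, 2, 3, 4, 5],
    [1, None, None, None, 5],
    [1, None, None, None, 5],
    [1, None, None, None, 5],
    [1, 2, 3, 4, 5],
]

pass1 = focal_fill(raster)  # fills the rim of the hole; the center stays None
pass2 = focal_fill(pass1)   # the center now has valued neighbors to draw from
```

The statistic is passed in as a callable, so a different one, such as statistics.mean from the standard library for continuous data, could be substituted for max.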
In this article, we touched on what NoData is, how to process it away, and some things to consider about the nature of your input. We then went over several solutions for replacing NoData cells in a raster using the functionality available in ArcGIS Spatial Analyst.
There are other scenarios that can have a similar objective. One example is to use an interpolation tool to replace NoData cells in an elevation raster, with the goal of maintaining the local trends in the surrounding surface cells. Another is to first classify the nature of the NoData areas and then apply different workflows to each category sequentially to get a final product.
You can take the logic behind the workflows shown here and apply them to many applications in your own work.
Additional reading
To learn more about this aspect of raster analysis and specific tools used, start out by looking at the following help topics:
The original blog was first published in the ArcGIS Blog, and can be found here:
https://www.esri.com/arcgis-blog/products/analytics/analytics/fill-nodata-holes-in-raster-data/