
Interpolation of missing pixels

05-08-2013 09:15 PM
HeatherWelch1
Deactivated User
Hello--

I'm working on a project using monthly 9 km resolution ocean color images (of chlorophyll a). Due to clouds, satellite paths, etc., the images have a fair amount of NoData pixels. I need to fill in the pixels, but am wary of creating data. A few questions:

1.  Is there a general consensus about the maximum size of no data regions that can still be filled reasonably accurately?

2. If I set the maximum NoData region size that can be filled at, say, 25, does this mean that blocks of missing data larger than 25 pixels will not be filled? What about a group of 25 pixels connected diagonally?

3. Does anyone have a suggestion for what fill method to use? I've been playing around with some options in the Marine Geospatial Ecology Tools package, such as Laplacian and Spring, but there are several others such as Del2b, Del2c, and Del4. I've also looked into using a roving window via Focal Statistics.

4. During the fill process, I'd be interested in creating a second raster that records the confidence in the filled pixel values. Perhaps this would be best expressed as the number of real-value pixels that were used to create an interpolated value. Has anyone done something like this, or have any ideas for how it might be done?

Cheers,
Heather
5 Replies
JasonRoberts
Occasional Contributor
Heather,

I am the lead developer of the Marine Geospatial Ecology Tools toolbox that you have been experimenting with for this problem. I'll do my best to answer your questions.

1. There is no such consensus that I am aware of. It all depends on how much potential error you are willing to tolerate in your application. I know that is a tough decision, and there are not any easy-to-use tools to help you make it. I can sketch two scenarios that represent the extremes between which your decision probably lies.

At one extreme, there is the scenario where you want to build a long-term climatology--a single image that shows the mean condition, built by averaging together a large number of short-duration (e.g. daily) images. In this case, if you have enough images available, you might not want to interpolate or extrapolate any values at all. Instead, you'd just use reliable cloud-free images and hope there were enough of them that each pixel received at least several dozen values over the course of your time series.

At the other is the scenario where you have very expensive data points--such as the route of a satellite-tracked animal, or the locations where animals were sighted during an aerial abundance survey--and you want to obtain values for those points. Because the points were so expensive to collect, you cannot afford to go get more of them when a cloud obscures the remote sensing image. So, instead, you are willing to tolerate some error in the remote sensing estimate in order to obtain some values for your points.

You're probably facing something like the second scenario. A traditional way to deal with that is to use a temporally averaged image (e.g. an 8-day or monthly) produced by the remote sensing data provider. This has the advantage of filling in many holes, usually using data before and after your focal date, but the possible disadvantage of smoothing the data. For example, if you are lucky enough that a daily image is cloud-free on your focal date, using the monthly image will dilute the daily value by combining it with all other cloud-free values that month.

A second traditional way to deal with it, familiar to many people who use GIS packages such as Spatial Analyst, is to smooth the image spatially rather than temporally, using something like a roving window via Focal Statistics. The MGET tool you're using implements that approach, but with more sophisticated algorithms than Focal Statistics. I'll discuss this more under question 3.
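
In case it is useful, here is a minimal sketch of the basic roving-window fill in Python with NumPy and SciPy, assuming a 2-D array in which NaN marks NoData. This is just an illustration of the idea, not the code behind Focal Statistics or MGET:

    import numpy as np
    from scipy import ndimage

    def focal_mean_fill(img, size=3):
        """Fill NaN cells with the mean of the valid cells in a size x size window."""
        valid = ~np.isnan(img)
        # Window mean of the values, with NaNs contributing zero...
        sums = ndimage.uniform_filter(np.where(valid, img, 0.0), size=size)
        # ...divided by the window mean of the validity mask (normalized convolution).
        weights = ndimage.uniform_filter(valid.astype(float), size=size)
        with np.errstate(invalid="ignore", divide="ignore"):
            means = sums / weights
        out = img.copy()
        out[~valid] = means[~valid]  # replace only the NoData cells
        return out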

2. "If I assign the maximum no data region size that can be filled, say, at 25, does this mean that blocks of missing data greater than 25 pixels will not be filled?" Yes, that's right. "What about a group of 25 pixels connected diagonally?" Diagonal connections are not considered. Another way of saying it, as is done in the MGET documentation, is that 4-connected pixels are considered. (Pixels connected diagonally are called 8-connected pixels.)

3. All of the heavy lifting in this tool--the implementation of the algorithms--is done by code written by John D'Errico and generously shared for anyone to use. You can download his original code and see some of the discussion of it here. In short, you may choose between two basic approaches: the four "Del" algorithms and the "Spring" algorithm. The Del algorithms use differential calculus to try to guess how the data are curving and then interpolate that curve into and across the hole. This presentation might help you understand that method. The Spring algorithm imagines that the image is a mountain with a hole in it. A grid of springs spans the hole and is connected to its edges. The algorithm lets the springs pull on each other until they all stop moving, then measures the height at each pixel.

You can see John's descriptions of these two approaches on his 26 Oct 2010 post here. Thanks to him for sharing his code, and apologies if I have mis-described its behavior.

The four Del algorithms are all variations on each other. If you do not understand the details, I recommend staying with the first one, Del2a.

Two advantages of these algorithms over the focal statistics approach are:

1. They explicitly handle non-linear trends in the surface you are filling.

2. They can fill holes of any size. With focal statistics, you can only fill larger holes by increasing your window size, thereby accepting a larger degree of smoothing.

At the end of the day, it would be best to consider both time and space when trying to guess values. For example, if you have a large hole in a daily image caused by a fast-moving cloud, rather than interpolating from far away in space, it would be best to interpolate from a short distance in time (i.e. the previous and next days' images). We have not had time to implement anything like this in MGET yet, at least not publicly. We have implemented prototype code to do 3D Gaussian smoothing and are in the process of applying it in our own research. John D'Errico has implemented a 3D version of his inpainting code. Eventually we will roll both of these approaches into MGET.
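
As a very rough idea of what spatiotemporal smoothing might look like--this is a sketch of the general technique, not our prototype code--you could apply a NaN-aware 3-D Gaussian filter to a stack of daily images, dividing a Gaussian-weighted sum of the valid values by a Gaussian-weighted sum of the validity mask:

    import numpy as np
    from scipy import ndimage

    def gaussian_fill_3d(stack, sigma=(1.0, 2.0, 2.0)):
        """stack: 3-D array (time, y, x) with NaN as NoData; one sigma per axis."""
        valid = ~np.isnan(stack)
        # Normalized convolution: weighted sum of values over weighted sum of mask.
        num = ndimage.gaussian_filter(np.where(valid, stack, 0.0), sigma=sigma)
        den = ndimage.gaussian_filter(valid.astype(float), sigma=sigma)
        with np.errstate(invalid="ignore", divide="ignore"):
            smoothed = num / den
        out = stack.copy()
        out[~valid] = smoothed[~valid]  # keep the observed values untouched
        return out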

You might be able to achieve a limited form of that by applying the MGET filling tool to 8-day images, which would give you a kind of spatiotemporal smoothing.

4. The closest I have seen to something like this is in the GHRSST project, where several teams produce cloud-free SST images (called L4 images) using a variety of approaches. MGET has tools for downloading those. The providers are supposed to supply some estimate of error, so each product comes with an error image in addition to the SST image. But I am not familiar with the techniques they use to estimate error. You might look up their papers and see how they do it.
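
For the simpler confidence measure you describe--the number of real-value pixels that contributed to a fill--something like the following sketch might be a starting point, again assuming a NumPy array with NaN as NoData and an arbitrary window size. It is not an MGET tool:

    import numpy as np
    from scipy import ndimage

    def confidence_raster(img, size=5):
        """Count of valid pixels in a size x size window around each NoData cell."""
        valid = ~np.isnan(img)
        # uniform_filter returns the window mean of the mask; multiply by the
        # window area to convert that fraction back into a count.
        counts = np.rint(ndimage.uniform_filter(valid.astype(float), size=size) * size * size)
        return np.where(valid, np.nan, counts)  # report counts only for filled cells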

That was more than I intended to write, but I've been waiting for something to finish. It is now done so I'll sign off for now.

Hope that helps,

Jason
HeatherWelch1
Deactivated User
Jason,

Thank you so much for taking the time to write such a thorough reply.  Your response is unbelievably helpful.

Best,
Heather
HeatherWelch1
Deactivated User
Hi Jason, hoping to catch you again.

I've gone with the Del2a fill, as you suggested for someone who does not understand the math behind the method. I'm struggling to gain a surface understanding of how the fill applies both interpolation and extrapolation. If a pixel is missing data, is its value both interpolated and extrapolated and then compared? Or does the fill follow the curve of known data points to value a missing pixel, and it's simply interpolated if its found value is within the known data range, and extrapolated if its found value is outside the known data range? Apologies for my basic language.

-Heather
JasonRoberts
Occasional Contributor
Heather,

It is probably best to read up a bit on interpolation and extrapolation. The Wikipedia articles on those two words are pretty good.

Although I have not studied the code in detail (as I mentioned, it was written by someone else and is pretty complicated), in principle it should work like this: when a patch of NoData cells is more or less surrounded by cells with data, the code should perform Laplacian interpolation. You can read more about that here. Basically, the code will fit a curve through the known points. In deciding the curve's shape, it tries to minimize sudden changes in the slope, so that the resulting curve is smooth. No matter what, though, the curve must go through all of the known points, so if the known points force the curve to be steep in places, it will be steep there. This differs from some other interpolation methods, such as the cubic spline interpolator in ArcGIS (if I recall correctly), which can flatten the known points somewhat when fitting its curve.
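
If it helps build intuition, here is a toy illustration of Laplacian interpolation in Python--not John's actual code, which is far more efficient. Each unknown cell is repeatedly replaced by the average of its four neighbors, which converges to a smooth surface (a solution of the discrete Laplace equation) pinned to the known values:

    import numpy as np

    def laplacian_fill(img, iters=500):
        known = ~np.isnan(img)
        out = np.where(known, img, np.nanmean(img))  # crude initial guess
        for _ in range(iters):
            # Average of the four edge neighbors (grid edges handled by padding).
            p = np.pad(out, 1, mode="edge")
            avg = (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]) / 4.0
            out[~known] = avg[~known]  # update only the unknown cells
        return out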

When the patch of NoData cells is not surrounded by cells with data, linear extrapolation is performed. In principle, this should only happen around the edges of the grid. If the edge of the grid is completely filled in--if there are only "holes" in the grid--then no extrapolation will be performed anywhere, only interpolation. But if the grid shows an island surrounded by NoData cells, then the area surrounding the island will be extrapolated. Linear extrapolation means that a surface with constant slope will be fit, rather than a curved surface that gets steeper or shallower as you move away from the area with data.

So--if I understand the code correctly--the idea is this: where cells are surrounded by data, the code creates a curved surface, presumably under the theory that there is a lot of information to go on, so the curve is a reasonable guess. But where cells are not surrounded by data, it creates a flat (non-curving) surface to try to avoid guessing unreasonably large or small values, since there is not much information to suggest what is happening out there.

I hope that helps,

Jason
HeatherWelch1
Deactivated User
Hi Jason,

Thanks for another extremely clear answer.  Much appreciated.

Cheers,
Heather