How do I normalize raster data?

27866 Views · 18 Replies · 01-26-2012 01:30 PM
TaylerHamilton
New Contributor II
I want to combine different raster layers, but need the data to be measured at the same numerical scale. Therefore, I would like to normalize all of the different layers so I can add them together, and then renormalize them.

I am not very good with statistics in general, and am unsure of how to even normalize a data set. I know there are multiple equations to do this, but don't know which one is right for me.

But if we were to use the equation:

Z = (X - u) / std

Where Z is the normalized value, u is the mean, and std is the standard deviation;

How would I calculate this in a raster?
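For reference, the formula in array terms (a NumPy sketch with stand-in values; in ArcGIS the same arithmetic would go into the Raster Calculator once the mean and standard deviation are known):

```python
import numpy as np

x = np.array([[4.0, 8.0], [6.0, 2.0]])   # stand-in raster cell values

# Z = (X - u) / std, applied cell by cell over the whole raster
z = (x - x.mean()) / x.std()
```

The result always has a mean of 0 and a standard deviation of 1.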

I have tried using "Calculate Statistics" (Data Management Tools) because I assumed it would calculate the mean and standard deviation for me, but I don't know where the output goes, as I can't choose one in the dialog. Also, the information on what this tool actually calculates is pretty limited in the Desktop Help.

I have thought about using Focal Statistics to calculate mean and standard deviation individually, but I run into the problem of what scale to use - I want it to include the entire raster in the calculation. This tool would give me different outputs for mean and standard deviation, which I could then put into the raster calculator to solve the normalization equation.

Can someone please let me know if using Focal Statistics is the best way to go about this? Am I even using the right equation to normalize my data?

Any input is GREATLY appreciated!!
18 Replies
DanielAmrine
Occasional Contributor

I am in a similar situation but I would like to reclassify my raster data into a correlation range of -1 to 1. I know I can classify my original raster and use the re-class tool, but I would prefer to use the raster calculator to do this so I can preserve the "Shape" of my raster.

I tried the (x-min(x))/(max(x)-min(x)) and it normalized to 0-1.

Thanks for your help!

Dan

FlorianHoedt2
New Contributor III

You could use the NDVI as an example; its arithmetic is explained in the ArcGIS help or on the web. Sorry, my mobile is low on battery and I cannot provide links right now.

What does this give?

( max(x) - min(x) ) / ( max(x) + min(x) )   // on second thought this is not right, sorry

This is not tested, though.

Beware the spelling; sent from my mobile.

JeffreyEvans
Occasional Contributor III

NDVI is a ratio between two bands. The equation to calculate the ratio on a single vector will result in all zero values. The equation you provide is a ratio in the upper and lower tails of the distribution and will result in a single value.
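For reference, NDVI as a per-cell band ratio can be sketched in NumPy (stand-in reflectance arrays for the near-infrared and red bands):

```python
import numpy as np

nir = np.array([[0.5, 0.6], [0.7, 0.4]])   # stand-in near-infrared band
red = np.array([[0.1, 0.2], [0.3, 0.1]])   # stand-in red band

ndvi = (nir - red) / (nir + red)   # each cell falls between -1 and 1
```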

FlorianHoedt2
New Contributor III

I have done some tests in Excel to get it working:

( x - min(x) ) / ( max(x) - min(x) ) -- your code to create a 0-1 normalized raster --> let us call it raster1

( raster1 - 1 ) -- shift the values: 1 -> 0, 0 -> -1; call it raster2

( raster1 + raster2 ) -- equivalent to 2*raster1 - 1, which ranges from -1 to 1

I am not a math or statistics guy though!
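A 0-1 normalization followed by a stretch to the -1 to 1 range can be sketched in NumPy (plain arrays standing in for raster cells):

```python
import numpy as np

x = np.array([2.0, 5.0, 11.0])                  # stand-in raster values

raster1 = (x - x.min()) / (x.max() - x.min())   # 0-1 normalization
rescaled = 2 * raster1 - 1                      # stretched to the -1 to 1 range
```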

JeffreyEvans
Occasional Contributor III

Are you sure that you want to do this? You are, in effect, creating a two-tailed distribution where the negative tail is indicating a proportional negative response equal to the positive. If this is what you are after then you need to identify a "hinge point" that defines the inflection point, in the original distribution, indicating where the distribution will be centered on 0. This will provide relevant information on where the distribution changes to a negative influence. If the distribution is not centered then the result will be arbitrary. The equations for normalizing are not applicable here.

This will not necessarily provide a bounded -1 to 1 range, but the equation for standardizing a distribution to a mean of 0 and a standard deviation of 1, thus providing a Z-score, is: [(x - mean(x)) / standard deviation(x)]. You can play with the math to center on a different value. There is a good reason that we normally scale distributions to a 0-1 range.

The problem here is that a Z-score standardization assumes a Gaussian distribution, which you likely do not have. Overall I do not believe that this is a good idea. Perhaps if you provided some context I will be able to provide a relevant alternative.

If your data represent a known fixed range (e.g., correlation coefficients) and for some reason exhibit erroneous values, I would recommend just truncating them using a con statement (e.g., con(raster < -1, -1, raster) for the lower bound).
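The same truncation in array terms (a NumPy sketch; np.clip bounds both tails at once):

```python
import numpy as np

r = np.array([-8.5, -0.4, 0.9, 8.5])   # stand-in correlation values with erroneous extremes
clipped = np.clip(r, -1.0, 1.0)        # truncate to the known fixed range
```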

DanielAmrine
Occasional Contributor

My challenge is correlating magnetic (total magnetic intensity, reduced to pole, several passes), gravity (Bouguer anomaly, several passes), and depth data with oil and gas production data.

I used a "Statistics for Dummies" book to find the correlation equation and then used a combination of Excel and ArcMap to map and correlate values for 652 wells.

The problem I'm having is that the correlation coefficient is supposed to fall within -1.0 to 1.0, but each grid ranges from roughly -8.5 to 8.5. I can see the trend, but I want it to fit within -1 to 1.

I hope that provides some good context!

Thanks!

Dan

JeffreyEvans
Occasional Contributor III

Well, the problem sounds like your correlation approach is not working, and forcing the range will not correct it. The type of correlation (Pearson, Kendall, Spearman) is also important given the data. I would revisit your methodology to figure out why your correlation coefficients are off rather than trying to fix the issue post hoc. Any approach for deriving correlation coefficients at the raster cell level using a combination of ArcMap and Excel is opaque at best, and some details would be required to evaluate it.

I am not clear on why you want these correlations as a raster. If you have well locations, with associated depth, you could just assign the gravitational anomaly raster values using the "Extract Multi Values to Points" tool, export the resulting attribute table to a flatfile (e.g., CSV), and calculate the correlation coefficient in Excel (not that Excel is good for statistical analysis).

It is interesting that you mention this particular correlative relationship. I just published a paper in PLoS One (Evans & Kiesecker 2014) using gravitational anomaly data, along with other covariates, in a probabilistic model of non-conventional oil/gas. I used a nonparametric model and found that this particular relationship is highly nonlinear in nature. Because of this, any correlations are likely to be erroneous. I would imagine that it is time to move your analysis into a statistical software and it may also be a good point to consult a statistician.     

Evans, J.S., J.M. Kiesecker (2014) Shale Gas, Wind and Water: Assessing the Potential Cumulative Impacts of Energy Development on Ecosystem Services within the Marcellus Play. PLoS ONE 9(2): e89210. doi:10.1371/journal.pone.0089210

If you were to produce a surface of depth, using kriging or a similar interpolation method, you could easily calculate a moving-window correlation surface in R. Here is an example, including a kriging estimate.

# Moving window correlation function
#   x      raster (gridded sp object) of x
#   y      raster (gridded sp object) of y
#   dist   distance (radius) of the correlation window; the default "AUTO" calculates a window size
#   ...    additional arguments passed to the cor function
mwcor <- function(x, y, dist = "AUTO", ...) {
  require(sp)
  require(spdep)
  if (dist == "AUTO") {
    cs <- x@grid@cellsize[1]
    dist <- sqrt(2 * ((cs * 3)^2))
  } else {
    if (!is.numeric(dist)) stop("DISTANCE MUST BE NUMERIC")
  }
  nb <- dnearneigh(coordinates(x), 0, dist)
  v <- sapply(nb, function(i) cor(x@data[i, ], y@data[i, ], ...))
  if (is.numeric(v)) {
    v <- as.data.frame(v)
  } else {
    v <- as.data.frame(t(v))
  }
  coordinates(v) <- coordinates(x)
  gridded(v) <- TRUE
  v
}

# Example
require(gstat)
require(sp)
require(spdep)
data(meuse)
data(meuse.grid)
coordinates(meuse) <- ~x + y
coordinates(meuse.grid) <- ~x + y

# GRID-1 log(copper)
v1 <- variogram(log(copper) ~ 1, meuse)
x1 <- fit.variogram(v1, vgm(1, "Sph", 800, 1))
G1 <- krige(log(copper) ~ 1, meuse, meuse.grid, x1, nmax = 30)
gridded(G1) <- TRUE
G1@data <- as.data.frame(G1@data[, -2])   # drop the kriging variance column

# GRID-2 log(lead)
v2 <- variogram(log(lead) ~ 1, meuse)
x2 <- fit.variogram(v2, vgm(0.1, "Sph", 1000, 0.6))
G2 <- krige(log(lead) ~ 1, meuse, meuse.grid, x2, nmax = 30)
gridded(G2) <- TRUE
G2@data <- as.data.frame(G2@data[, -2])   # drop the kriging variance column

# Moving window correlation surface
gcorr <- mwcor(G1, G2, 500, method = "spearman")

# Plot results
colr <- colorRampPalette(c("blue", "yellow", "red"))
spplot(gcorr, col.regions = colr(100))
DanielAmrine
Occasional Contributor

Jeffrey,

I just downloaded your paper and I'm looking forward to reading it!

I've spent the last six years working as a mapping and geoscience technician in the Marcellus field. However, my company had to go through some cuts, and now I'm working on exploration projects all over the country.

The difficulty comes in correlating the data on a one-to-one relationship. The equation subtracts the means from X and Y, multiplies the deviations together, sums the products, and then divides by the product of the X and Y standard deviations and by the total number of values minus 1. That is the core of the process I used; I just didn't sum them, since there were only two values (an X and a Y) for each well.

For the grids I tried leaving out the sum but obviously it didn't work. My goal is to derive a correlation coefficient for each well in terms of the correlated data sets.

My main goal is to see if there are any spatial relationships to the correlation of these different values. Essentially we want to find out what kind of data will determine the success of an oil and gas reservoir by using any means possible!
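The pairwise correlation described above is the Pearson coefficient; in array terms (a NumPy sketch with made-up well values) it is:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])    # stand-in values at wells
y = np.array([2.0, 4.1, 5.9, 8.2, 10.0])

n = len(x)
# sum of the products of deviations, divided by (n - 1) and both standard deviations
r = ((x - x.mean()) * (y - y.mean())).sum() / ((n - 1) * x.std(ddof=1) * y.std(ddof=1))
```

The result always falls between -1 and 1; a grid that ranges out to 8.5 suggests a normalizing term (the standard deviations or the n - 1) was dropped somewhere.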

I also tried copying and pasting your code into R, and, noob that I am, I have no idea how to fix the code so it will run. I will work on that!

Thank you for taking the time to reply!

WaseemAli
New Contributor

I applied the same equation in the raster calculator of ArcGIS but it gives me an error while running. I think there are some abnormal values in it. How can I remove such abnormalities first, so that I can then apply this formula to derive the NDVI?

Also, please comment on the formula (NDVI - NDVImin) / (NDVImax - NDVImin), called the green vegetation fraction (GVF). Hu and Jia (2010) used this same equation in their study to derive the green vegetation fraction, taking 0.20 and 0.70 as the minimum and maximum respectively from AVHRR data for the same region (http://onlinelibrary.wiley.com/enhanced/doi/10.1002/joc.1984), but how can I use these values for an area where AVHRR data is not available, as in my case?
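With fixed extremes such as the 0.20 and 0.70 cited above, the green vegetation fraction can be sketched as (a NumPy sketch with stand-in NDVI values; clipping keeps cells outside the extremes in the 0-1 range):

```python
import numpy as np

ndvi = np.array([0.15, 0.20, 0.45, 0.70, 0.80])   # stand-in NDVI values
ndvi_min, ndvi_max = 0.20, 0.70                   # region-specific extremes

gvf = (ndvi - ndvi_min) / (ndvi_max - ndvi_min)
gvf = np.clip(gvf, 0.0, 1.0)   # cells below/above the extremes saturate at 0 or 1
```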
