# Correlation between two different rasters

3898
18
08-02-2017 10:27 AM Occasional Contributor III

Xander Bakker‌, Heartly thanks for the Trendline script. It will help you many of GIS users.  One more query since long I have,

Just a few months back I posted one query regarding  Correlation between two different rasters (Example: for X parameter 13 rasters and for Y parameter also the same numbers of raster) and output correlation would be raster format and spatial map of correlation raster will indicate the how two parameters are correlated, value of output -1 to +1.

I got the response from you, but that code performs based on randomly generated point data over the raster and its created the correlation graph.

As we know the spatial map of correlation can be easy and much better to understand and explain the relationship while we talking about the how X and Y correlated and variation over any region.

I searched it, and many of Python user has been asked but not come out with a proper solution yet.

Please, can you share your opinion, how it could be done in python?

Thanks & regards

Shouvik Jha

Tags (3)
1 Solution

Accepted Solutions by Esri Esteemed Contributor

I noticed that the data from both parameters do not align exactly. Cell sizes are not the same and therefore the extent is also different. It is important that they align exactly for the process to work correctly (that the correct pixels from both list of datasets are used in the correlation). vs and vs resulting in a shift: There is also something weird going on with the script. Although the raster properties indicate that there are 1994 lines, a raster with 1995 lines is created and since the the properties of the original raster are used to define the lower left corner for translating the numpy array back to a raster, this results in a shift in 1 pixel. I have manually corrected the shift and attached the resulting correlation raster for you to examine. The code used is list below:

``````#-------------------------------------------------------------------------------
# Name:        correlation_2_sets_of_rasters.py
# Purpose:
#
# Author:      Xander
#
# Created:     03-08-2017
#-------------------------------------------------------------------------------

def main():
import arcpy
arcpy.env.overwriteOutput = True

import numpy as np
template1 = r'C:\GeoNet\Correlation\Parameter1\r{0}_NPP.TIF'
template2 = r'C:\GeoNet\Correlation\Parameter2\r{0}_WUE.TIF'
nodata = -3.4028235e+38
out_ras = r'C:\GeoNet\Correlation\correlation.TIF'

print "create nested numpy array list..."
lst_np_ras = []
for i in range(1, 14):
ras_path1 = template1.format("%03d" % (i,))
print " - ", ras_path1
ras_np1 = arcpy.RasterToNumPyArray(ras_path1)
ras_path2 = template2.format("%03d" % (i,))
print " - ", ras_path2
ras_np2 = arcpy.RasterToNumPyArray(ras_path2)
lst_np_ras.append([ras_np1, ras_np2])

ras_np = lst_np_ras  # take first numpy array from list
rows = ras_np.shape
cols = ras_np.shape
print " - rows:", rows
print " - cols:", cols

print "create output numpy array..."
ras_path = template1.format("%03d" % (1,))
raster = arcpy.Raster(ras_path)
ras_np_res = np.ndarray((rows, cols))
print " - out rows:", ras_np_res.shape
print " - out cols:", ras_np_res.shape

print "loop through pixels..."
pix_cnt = 0
for row in range(rows):
for col in range(cols):
pix_cnt += 1
if pix_cnt % 5000 == 0:
print " - row:", row, "  col:", col, "  pixel:", pix_cnt
lst_vals1 = []
lst_vals2 = []
try:
for lst_pars in lst_np_ras:
lst_vals1.append(lst_pars[row, col])
lst_vals2.append(lst_pars[row, col])
lst_vals1 = ReplaceNoData(lst_vals1, nodata)
lst_vals2 = ReplaceNoData(lst_vals2, nodata)
# perform calculation on list
correlation = CalculateCorrelation(lst_vals1, lst_vals2, nodata)
ras_np_res[row, col] = correlation
except Exception as e:
print "ERR:", e
print " - row:", row, "  col:", col, "  pixel:", pix_cnt
print " - lst_vals1:", lst_vals1
print " - lst_vals2:", lst_vals2

pnt = arcpy.Point(raster.extent.XMin, raster.extent.YMin) #  - raster.meanCellHeight
xcellsize = raster.meanCellWidth
ycellsize = raster.meanCellHeight

print "Write output raster..."
print " - ", out_ras
ras_res = arcpy.NumPyArrayToRaster(ras_np_res, lower_left_corner=pnt, x_cell_size=xcellsize,
y_cell_size=ycellsize, value_to_nodata=nodata)
ras_res.save(out_ras)
arcpy.DefineProjection_management(in_dataset=out_ras, coor_system="GEOGCS['GCS_WGS_1984',DATUM['D_WGS_1984',SPHEROID['WGS_1984',6378137.0,298.257223563]],PRIMEM['Greenwich',0.0],UNIT['Degree',0.0174532925199433]]")

def CalculateCorrelation(a, b, nodata):
import numpy
try:
coef = numpy.corrcoef(a,b)
return coef
except:
return nodata

def ReplaceNoData(lst, nodata):
res = []
for a in lst:
if a < nodata / 2.0:
res.append(None)
else:
res.append(a)
return res

if __name__ == '__main__':
main()
‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍``````

Kind regards, Xander

18 Replies by Esri Esteemed Contributor

Hi shouvik jha , are you referring to this thread: Scatter plot of two rasters  or is it another one, since I see several related threads?

If you want to compare each of the 13 rasters with the other rasters, this yields 78 possible combinations (if for instance comparing 13 to 10 is the same as 10 to 13). If you plot all of these in the same plot this will be very dense and probably impossible to interpret. Or do you want to generate a graph for each comparison?

You could use the numpy rasters to query the values and generate the lists for the plot, but there will be a lot of possibilities and I'm not sure is there might be memory issues or size restrictions for generating the plots, but I'm sure it must be possible. Do you want to do this for the 13 rasters you attached to this thread? Occasional Contributor III

Hi Xander Bakker‌, Yes i was talking about that thread only, but presently what I am talking is something different.

lets we have two parameters. one just assumes Temperature and precipitation raster for 13 years. For calculating correlation we need some data points from both parameters.

So this time we have two parameters, each parameter has 13 raster maps.one is WUE (Already we worked on that data) and another one vegetation density. So it will just calculate pixel basis correlation, value comes from 2 different. parameters over 13 rasters.

If we take single pixel from both parameters of 13 rasters, so value would be like

All value comes from single grid point. WUE rasterVegetation density raster
raster1200100
raster2250120
raster3300130
raster4350150
raster5250120
raster6150100
raster7120140
raster8130150
raster9 so on50100

based on this value correlation coefficient is 0.36. Calculated in Excel "CORREL" command.

I want similar calculation on the different grid basis and correlation value will be in raster format. Hope I explained properly. If you want some data, I can share with you.

thanks by Esri Esteemed Contributor

If I understand you correctly, you want to extract for each pixel, 26 values (13 values from the WUE rasters and 13 values from the vegetation density rasters), calculate the correlation and write back the correlation value in the output raster. Is that correct? Occasional Contributor III

Yes, exactly the same things. Occasional Contributor III

I am attaching my data, for your reference purpose. Occasional Contributor III

Hi Bakker‌ , i have attached the data link for the reference purpose. Kindly have a look.

Data link : Dropbox - Data.zip

Thanks & regards

Shouvik Jha by Esri Esteemed Contributor

Hi india123 , I branched this part to a new question and will download the data to see what is possible.

Kind regards, Xander by Esri Esteemed Contributor

I noticed that the data from both parameters do not align exactly. Cell sizes are not the same and therefore the extent is also different. It is important that they align exactly for the process to work correctly (that the correct pixels from both list of datasets are used in the correlation). vs and vs resulting in a shift: There is also something weird going on with the script. Although the raster properties indicate that there are 1994 lines, a raster with 1995 lines is created and since the the properties of the original raster are used to define the lower left corner for translating the numpy array back to a raster, this results in a shift in 1 pixel. I have manually corrected the shift and attached the resulting correlation raster for you to examine. The code used is list below:

``````#-------------------------------------------------------------------------------
# Name:        correlation_2_sets_of_rasters.py
# Purpose:
#
# Author:      Xander
#
# Created:     03-08-2017
#-------------------------------------------------------------------------------

def main():
import arcpy
arcpy.env.overwriteOutput = True

import numpy as np
template1 = r'C:\GeoNet\Correlation\Parameter1\r{0}_NPP.TIF'
template2 = r'C:\GeoNet\Correlation\Parameter2\r{0}_WUE.TIF'
nodata = -3.4028235e+38
out_ras = r'C:\GeoNet\Correlation\correlation.TIF'

print "create nested numpy array list..."
lst_np_ras = []
for i in range(1, 14):
ras_path1 = template1.format("%03d" % (i,))
print " - ", ras_path1
ras_np1 = arcpy.RasterToNumPyArray(ras_path1)
ras_path2 = template2.format("%03d" % (i,))
print " - ", ras_path2
ras_np2 = arcpy.RasterToNumPyArray(ras_path2)
lst_np_ras.append([ras_np1, ras_np2])

ras_np = lst_np_ras  # take first numpy array from list
rows = ras_np.shape
cols = ras_np.shape
print " - rows:", rows
print " - cols:", cols

print "create output numpy array..."
ras_path = template1.format("%03d" % (1,))
raster = arcpy.Raster(ras_path)
ras_np_res = np.ndarray((rows, cols))
print " - out rows:", ras_np_res.shape
print " - out cols:", ras_np_res.shape

print "loop through pixels..."
pix_cnt = 0
for row in range(rows):
for col in range(cols):
pix_cnt += 1
if pix_cnt % 5000 == 0:
print " - row:", row, "  col:", col, "  pixel:", pix_cnt
lst_vals1 = []
lst_vals2 = []
try:
for lst_pars in lst_np_ras:
lst_vals1.append(lst_pars[row, col])
lst_vals2.append(lst_pars[row, col])
lst_vals1 = ReplaceNoData(lst_vals1, nodata)
lst_vals2 = ReplaceNoData(lst_vals2, nodata)
# perform calculation on list
correlation = CalculateCorrelation(lst_vals1, lst_vals2, nodata)
ras_np_res[row, col] = correlation
except Exception as e:
print "ERR:", e
print " - row:", row, "  col:", col, "  pixel:", pix_cnt
print " - lst_vals1:", lst_vals1
print " - lst_vals2:", lst_vals2

pnt = arcpy.Point(raster.extent.XMin, raster.extent.YMin) #  - raster.meanCellHeight
xcellsize = raster.meanCellWidth
ycellsize = raster.meanCellHeight

print "Write output raster..."
print " - ", out_ras
ras_res = arcpy.NumPyArrayToRaster(ras_np_res, lower_left_corner=pnt, x_cell_size=xcellsize,
y_cell_size=ycellsize, value_to_nodata=nodata)
ras_res.save(out_ras)
arcpy.DefineProjection_management(in_dataset=out_ras, coor_system="GEOGCS['GCS_WGS_1984',DATUM['D_WGS_1984',SPHEROID['WGS_1984',6378137.0,298.257223563]],PRIMEM['Greenwich',0.0],UNIT['Degree',0.0174532925199433]]")

def CalculateCorrelation(a, b, nodata):
import numpy
try:
coef = numpy.corrcoef(a,b)
return coef
except:
return nodata

def ReplaceNoData(lst, nodata):
res = []
for a in lst:
if a < nodata / 2.0:
res.append(None)
else:
res.append(a)
return res

if __name__ == '__main__':
main()
‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍``````

Kind regards, Xander Occasional Contributor III

Hi Xander Bakker‌, Thank you so much. Really a very wonderful script.

please, can you attach the output raster which you got.

I ran the script but the same issue with me also, it's producing one more row.

After reading your comments, I USE the clip function in ArcGIS to bring the same extent as the parameter1 have. But the same issue persisting.

Please tell me, how you manually correct that.

Once again thanks for the interesting and very useful script.

Thanks & regards

Shouvik Jha 