Calculate Correlation and Pearson between two sets of rasters

19606
11
Jump to solution
08-21-2017 11:33 AM
ShouvikJha
Occasional Contributor III

Branched from: https://community.esri.com/thread/199323-re-trend-analysis-through-time-series-of-raster-data 

Hi Xander Bakker‌,  Some modification I want to do in correlation script. 

I Would like to calculate the P value as a raster in addition to correlation raster. 

Using scipy library, I just ran one script to obtain the correlation and P value. Script is attached below

import numpy as np
from scipy.stats import pearsonr
x = np.array([ 58295.62187335,  45420.95483714,   3398.64920064,    977.22166306, 5515.32801851,  14184.57621022,  16027.2803392 ,  15313.01865824, 6443.2448182 ])
y = np.array([ 143547.79123381,   22996.69597427,    2591.56411049, 661.93115277,    8826.96549102,   17735.13549851, 11629.13003263,   14438.33177173,    6997.89334741])

r,p = pearsonr(x,y)

print "r", r

print "p", p

‍‍‍‍‍‍‍‍‍‍‍

Output 

Returns:

r : 0.831087956392

Pearson’s correlation coefficient

p-value : 0.00550539621039

2-tailed p-value

So instead of Numpy, can we use here Numpy and Scipy both library to get the addition p value or in the existing  script can we add this function to obtain the p Value as a raster

Thanks & regards 

Shouvik Jha 

0 Kudos
1 Solution

Accepted Solutions
XanderBakker
Esri Esteemed Contributor

Hi shouvik jha , I branched this question to give it more visibility and to avoid letting the post grow too much and contain multiple questions. 

Basically, what I would do is define a second numpy that will hold the pearson value. This is what it would look like:

  • line 11 defined the pearson output raster path and name
  • line 35 creates the pearson numpy array
  • line 57 and 58 will call the pearson function and store the result in the numpy array
  • line 76 to 80 will store the output pearson raster 
  • line 103 to 112 contains the function that calculated the pearson value

def main():
    print "load modules..."
    import arcpy
    arcpy.env.overwriteOutput = True

    import numpy as np
    template1 = r'C:\GeoNet\Correlation\Parameter1\r{0}_NPP.TIF'
    template2 = r'C:\GeoNet\Correlation\Parameter2\r{0}_WUE.TIF'
    nodata = -3.4028235e+38
    out_ras_corr = r'C:\GeoNet\Correlation\Pearson\correlation.TIF'
    out_ras_pear = r'C:\GeoNet\Correlation\Pearson\pearson.TIF'

    print "create nested numpy array list..."
    lst_np_ras = []
    for i in range(1, 14):
        ras_path1 = template1.format("%03d" % (i,))
        print " - ", ras_path1
        ras_np1 = arcpy.RasterToNumPyArray(ras_path1)
        ras_path2 = template2.format("%03d" % (i,))
        print " - ", ras_path2
        ras_np2 = arcpy.RasterToNumPyArray(ras_path2)
        lst_np_ras.append([ras_np1, ras_np2])

    print "read props numpy raster..."
    ras_np = lst_np_ras[0][0]  # take first numpy array from list
    rows = ras_np.shape[0]
    cols = ras_np.shape[1]
    print " - rows:", rows
    print " - cols:", cols

    print "create output numpy array..."
    ras_path = template1.format("%03d" % (1,))
    raster = arcpy.Raster(ras_path)
    ras_np_res_corr = np.ndarray((rows, cols))
    ras_np_res_pear = np.ndarray((rows, cols))
    print " - out rows:", ras_np_res_corr.shape[0]
    print " - out cols:", ras_np_res_corr.shape[1]

    print "loop through pixels..."
    pix_cnt = 0
    for row in range(rows):
        for col in range(cols):
            pix_cnt += 1
            if pix_cnt % 5000 == 0:
                print " - row:", row, "  col:", col, "  pixel:", pix_cnt
            lst_vals1 = []
            lst_vals2 = []
            try:
                for lst_pars in lst_np_ras:
                    lst_vals1.append(lst_pars[0][row, col])
                    lst_vals2.append(lst_pars[1][row, col])
                lst_vals1 = ReplaceNoData(lst_vals1, nodata)
                lst_vals2 = ReplaceNoData(lst_vals2, nodata)
                # perform calculation on list
                correlation = CalculateCorrelation(lst_vals1, lst_vals2, nodata)
                ras_np_res_corr[row, col] = correlation
                pearson = CalculatePearsons(lst_vals1, lst_vals2, nodata)
                ras_np_res_pear[row, col] = pearson
            except Exception as e:
                print "ERR:", e
                print " - row:", row, "  col:", col, "  pixel:", pix_cnt
                print " - lst_vals1:", lst_vals1
                print " - lst_vals2:", lst_vals2

    pnt = arcpy.Point(raster.extent.XMin, raster.extent.YMin) #  - raster.meanCellHeight
    xcellsize = raster.meanCellWidth
    ycellsize = raster.meanCellHeight

    print "Write output rasters..."
    print " - ", out_ras_corr
    ras_res_corr = arcpy.NumPyArrayToRaster(ras_np_res_corr, lower_left_corner=pnt, x_cell_size=xcellsize,
                                 y_cell_size=ycellsize, value_to_nodata=nodata)
    ras_res_corr.save(out_ras_corr)
    arcpy.DefineProjection_management(in_dataset=out_ras_corr, coor_system="GEOGCS['GCS_WGS_1984',DATUM['D_WGS_1984',SPHEROID['WGS_1984',6378137.0,298.257223563]],PRIMEM['Greenwich',0.0],UNIT['Degree',0.0174532925199433]]")

    print " - ", out_ras_pear
    ras_res_pear = arcpy.NumPyArrayToRaster(ras_np_res_pear, lower_left_corner=pnt, x_cell_size=xcellsize,
                                 y_cell_size=ycellsize, value_to_nodata=nodata)
    ras_res_pear.save(out_ras_pear)
    arcpy.DefineProjection_management(in_dataset=out_ras_pear, coor_system="GEOGCS['GCS_WGS_1984',DATUM['D_WGS_1984',SPHEROID['WGS_1984',6378137.0,298.257223563]],PRIMEM['Greenwich',0.0],UNIT['Degree',0.0174532925199433]]")



def CalculateCorrelation(a, b, nodata):
    import numpy
    try:
        coef = numpy.corrcoef(a,b)
        return coef[0][1]
    except:
        return nodata


def ReplaceNoData(lst, nodata):
    res = []
    for a in lst:
        if a < nodata / 2.0:
            res.append(None)
        else:
            res.append(a)
    return res


def CalculatePearsons(a, b, nodata):
    import numpy
    from scipy.stats import pearsonr
    try:
        x = np.array(a)
        y = np.array(b)
        r, p = pearsonr(a, y)
        return p
    except:
        return nodata


if __name__ == '__main__':
    main()

At this moment I am getting a "RuntimeError: ERROR 999998: Unexpected Error" on line 73 when saving the output correlation raster. I will look into that and see what the problem is. You could try and run the code to see if it works on your system.

View solution in original post

11 Replies
XanderBakker
Esri Esteemed Contributor

Hi shouvik jha , just like the original code has the function CalculateCorrelation:

def CalculateCorrelation(a, b, nodata):
    import numpy
    try:
        coef = numpy.corrcoef(a,b)
        return coef[0][1]
    except:
        return nodata

... you could implement one that calculates p:

def CalculatePearsons(a, b, nodata):
    import numpy
    from scipy.stats import pearsonr
    try:
        x = np.array(a)
        y = np.array(b)
        r, p = pearsonr(a, y)
        return p
    except:
        return nodata

Include the function and on line 62 change:

correlation = CalculateCorrelation(lst_vals1, lst_vals2, nodata)

by (although the "correlation" will be the p value) :

correlation = CalculatePearsons(lst_vals1, lst_vals2, nodata)

I did not try the code though.

ShouvikJha
Occasional Contributor III

Hi Xander Bakker‌, Thank you for the suggestion. 

I trying to modify the code as you suggested but i don't understand, where I have to call the p Value raster. 

From our earlier script, We got the output only Correlation raster, but this time I want Correlation raster as well as P value raster. 

Snap from our earlier script on Input and output:

 

import numpy as np
    template1 = r'D:\Correlation\Parameter1\r{0}_NPP.TIF'
    template2 = r'D:\Correlation\Parameter2\r{0}_WUE.TIF'
    nodata = -3.4028235e+38
    out_ras_Corr = r'D:\Correlation\correlation.TIF' # Already exist in our script
    out_ras_p    = r'D:\Correlation\Significance.TIF' # this one i want to add 
    ‍‍‍‍‍‍‍‍‍‍‍‍‍‍
0 Kudos
ShouvikJha
Occasional Contributor III

Hi Xander Bakker‌, I am facing same problem here also. 

we eagerly await your response

Thanks & regards 

Shouvik Jha 

0 Kudos
XanderBakker
Esri Esteemed Contributor

Hi shouvik jha , I branched this question to give it more visibility and to avoid letting the post grow too much and contain multiple questions. 

Basically, what I would do is define a second numpy that will hold the pearson value. This is what it would look like:

  • line 11 defined the pearson output raster path and name
  • line 35 creates the pearson numpy array
  • line 57 and 58 will call the pearson function and store the result in the numpy array
  • line 76 to 80 will store the output pearson raster 
  • line 103 to 112 contains the function that calculated the pearson value

def main():
    print "load modules..."
    import arcpy
    arcpy.env.overwriteOutput = True

    import numpy as np
    template1 = r'C:\GeoNet\Correlation\Parameter1\r{0}_NPP.TIF'
    template2 = r'C:\GeoNet\Correlation\Parameter2\r{0}_WUE.TIF'
    nodata = -3.4028235e+38
    out_ras_corr = r'C:\GeoNet\Correlation\Pearson\correlation.TIF'
    out_ras_pear = r'C:\GeoNet\Correlation\Pearson\pearson.TIF'

    print "create nested numpy array list..."
    lst_np_ras = []
    for i in range(1, 14):
        ras_path1 = template1.format("%03d" % (i,))
        print " - ", ras_path1
        ras_np1 = arcpy.RasterToNumPyArray(ras_path1)
        ras_path2 = template2.format("%03d" % (i,))
        print " - ", ras_path2
        ras_np2 = arcpy.RasterToNumPyArray(ras_path2)
        lst_np_ras.append([ras_np1, ras_np2])

    print "read props numpy raster..."
    ras_np = lst_np_ras[0][0]  # take first numpy array from list
    rows = ras_np.shape[0]
    cols = ras_np.shape[1]
    print " - rows:", rows
    print " - cols:", cols

    print "create output numpy array..."
    ras_path = template1.format("%03d" % (1,))
    raster = arcpy.Raster(ras_path)
    ras_np_res_corr = np.ndarray((rows, cols))
    ras_np_res_pear = np.ndarray((rows, cols))
    print " - out rows:", ras_np_res_corr.shape[0]
    print " - out cols:", ras_np_res_corr.shape[1]

    print "loop through pixels..."
    pix_cnt = 0
    for row in range(rows):
        for col in range(cols):
            pix_cnt += 1
            if pix_cnt % 5000 == 0:
                print " - row:", row, "  col:", col, "  pixel:", pix_cnt
            lst_vals1 = []
            lst_vals2 = []
            try:
                for lst_pars in lst_np_ras:
                    lst_vals1.append(lst_pars[0][row, col])
                    lst_vals2.append(lst_pars[1][row, col])
                lst_vals1 = ReplaceNoData(lst_vals1, nodata)
                lst_vals2 = ReplaceNoData(lst_vals2, nodata)
                # perform calculation on list
                correlation = CalculateCorrelation(lst_vals1, lst_vals2, nodata)
                ras_np_res_corr[row, col] = correlation
                pearson = CalculatePearsons(lst_vals1, lst_vals2, nodata)
                ras_np_res_pear[row, col] = pearson
            except Exception as e:
                print "ERR:", e
                print " - row:", row, "  col:", col, "  pixel:", pix_cnt
                print " - lst_vals1:", lst_vals1
                print " - lst_vals2:", lst_vals2

    pnt = arcpy.Point(raster.extent.XMin, raster.extent.YMin) #  - raster.meanCellHeight
    xcellsize = raster.meanCellWidth
    ycellsize = raster.meanCellHeight

    print "Write output rasters..."
    print " - ", out_ras_corr
    ras_res_corr = arcpy.NumPyArrayToRaster(ras_np_res_corr, lower_left_corner=pnt, x_cell_size=xcellsize,
                                 y_cell_size=ycellsize, value_to_nodata=nodata)
    ras_res_corr.save(out_ras_corr)
    arcpy.DefineProjection_management(in_dataset=out_ras_corr, coor_system="GEOGCS['GCS_WGS_1984',DATUM['D_WGS_1984',SPHEROID['WGS_1984',6378137.0,298.257223563]],PRIMEM['Greenwich',0.0],UNIT['Degree',0.0174532925199433]]")

    print " - ", out_ras_pear
    ras_res_pear = arcpy.NumPyArrayToRaster(ras_np_res_pear, lower_left_corner=pnt, x_cell_size=xcellsize,
                                 y_cell_size=ycellsize, value_to_nodata=nodata)
    ras_res_pear.save(out_ras_pear)
    arcpy.DefineProjection_management(in_dataset=out_ras_pear, coor_system="GEOGCS['GCS_WGS_1984',DATUM['D_WGS_1984',SPHEROID['WGS_1984',6378137.0,298.257223563]],PRIMEM['Greenwich',0.0],UNIT['Degree',0.0174532925199433]]")



def CalculateCorrelation(a, b, nodata):
    import numpy
    try:
        coef = numpy.corrcoef(a,b)
        return coef[0][1]
    except:
        return nodata


def ReplaceNoData(lst, nodata):
    res = []
    for a in lst:
        if a < nodata / 2.0:
            res.append(None)
        else:
            res.append(a)
    return res


def CalculatePearsons(a, b, nodata):
    import numpy
    from scipy.stats import pearsonr
    try:
        x = np.array(a)
        y = np.array(b)
        r, p = pearsonr(a, y)
        return p
    except:
        return nodata


if __name__ == '__main__':
    main()

At this moment I am getting a "RuntimeError: ERROR 999998: Unexpected Error" on line 73 when saving the output correlation raster. I will look into that and see what the problem is. You could try and run the code to see if it works on your system.

ShouvikJha
Occasional Contributor III

Hi Xander Bakker Thank you very much for such wonderful script.

However I am running the script on my system and will inform you shortly.  

0 Kudos
XanderBakker
Esri Esteemed Contributor

Í just noted an error: line 104 should be:

import numpy as np
0 Kudos
XanderBakker
Esri Esteemed Contributor

I also changed lines 73 and 79 into :

    arcpy.CopyRaster_management(ras_res_corr, out_ras_corr)

and 

    arcpy.CopyRaster_management(ras_res_pear, out_ras_pear)

On a small subset it worked without problems. I did not check the resulting Pearson value though...

0 Kudos
ShouvikJha
Occasional Contributor III

Hi xander_bakker‌, Thank you very much for your such great cooperation. We are really very thankful for such wonderful and useful script from you. 

The updated script is running perfectly and produced the exact result. 

Thanks & regards 

Shouvik Jha 

XanderBakker
Esri Esteemed Contributor

Glad to hear that the code worked. Could you mark the question answered?

Kind regards, Xander