Trend Analysis Through Time Series of Raster Data

ShouvikJha · ‎07-27-2017

Hi All,

I have total 13 raster files for different years (2001 - 2013). All data are in tiff format and same spatial extent (row and column). I am trying to perform regression line slope/trend analysis between each grid points for 13 raster data sets. I have tried in ArcGIS raster calculator but I won't run the task due to complexity.

The outcome would measure the net change between pixels through my time series data.

So my question is, can it be performed using raster calculator or ArcPy Script? For such type of statistical analysis using ArcGIS, really I am facing one of the big limitation with ArcGIS.

Using excel, it can be calculated using SLOPE command where X (Time) and Y (value) has been used. But for creating a Spatial map of Slope/trend analysis, we need to perform using raster data.

The download link is given below for Sample data sets (Due to size limit, I could not attach the data with here )

Data Link: Dropbox - Water_Use_Efficiency.zip

Please share your experience.

Thanks in advance.

XanderBakker · ‎08-02-2017

Thanks for sharing and I am glad that the code helped.

ShouvikJha · ‎08-21-2017

Hi Xander Bakker‌, Recently we got a very useful script from you regarding regression analysis.

Here, I would like to add one more parameter in this script which is P Value (calculate the significance). In the above script, we have been used the Numpy library numpy.linalg.lstsq which gives the output as Slope, intercept etc. However, P value is not calculated. P value also needed because it tells the how significant increasing and decreasing the trend while we interpret the output statistically in front of the scientific community.

So while reading the python library, I found the Scipy library scipy.stats.linregress and i just ran one script on my system and i got the slope, P value. Script is attached below

from numpy import arange,array,ones#,random,linalg
from scipy import stats

xi = arange(0,11)
A = array([ xi, ones(11)])
# linearly generated sequence
y = [154.96, 151.39, 143.87, 152.24, 142.97, 154.23, 141.15, 135.37, 151.50, 143.82, 143.77]

slope, intercept, r_value, p_value, std_err = stats.linregress(xi,y)

print 'slope', slope
print  'p_value', p_value
‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍

which gives the output as

Returns:
slope -0.899090909091
slope of the regression line
p-value : 0.142283969084
two-sided p-value for a hypothesis test whose null hypothesis is that the slope is zero.

So instead of Numpy, can we use here Numpy and Scipy both library to get the addition p value or in the existing script can we add this function to obtain the p Value as a raster.

Please guide us to solve this.

Thanks & regards

Shouvik Jha

XanderBakker · ‎08-21-2017

Hi shouvik jha , sure you can.

In the original code in the function RegressionLine on line 102, in the beginning there are two lists being created: lst_x and lst_y. You can use these two lists in the linregress function:

stats.linregress(lst_x, lst_y)

Return the p_value and create a raster from it.

ShouvikJha · ‎08-22-2017

Hi Xander Bakker‌, Thank you for the suggestions.

I tried the same, I just changed the function in line no 53 and 102 as using Numpy and Scipy both. but I getting the error.

Whatever line I have updated based on the source line which is marked by #

Error regarding 'tuple index out of range'. I have attached below updated code and Error massage. KIndly have a look

I just want to modify our earlier script to add one more parameter as p value raster. Earlier script we have calculated the parameters such ['slope', 'intersept', 'tot_dist', 'mean_dist']. I just want to remove the 'tot_dist', 'mean_dist' and it will replace by ['r_value', 'p_value', 'std_err'] remain will be the same. So the final output raster would be ['slope', 'intersept', 'r_value', 'p_value', 'std_err']

Updated script:

#-------------------------------------------------------------------------------
# Name:        Regression Analysis.py  
# Purpose:     Regression Analysis through time series data
# Author:      Xander Bakker
# Source:      Geonet
# Created:     30-07-2017
#-------------------------------------------------------------------------------


def main():
    print "load modules..."
    import arcpy
    from scipy import stats
    from numpy import arange,array,ones
    arcpy.env.overwriteOutput = True

    import numpy as np
    template = r'D:\NPP_GUJARAT_MONSOON\MODIS_ET\WUE4\r{0}_WUE.TIF'
    nodata = -3.4028235e+38
    out_template = r'D:\NPP_GUJARAT_MONSOON\MODIS_ET\WUE4\WUE_{0}.TIF'
    # out_names = ['slope', 'intersept', 'tot_dist', 'mean_dist']
    out_names = ['slope', 'intersept', 'r_value', 'p_value', 'std_err']

    print "create numpy list..."
    lst_np_ras = []
    for i in range(2, 13): # 2 number will start of raster name e.g. 2 = r002_WUE, 3 = r003_WUE
        ras_path = template.format("%03d" % (i,))
        print " - ", ras_path
        ras_np = arcpy.RasterToNumPyArray(ras_path)
        lst_np_ras.append(ras_np)

    print "read props numpy raster..."
    ras_np = lst_np_ras[0]
    rows = ras_np.shape[0]
    cols = ras_np.shape[1]
    print " - rows:", rows
    print " - cols:", cols

    print "create output numpy array..."
    ras_path = template.format("%03d" % (2,)) # 2 number will take the number of raster name e.g. 2 = r002_WUE, 3 = r003_WUE
    raster = arcpy.Raster(ras_path)
    lst_ras_np_res = []
    for out_name in out_names:
        print "out_name", out_name
        ras_np_res = np.ndarray((rows, cols))
        lst_ras_np_res.append(ras_np_res)
        print " - out rows:", ras_np_res.shape[0]
        print " - out cols:", ras_np_res.shape[1]

    print "loop through pixels..."
    pix_cnt = 0
    for row in range(rows):
        for col in range(cols):
            pix_cnt += 1
            if pix_cnt % 5000 == 0:
                print "row:", row, "  col:", col, "  pixel:", pix_cnt
            lst_vals = []
            for ras_np in lst_np_ras:
                lst_vals.append(ras_np[row, col])
            lst_vals = ReplaceNoData(lst_vals, nodata)
            # perform calculation on list
            # a, b, sum_abs_dif, avg_abs_dif, cnt_elem = RegressionLine(lst_vals)
            a, b, sum_abs_dif, avg_abs_dif, cnt_elem = stats.linregress(lst_vals)
            if cnt_elem > 1:
                slope = a
                intersept = b
                lst_ras_np_res[0][row, col] = slope
                lst_ras_np_res[1][row, col] = intersept
                lst_ras_np_res[2][row, col] = sum_abs_dif
                lst_ras_np_res[3][row, col] = avg_abs_dif
            else:
                lst_ras_np_res[0][row, col] = nodata
                lst_ras_np_res[1][row, col] = nodata
                lst_ras_np_res[2][row, col] = nodata
                lst_ras_np_res[3][row, col] = nodata

    pnt = arcpy.Point(raster.extent.XMin, raster.extent.YMin) #  - raster.meanCellHeight
    xcellsize = raster.meanCellWidth
    ycellsize = raster.meanCellHeight

    print "Write output rasters..."
    i = 0
    for out_name in out_names:
        out_ras = out_template.format(out_name)
        print " - ", out_ras
        ras_np_res = lst_ras_np_res[i]
        ras_res = arcpy.NumPyArrayToRaster(ras_np_res, lower_left_corner=pnt, x_cell_size=xcellsize,
                                 y_cell_size=ycellsize, value_to_nodata=nodata)

        ras_res.save(out_ras)
        arcpy.DefineProjection_management(in_dataset=out_ras, coor_system="GEOGCS['GCS_WGS_1984',DATUM['D_WGS_1984',SPHEROID['WGS_1984',6378137.0,298.257223563]],PRIMEM['Greenwich',0.0],UNIT['Degree',0.0174532925199433]]")
        i += 1



def ReplaceNoData(lst, nodata):
    res = []
    for a in lst:
        if a < nodata / 2.0:
            res.append(None)
        else:
            res.append(a)
    return res

# def ReplaceNoData(lst, nodata):
    stats.linregress(lst_x, lst_y)
    from numpy import arange, array, ones, linalg
    months = range(2, 13)
    lst_x = []
    lst_y = []
    cnt = 0
    for a in lst:
        if not a is None:
            lst_x.append(months[cnt])
            lst_y.append(a)
        cnt += 1

    cnt_elem = len(lst_x)
    if cnt_elem >= 2:
        A = array([ lst_x, ones(len(lst_x))])
        # linearly generated sequence
        # w = linalg.lstsq(A.T, lst_y)[0]
        w = stats.linregress(A.T, lst_y)[0]

        # obtaining the parameters of the regression line
        a = w[0]  # slope
        b = w[1]  # offset

        # calculate acumulated abs difference from trend line
        zip_lst = zip(lst_x, lst_y)
        sum_abs_dif = 0
        for xy in zip_lst:
            x = xy[0]
            y = xy[1]
            calc = x * a + b
            abs_dif = abs(y - calc)
            sum_abs_dif += abs_dif

        avg_abs_dif = sum_abs_dif / cnt_elem
        return a, b, sum_abs_dif, avg_abs_dif, cnt_elem
    else:
        return None, None, None, None, None

if __name__ == '__main__':
    main()‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍

Error message:

Traceback (most recent call last):
  File "<module1>", line 142, in <module>
  File "<module1>", line 61, in main
  File "C:\Python27\ArcGIS10.3\lib\site-packages\scipy\stats\stats.py", line 2997, in linregress
    elif x.shape[1] == 2:
IndexError: tuple index out of range‍‍‍‍‍‍

ShouvikJha · ‎08-26-2017

Hi xander_bakker‌, I am still stuck into this matter to modify the code based on SciPy.

My kind request please cooperate to solve the issue.

DanPatterson_Retired · ‎08-21-2017

Scipy can definitely be used, it has more stats functions but it is built on numpy for many of its functions. If you are concerned about the methods of calculating the regression coefficients, just examine the constraints, if any, Generally differences may exist only at the degrees of freedom level, but many of their stats tests have a ddof parameter (ie N vs N-1 on the denominator).

given that your p-value is what it is, you can treat your slope correlation as insignificant. A graph would help

ShouvikJha · ‎08-22-2017

Dan_Patterson‌ Thanks.

Yes, Scipy is more efficient to produce the statistical result and P value tells the how significant the trend. The significance of trend is more important while you interpret any output of trend or correlation. Everybody wants to know the How significance relation.

XanderBakker · ‎08-28-2017

Hi shouvik jha ,

I implemented the following changes to the script:

line 20 includes the additional name for the p_value raster
line 60 will also retrieve the p_value
line 68 and 74 were included to write the value in the numpy array for the p_value
line 107 imports scipy stats
line 123 implements the linear regression with scipy
line 140 (and 142) return the additional p_value. Please note that the other values are still generated with numpy, but you could return the values from scipy instead.

#-------------------------------------------------------------------------------
# Name:        stats_numpy_scipy_series.py
# Purpose:
#
# Author:      Xander
#
# Created:     30-07-2017
# Updated:     28-08-2017
#-------------------------------------------------------------------------------

def main():
    print "load modules..."
    import arcpy
    arcpy.env.overwriteOutput = True

    import numpy as np
    template = r'C:\GeoNet\WUE2\r{0}_WUE.TIF'
    nodata = -3.4028235e+38
    out_template = r'C:\GeoNet\WUE2\WUE2_{0}.TIF'
    out_names = ['slope', 'intersept', 'tot_dist', 'mean_dist', 'p_value']

    print "create numpy list..."
    lst_np_ras = []
    for i in range(1, 14):
        ras_path = template.format("%03d" % (i,))
        print " - ", ras_path
        ras_np = arcpy.RasterToNumPyArray(ras_path)
        lst_np_ras.append(ras_np)

    print "read props numpy raster..."
    ras_np = lst_np_ras[0]
    rows = ras_np.shape[0]
    cols = ras_np.shape[1]
    print " - rows:", rows
    print " - cols:", cols

    print "create output numpy array..."
    ras_path = template.format("%03d" % (1,))
    raster = arcpy.Raster(ras_path)
    lst_ras_np_res = []
    for out_name in out_names:
        print "out_name", out_name
        ras_np_res = np.ndarray((rows, cols))
        lst_ras_np_res.append(ras_np_res)
        print " - out rows:", ras_np_res.shape[0]
        print " - out cols:", ras_np_res.shape[1]

    print "loop through pixels..."
    pix_cnt = 0
    for row in range(rows):
        for col in range(cols):
            pix_cnt += 1
            if pix_cnt % 5000 == 0:
                print "row:", row, "  col:", col, "  pixel:", pix_cnt
            lst_vals = []
            for ras_np in lst_np_ras:
                lst_vals.append(ras_np[row, col])
            lst_vals = ReplaceNoData(lst_vals, nodata)
            # perform calculation on list
            a, b, sum_abs_dif, avg_abs_dif, cnt_elem, p_value = RegressionLine(lst_vals)
            if cnt_elem > 1:
                slope = a
                intersept = b
                lst_ras_np_res[0][row, col] = slope
                lst_ras_np_res[1][row, col] = intersept
                lst_ras_np_res[2][row, col] = sum_abs_dif
                lst_ras_np_res[3][row, col] = avg_abs_dif
                lst_ras_np_res[4][row, col] = p_value
            else:
                lst_ras_np_res[0][row, col] = nodata
                lst_ras_np_res[1][row, col] = nodata
                lst_ras_np_res[2][row, col] = nodata
                lst_ras_np_res[3][row, col] = nodata
                lst_ras_np_res[4][row, col] = nodata

    pnt = arcpy.Point(raster.extent.XMin, raster.extent.YMin) #  - raster.meanCellHeight
    xcellsize = raster.meanCellWidth
    ycellsize = raster.meanCellHeight

    print "Write output rasters..."
    i = 0
    for out_name in out_names:
        out_ras = out_template.format(out_name)
        print " - ", out_ras
        ras_np_res = lst_ras_np_res[i]
        ras_res = arcpy.NumPyArrayToRaster(ras_np_res, lower_left_corner=pnt, x_cell_size=xcellsize,
                                 y_cell_size=ycellsize, value_to_nodata=nodata)

        ras_res.save(out_ras)
        arcpy.DefineProjection_management(in_dataset=out_ras, coor_system="GEOGCS['GCS_WGS_1984',DATUM['D_WGS_1984',SPHEROID['WGS_1984',6378137.0,298.257223563]],PRIMEM['Greenwich',0.0],UNIT['Degree',0.0174532925199433]]")
        i += 1



def ReplaceNoData(lst, nodata):
    res = []
    for a in lst:
        if a < nodata / 2.0:
            res.append(None)
        else:
            res.append(a)
    return res


def RegressionLine(lst):
    from numpy import arange, array, ones, linalg
    from scipy import stats
    months = range(1, 14)
    lst_x = []
    lst_y = []
    cnt = 0
    for a in lst:
        if not a is None:
            lst_x.append(months[cnt])
            lst_y.append(a)
        cnt += 1

    cnt_elem = len(lst_x)
    if cnt_elem >= 2:
        A = array([ lst_x, ones(len(lst_x))])
        # linearly generated sequence
        w = linalg.lstsq(A.T, lst_y)[0]
        slope, intercept, r_value, p_value, std_err = stats.linregress(lst_x, lst_y)

        # obtaining the parameters of the regression line
        a = w[0]  # slope
        b = w[1]  # offset

        # calculate acumulated abs difference from trend line
        zip_lst = zip(lst_x, lst_y)
        sum_abs_dif = 0
        for xy in zip_lst:
            x = xy[0]
            y = xy[1]
            calc = x * a + b
            abs_dif = abs(y - calc)
            sum_abs_dif += abs_dif

        avg_abs_dif = sum_abs_dif / cnt_elem
        return a, b, sum_abs_dif, avg_abs_dif, cnt_elem, p_value
    else:
        return None, None, None, None, None, None

if __name__ == '__main__':
    main()‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍

 Please be aware that I did not test the code...

ShouvikJha · ‎08-29-2017

Hi Xander Bakker‌, many thanks for your great cooperation.

I ran the script. The script is running perfectly and results also correct.

Once again thank you very much for your such great contribution.

Thanks & regards

Shouvik Jha

XanderBakker · ‎08-29-2017

You are welcome. Great to hear that the code is working!