Trend Analysis Through Time Series of Raster Data

11902
31
Jump to solution
07-27-2017 11:14 AM
ShouvikJha
Occasional Contributor III

Hi All, 

I have total 13 raster files for different years (2001 - 2013). All data are in tiff format and same spatial extent (row and column). I am trying to perform regression line slope/trend analysis between each grid points for 13 raster data sets. I have tried in ArcGIS raster calculator but I won't run the task due to complexity. 

The outcome would measure the net change between pixels through my time series data. 

 

So my question is, can it be performed using raster calculator or ArcPy Script?  For such type of statistical analysis using ArcGIS, really I am facing one of the big limitation with ArcGIS. 

 

Using excel, it can be calculated using SLOPE command where X (Time) and Y (value) has been used. But for creating a Spatial map of Slope/trend analysis, we need to perform using raster data.   

 

The download link is given below for Sample data sets (Due to size limit, I could not attach the data with here )

Data Link: Dropbox - Water_Use_Efficiency.zip 

Please share your experience. 

Thanks in advance. 

 

0 Kudos
31 Replies
XanderBakker
Esri Esteemed Contributor

Sorry Dan Patterson for not implementing a three dimensional ndarray as you suggested. I was not sure if this or my method could run into memory problems, but if the rasters are too big for this, there is a nice example of using blocks in the help page: RasterToNumPyArray—Help | ArcGIS Desktop (scroll down to example 2).

DanPatterson_Retired
MVP Emeritus

xander_bakker‌ blocks are good... what I would call strides with or without overlap... they can also reduce processing times if one is memory starved as well.  

DanPatterson_Retired
MVP Emeritus

Well I wouldn't even worry about memory blocks ... 300, 000, 000 values

# ---- 5000 x 5000 cell raster ----

a = np.random.randint(0, 10, size=(12, 5000, 5000))

np.mean(a, axis=0)  # ---- let's try 5k x 5k raster
Out[5]: 
array([[ 3.17,  5.50,  4.75, ...,  4.00,  4.33,  5.25],
       [ 5.25,  6.50,  6.00, ...,  4.42,  5.00,  3.42],
       [ 3.25,  5.33,  4.58, ...,  3.67,  4.83,  4.58],
       ..., 
       [ 4.17,  2.83,  3.67, ...,  2.58,  4.08,  5.42],
       [ 4.83,  4.17,  4.83, ...,  6.58,  3.50,  3.67],
       [ 5.42,  4.33,  5.75, ...,  4.67,  3.00,  2.92]])
‍‍‍‍‍‍‍‍‍‍‍‍‍

on the 8 Gb Surface Book... 

even double (float64) 10,000 x 10,000 x 12 months caused a minor murmur 

a = np.random.randint(0, 10, size=(12, 10000, 10000)) * 1.0

np.mean(a, axis=0)  # ---- let's try 10k x 10k raster
Out[12]: 
array([[ 5.00,  4.25,  5.17, ...,  3.67,  2.08,  5.08],
       [ 4.67,  5.50,  3.08, ...,  4.17,  3.50,  4.75],
       [ 3.83,  4.17,  4.00, ...,  3.50,  4.58,  4.42],
       ..., 
       [ 5.75,  5.00,  4.42, ...,  3.92,  5.25,  4.50],
       [ 4.17,  3.83,  4.50, ...,  5.17,  4.67,  4.50],
       [ 4.75,  4.00,  3.08, ...,  4.17,  5.33,  5.83]])

a.size
Out[13]: 1200000000  # 1,200,000,000 1.2 Billion values‍‍‍‍‍‍‍‍‍‍‍‍‍‍
ShouvikJha
Occasional Contributor III

Hi  Xander Bakker    Really a speechless gratitude for such wonderful work. And it is appreciated by our team. 

I ran the script, after that, i cross check the output pixel values with manual calculation in excel corresponding to the same pixel. The output values are not matching with manual pixel wise calculation. It is slightly different.

 Our kind request please have a look on the matter. I have attached my Excel calculation sheet, where both values are assigned, from manual and script output value.

Once again thanks for your such cooperation.

Thanks & regards 

Shouvik Jha 

XanderBakker
Esri Esteemed Contributor

You are right. Good catch! It seems there is something going wrong in the calculation. I just tried a manual list from pixel 1 into the function and a slope of -1.8379670329670326 (same as you Excel result) was returned. I will need to investigate what is going wrong here.

XanderBakker
Esri Esteemed Contributor

Hi shouvik jha , 

Apparently the number of lines of the output raster was 1 more than the input rasters and that caused a shift of 1 pixel. If have attached a corrected version of the output raster to the thread. I will look at the code to see what is wrong and attach the corrected code at a later moment. Could you revise the new raster?

Kind regards, Xander

ShouvikJha
Occasional Contributor III

Hi Xander Bakker‌, Thank you very much. really a wonderful work has been done by you.   

I cross checked the output and it matched exactly with pixel wise manual calculation.

0 Kudos
XanderBakker
Esri Esteemed Contributor

That is great news. I will get back with the corrected code as soon as I detect the error. 

XanderBakker
Esri Esteemed Contributor

Hi shouvik jha , 

To be able to debug a little faster I extracted a small part of the larger rasters and ran the script again. This time using 10.5 and not 10.3 which I have at home and there was no pixel shift. Or something changed between versions or something very weird is happening. 

See the code below which in this case includes additional lines to generate 3 other rasters too (intercept, total distance to regression line and mean distance to regression line)

#-------------------------------------------------------------------------------
# Name:        stats_numpy_series.py
# Purpose:
#
# Author:      Xander
#
# Created:     30-07-2017
#-------------------------------------------------------------------------------

def main():
    print "load modules..."
    import arcpy
    arcpy.env.overwriteOutput = True

    import numpy as np
    template = r'C:\GeoNet\WUE2\r{0}_WUE.TIF'
    nodata = -3.4028235e+38
    out_template = r'C:\GeoNet\WUE2\WUE2_{0}.TIF'
    out_names = ['slope', 'intersept', 'tot_dist', 'mean_dist']

    print "create numpy list..."
    lst_np_ras = []
    for i in range(1, 14):
        ras_path = template.format("%03d" % (i,))
        print " - ", ras_path
        ras_np = arcpy.RasterToNumPyArray(ras_path)
        lst_np_ras.append(ras_np)

    print "read props numpy raster..."
    ras_np = lst_np_ras[0]
    rows = ras_np.shape[0]
    cols = ras_np.shape[1]
    print " - rows:", rows
    print " - cols:", cols

    print "create output numpy array..."
    ras_path = template.format("%03d" % (1,))
    raster = arcpy.Raster(ras_path)
    lst_ras_np_res = []
    for out_name in out_names:
        print "out_name", out_name
        ras_np_res = np.ndarray((rows, cols))
        lst_ras_np_res.append(ras_np_res)
        print " - out rows:", ras_np_res.shape[0]
        print " - out cols:", ras_np_res.shape[1]

    print "loop through pixels..."
    pix_cnt = 0
    for row in range(rows):
        for col in range(cols):
            pix_cnt += 1
            if pix_cnt % 5000 == 0:
                print "row:", row, "  col:", col, "  pixel:", pix_cnt
            lst_vals = []
            for ras_np in lst_np_ras:
                lst_vals.append(ras_np[row, col])
            lst_vals = ReplaceNoData(lst_vals, nodata)
            # perform calculation on list
            a, b, sum_abs_dif, avg_abs_dif, cnt_elem = RegressionLine(lst_vals)
            if cnt_elem > 1:
                slope = a
                intersept = b
                lst_ras_np_res[0][row, col] = slope
                lst_ras_np_res[1][row, col] = intersept
                lst_ras_np_res[2][row, col] = sum_abs_dif
                lst_ras_np_res[3][row, col] = avg_abs_dif
            else:
                lst_ras_np_res[0][row, col] = nodata
                lst_ras_np_res[1][row, col] = nodata
                lst_ras_np_res[2][row, col] = nodata
                lst_ras_np_res[3][row, col] = nodata

    pnt = arcpy.Point(raster.extent.XMin, raster.extent.YMin) #  - raster.meanCellHeight
    xcellsize = raster.meanCellWidth
    ycellsize = raster.meanCellHeight

    print "Write output rasters..."
    i = 0
    for out_name in out_names:
        out_ras = out_template.format(out_name)
        print " - ", out_ras
        ras_np_res = lst_ras_np_res[i]
        ras_res = arcpy.NumPyArrayToRaster(ras_np_res, lower_left_corner=pnt, x_cell_size=xcellsize,
                                 y_cell_size=ycellsize, value_to_nodata=nodata)

        ras_res.save(out_ras)
        arcpy.DefineProjection_management(in_dataset=out_ras, coor_system="GEOGCS['GCS_WGS_1984',DATUM['D_WGS_1984',SPHEROID['WGS_1984',6378137.0,298.257223563]],PRIMEM['Greenwich',0.0],UNIT['Degree',0.0174532925199433]]")
        i += 1



def ReplaceNoData(lst, nodata):
    res = []
    for a in lst:
        if a < nodata / 2.0:
            res.append(None)
        else:
            res.append(a)
    return res


def RegressionLine(lst):
    from numpy import arange, array, ones, linalg
    months = range(1, 14)
    lst_x = []
    lst_y = []
    cnt = 0
    for a in lst:
        if not a is None:
            lst_x.append(months[cnt])
            lst_y.append(a)
        cnt += 1

    cnt_elem = len(lst_x)
    if cnt_elem >= 2:
        A = array([ lst_x, ones(len(lst_x))])
        # linearly generated sequence
        w = linalg.lstsq(A.T, lst_y)[0]

        # obtaining the parameters of the regression line
        a = w[0]  # slope
        b = w[1]  # offset

        # calculate acumulated abs difference from trend line
        zip_lst = zip(lst_x, lst_y)
        sum_abs_dif = 0
        for xy in zip_lst:
            x = xy[0]
            y = xy[1]
            calc = x * a + b
            abs_dif = abs(y - calc)
            sum_abs_dif += abs_dif

        avg_abs_dif = sum_abs_dif / cnt_elem
        return a, b, sum_abs_dif, avg_abs_dif, cnt_elem
    else:
        return None, None, None, None, None

if __name__ == '__main__':
    main()
‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍

See attached ZIP with the data used, script and resulting rasters.

ShouvikJha
Occasional Contributor III

Hi Xander Bakker‌ Thank you very much. Really this code helps us in many ways to do the statistical calculation. 

When I use the whole area that time also produced the same error (Increasing the row number) as earlier error, then i check it using another raster and I got the exact right result. Then I realized the Problem was not with the code, the problem was with data. After that, i just use the same data (WUE) but this time i used some tricky, First, i fill no data value as a Zero then extract the area of interest using my shape file, then I use the raster as an input of trend line analysis and got the exact result. 

What I realized, If the raster data contain any no data value that time the problem may occur.

And you and Dan_Patterson‌  grateful thanks to you people behalf of our team for your kind cooperation. 

0 Kudos