Hi All,
I have total 13 raster files for different years (2001 - 2013). All data are in tiff format and same spatial extent (row and column). I am trying to perform regression line slope/trend analysis between each grid points for 13 raster data sets. I have tried in ArcGIS raster calculator but I won't run the task due to complexity.
The outcome would measure the net change between pixels through my time series data.
So my question is, can it be performed using raster calculator or ArcPy Script? For such type of statistical analysis using ArcGIS, really I am facing one of the big limitation with ArcGIS.
Using excel, it can be calculated using SLOPE command where X (Time) and Y (value) has been used. But for creating a Spatial map of Slope/trend analysis, we need to perform using raster data.
The download link is given below for Sample data sets (Due to size limit, I could not attach the data with here )
Data Link: Dropbox - Water_Use_Efficiency.zip
Please share your experience.
Thanks in advance.
Solved! Go to Solution.
Sorry Dan Patterson for not implementing a three dimensional ndarray as you suggested. I was not sure if this or my method could run into memory problems, but if the rasters are too big for this, there is a nice example of using blocks in the help page: RasterToNumPyArray—Help | ArcGIS Desktop (scroll down to example 2).
xander_bakker blocks are good... what I would call strides with or without overlap... they can also reduce processing times if one is memory starved as well.
Well I wouldn't even worry about memory blocks ... 300, 000, 000 values
# ---- 5000 x 5000 cell raster ----
a = np.random.randint(0, 10, size=(12, 5000, 5000))
np.mean(a, axis=0) # ---- let's try 5k x 5k raster
Out[5]:
array([[ 3.17, 5.50, 4.75, ..., 4.00, 4.33, 5.25],
[ 5.25, 6.50, 6.00, ..., 4.42, 5.00, 3.42],
[ 3.25, 5.33, 4.58, ..., 3.67, 4.83, 4.58],
...,
[ 4.17, 2.83, 3.67, ..., 2.58, 4.08, 5.42],
[ 4.83, 4.17, 4.83, ..., 6.58, 3.50, 3.67],
[ 5.42, 4.33, 5.75, ..., 4.67, 3.00, 2.92]])
on the 8 Gb Surface Book...
even double (float64) 10,000 x 10,000 x 12 months caused a minor murmur
a = np.random.randint(0, 10, size=(12, 10000, 10000)) * 1.0
np.mean(a, axis=0) # ---- let's try 10k x 10k raster
Out[12]:
array([[ 5.00, 4.25, 5.17, ..., 3.67, 2.08, 5.08],
[ 4.67, 5.50, 3.08, ..., 4.17, 3.50, 4.75],
[ 3.83, 4.17, 4.00, ..., 3.50, 4.58, 4.42],
...,
[ 5.75, 5.00, 4.42, ..., 3.92, 5.25, 4.50],
[ 4.17, 3.83, 4.50, ..., 5.17, 4.67, 4.50],
[ 4.75, 4.00, 3.08, ..., 4.17, 5.33, 5.83]])
a.size
Out[13]: 1200000000 # 1,200,000,000 1.2 Billion values
Hi Xander Bakker Really a speechless gratitude for such wonderful work. And it is appreciated by our team.
I ran the script, after that, i cross check the output pixel values with manual calculation in excel corresponding to the same pixel. The output values are not matching with manual pixel wise calculation. It is slightly different.
Our kind request please have a look on the matter. I have attached my Excel calculation sheet, where both values are assigned, from manual and script output value.
Once again thanks for your such cooperation.
Thanks & regards
Shouvik Jha
You are right. Good catch! It seems there is something going wrong in the calculation. I just tried a manual list from pixel 1 into the function and a slope of -1.8379670329670326 (same as you Excel result) was returned. I will need to investigate what is going wrong here.
Hi shouvik jha ,
Apparently the number of lines of the output raster was 1 more than the input rasters and that caused a shift of 1 pixel. If have attached a corrected version of the output raster to the thread. I will look at the code to see what is wrong and attach the corrected code at a later moment. Could you revise the new raster?
Kind regards, Xander
Hi Xander Bakker, Thank you very much. really a wonderful work has been done by you.
I cross checked the output and it matched exactly with pixel wise manual calculation.
That is great news. I will get back with the corrected code as soon as I detect the error.
Hi shouvik jha ,
To be able to debug a little faster I extracted a small part of the larger rasters and ran the script again. This time using 10.5 and not 10.3 which I have at home and there was no pixel shift. Or something changed between versions or something very weird is happening.
See the code below which in this case includes additional lines to generate 3 other rasters too (intercept, total distance to regression line and mean distance to regression line)
#-------------------------------------------------------------------------------
# Name: stats_numpy_series.py
# Purpose:
#
# Author: Xander
#
# Created: 30-07-2017
#-------------------------------------------------------------------------------
def main():
print "load modules..."
import arcpy
arcpy.env.overwriteOutput = True
import numpy as np
template = r'C:\GeoNet\WUE2\r{0}_WUE.TIF'
nodata = -3.4028235e+38
out_template = r'C:\GeoNet\WUE2\WUE2_{0}.TIF'
out_names = ['slope', 'intersept', 'tot_dist', 'mean_dist']
print "create numpy list..."
lst_np_ras = []
for i in range(1, 14):
ras_path = template.format("%03d" % (i,))
print " - ", ras_path
ras_np = arcpy.RasterToNumPyArray(ras_path)
lst_np_ras.append(ras_np)
print "read props numpy raster..."
ras_np = lst_np_ras[0]
rows = ras_np.shape[0]
cols = ras_np.shape[1]
print " - rows:", rows
print " - cols:", cols
print "create output numpy array..."
ras_path = template.format("%03d" % (1,))
raster = arcpy.Raster(ras_path)
lst_ras_np_res = []
for out_name in out_names:
print "out_name", out_name
ras_np_res = np.ndarray((rows, cols))
lst_ras_np_res.append(ras_np_res)
print " - out rows:", ras_np_res.shape[0]
print " - out cols:", ras_np_res.shape[1]
print "loop through pixels..."
pix_cnt = 0
for row in range(rows):
for col in range(cols):
pix_cnt += 1
if pix_cnt % 5000 == 0:
print "row:", row, " col:", col, " pixel:", pix_cnt
lst_vals = []
for ras_np in lst_np_ras:
lst_vals.append(ras_np[row, col])
lst_vals = ReplaceNoData(lst_vals, nodata)
# perform calculation on list
a, b, sum_abs_dif, avg_abs_dif, cnt_elem = RegressionLine(lst_vals)
if cnt_elem > 1:
slope = a
intersept = b
lst_ras_np_res[0][row, col] = slope
lst_ras_np_res[1][row, col] = intersept
lst_ras_np_res[2][row, col] = sum_abs_dif
lst_ras_np_res[3][row, col] = avg_abs_dif
else:
lst_ras_np_res[0][row, col] = nodata
lst_ras_np_res[1][row, col] = nodata
lst_ras_np_res[2][row, col] = nodata
lst_ras_np_res[3][row, col] = nodata
pnt = arcpy.Point(raster.extent.XMin, raster.extent.YMin) # - raster.meanCellHeight
xcellsize = raster.meanCellWidth
ycellsize = raster.meanCellHeight
print "Write output rasters..."
i = 0
for out_name in out_names:
out_ras = out_template.format(out_name)
print " - ", out_ras
ras_np_res = lst_ras_np_res[i]
ras_res = arcpy.NumPyArrayToRaster(ras_np_res, lower_left_corner=pnt, x_cell_size=xcellsize,
y_cell_size=ycellsize, value_to_nodata=nodata)
ras_res.save(out_ras)
arcpy.DefineProjection_management(in_dataset=out_ras, coor_system="GEOGCS['GCS_WGS_1984',DATUM['D_WGS_1984',SPHEROID['WGS_1984',6378137.0,298.257223563]],PRIMEM['Greenwich',0.0],UNIT['Degree',0.0174532925199433]]")
i += 1
def ReplaceNoData(lst, nodata):
res = []
for a in lst:
if a < nodata / 2.0:
res.append(None)
else:
res.append(a)
return res
def RegressionLine(lst):
from numpy import arange, array, ones, linalg
months = range(1, 14)
lst_x = []
lst_y = []
cnt = 0
for a in lst:
if not a is None:
lst_x.append(months[cnt])
lst_y.append(a)
cnt += 1
cnt_elem = len(lst_x)
if cnt_elem >= 2:
A = array([ lst_x, ones(len(lst_x))])
# linearly generated sequence
w = linalg.lstsq(A.T, lst_y)[0]
# obtaining the parameters of the regression line
a = w[0] # slope
b = w[1] # offset
# calculate acumulated abs difference from trend line
zip_lst = zip(lst_x, lst_y)
sum_abs_dif = 0
for xy in zip_lst:
x = xy[0]
y = xy[1]
calc = x * a + b
abs_dif = abs(y - calc)
sum_abs_dif += abs_dif
avg_abs_dif = sum_abs_dif / cnt_elem
return a, b, sum_abs_dif, avg_abs_dif, cnt_elem
else:
return None, None, None, None, None
if __name__ == '__main__':
main()
See attached ZIP with the data used, script and resulting rasters.
Hi Xander Bakker Thank you very much. Really this code helps us in many ways to do the statistical calculation.
When I use the whole area that time also produced the same error (Increasing the row number) as earlier error, then i check it using another raster and I got the exact right result. Then I realized the Problem was not with the code, the problem was with data. After that, i just use the same data (WUE) but this time i used some tricky, First, i fill no data value as a Zero then extract the area of interest using my shape file, then I use the raster as an input of trend line analysis and got the exact result.
What I realized, If the raster data contain any no data value that time the problem may occur.
And you and Dan_Patterson grateful thanks to you people behalf of our team for your kind cooperation.