Hi shouvik jha,
Recently I needed to detect unusual behavior in the monthly consumption of water customers. I created a data structure that stored the 12 months of data for each client as attributes of a point file. The code below is a snippet that takes a list of values (in your case, the 13 values for each pixel), skips missing values, and fits a least-squares regression line to the rest.
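For a single list, the same line can also be fitted with NumPy's `np.polyfit`. This is a swap-in for the `linalg.lstsq` call in the script, not what the script itself uses; `fit_line` is a hypothetical helper name, and x is assumed to be the 1-based position in the list (the month number here).

```python
import numpy as np

def fit_line(values):
    """Least-squares line y = a*x + b through the non-missing values.

    Hypothetical helper: x is the 1-based position in the list and None
    entries are skipped. Returns (a, b), or None if fewer than two values remain.
    """
    pts = [(x, y) for x, y in enumerate(values, start=1) if y is not None]
    if len(pts) < 2:
        return None  # a line needs at least two points
    x, y = np.array(pts, dtype=float).T
    a, b = np.polyfit(x, y, 1)  # degree-1 fit returns [slope, intercept]
    return a, b

a, b = fit_line([2.0, None, None, None, None, 1.0, 4.0, 6.0, 5.0, 7.0, 9.0, 9.0])
print(round(float(a), 6), round(float(b), 6))  # prints 0.738095 -0.529762
```

The slope and intercept agree with the first series in the results further down.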
The slope tells you whether the values increase or decrease over the period. To detect abnormal behavior in my case, I calculated the mean value and the mean absolute distance of the values from the regression line, and divided that distance by the mean value. The higher this score, the less constant the consumption.
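Written out for one client, with $n$ the number of non-missing months, $x_i$ the month number, $y_i$ the consumption, and $a$, $b$ the fitted slope and intercept, the score computed by the code is:

$$
\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i,\qquad
d = \frac{1}{n}\sum_{i=1}^{n}\left|\,y_i - (a x_i + b)\,\right|,\qquad
\text{score} = \frac{d}{\bar{y}}
$$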
I think you could use this code (or parts of it) to create the NumPy arrays for each raster and loop through the individual pixels: build a list of the 13 yearly values for each pixel, do the calculation, fill an output array, and write the result of the analysis to an output raster. I'm pretty sure there must be an easier way to do this, or even a standard way to calculate this on a time series of data. I will investigate a little more and let you know.
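As a sketch of that per-pixel idea without an explicit Python loop, the slope, intercept and score can be computed for all pixels at once with NumPy broadcasting and the closed-form least-squares formulas. This assumes the 13 rasters have already been read into one array of shape (years, rows, cols), for example by stacking the arrays returned by GDAL's `ReadAsArray()`, with nodata converted to NaN; `pixel_trend_score` is a hypothetical name.

```python
import numpy as np

def pixel_trend_score(stack):
    """Per-pixel regression line and score for a (t, rows, cols) array.

    Hypothetical helper: assumes nodata is already NaN and the time steps
    are x = 1..t. Returns (slope, intercept, score), each (rows, cols);
    pixels with fewer than two valid values get NaN.
    """
    stack = np.asarray(stack, dtype=float)
    t = stack.shape[0]
    # Time index shaped (t, 1, 1) so it broadcasts over every pixel
    x = np.arange(1, t + 1, dtype=float).reshape(t, 1, 1)
    valid = ~np.isnan(stack)
    n = valid.sum(axis=0)                  # number of valid values per pixel
    y = np.where(valid, stack, 0.0)        # missing values contribute nothing
    xv = np.where(valid, x, 0.0)

    # Sums needed for ordinary least squares
    sx, sy = xv.sum(axis=0), y.sum(axis=0)
    sxx, sxy = (xv * xv).sum(axis=0), (xv * y).sum(axis=0)

    with np.errstate(divide="ignore", invalid="ignore"):
        slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
        intercept = (sy - slope * sx) / n
        # Mean absolute distance from the line, over valid values only
        resid = np.where(valid, np.abs(stack - (slope * x + intercept)), 0.0)
        mean_dist = resid.sum(axis=0) / n
        score = mean_dist / (sy / n)       # divided by the pixel's mean value

    too_few = n < 2
    for arr in (slope, intercept, score):
        arr[too_few] = np.nan
    return slope, intercept, score
```

The three outputs are plain 2-D arrays that can be written to output rasters. A pixel whose mean value is 0 gets an infinite or NaN score, so you may want to mask those.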

    from numpy import array, ones, linalg


    def main():
        # One row per client: 12 monthly values, None where data is missing
        lst_data = [[2.0, None, None, None, None, 1.0, 4.0, 6.0, 5.0, 7.0, 9.0, 9.0],
                    [None, None, None, None, None, None, None, None, None, None, 15.0, 16.0],
                    [14.0, 11.0, 25.0, 4.0, 4.0, 4.0, 3.0, 4.0, 2.0, 3.0, 4.0, 4.0],
                    [7.0, 9.0, 7.0, 3.0, 2.0, 2.0, None, 13.0, 5.0, 60.0, 25.0, 25.0],
                    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0],
                    [1.0, 2.0, 1.0, 2.0, 1.0, 2.0, 1.0, 2.0, 1.0, 2.0, 1.0, 2.0],
                    [11.0, 12.0, 11.0, 12.0, 11.0, 12.0, 11.0, 12.0, 11.0, 12.0, 11.0, 12.0]]

        for lst in lst_data:
            print("\ndata:", lst)
            a, b, sum_abs_dif, avg_abs_dif = RegressionLine(lst)
            if a is not None:
                cons_prom = getAvg(lst)
                # '%.12g' prints 12 significant digits, like the results below
                print(" - mean value: %.12g" % cons_prom)
                print(" - slope (a): %.12g" % a)
                print(" - intercept (b): %.12g" % b)
                print(" - total distance from regression line: %.12g" % sum_abs_dif)
                print(" - mean distance from regression line : %.12g" % avg_abs_dif)
                if cons_prom != 0:
                    print(" - score: %.12g" % (avg_abs_dif / cons_prom))
                else:
                    print(" - score: undefined (mean value is 0)")
            else:
                print(" - Insufficient data")


    def RegressionLine(lst):
        """Fit y = a*x + b by least squares, with x = 1, 2, 3, ... (the month
        number) and missing (None) values skipped."""
        lst_x = []
        lst_y = []
        for month, value in enumerate(lst, start=1):
            if value is not None:
                lst_x.append(month)
                lst_y.append(value)

        cnt_elem = len(lst_x)
        if cnt_elem < 2:
            # A line needs at least two points
            return None, None, None, None

        # Design matrix with columns [x, 1]; lstsq solves for [a, b]
        A = array([lst_x, ones(cnt_elem)]).T
        a, b = linalg.lstsq(A, lst_y, rcond=None)[0]

        # Absolute vertical distance of each point from the fitted line
        sum_abs_dif = sum(abs(y - (a * x + b)) for x, y in zip(lst_x, lst_y))
        avg_abs_dif = sum_abs_dif / cnt_elem
        return a, b, sum_abs_dif, avg_abs_dif


    def getAvg(lst):
        """Mean of the non-missing values, or None if all values are missing."""
        corr_lst = [a for a in lst if a is not None]
        if not corr_lst:
            return None
        return sum(corr_lst) / float(len(corr_lst))


    if __name__ == '__main__':
        main()

Results:

    data: [2.0, None, None, None, None, 1.0, 4.0, 6.0, 5.0, 7.0, 9.0, 9.0]
     - mean value: 5.375
     - slope (a): 0.738095238095
     - intercept (b): -0.529761904762
     - total distance from regression line: 9.29761904762
     - mean distance from regression line : 1.16220238095
     - score: 0.216223698782

    data: [None, None, None, None, None, None, None, None, None, None, 15.0, 16.0]
     - mean value: 15.5
     - slope (a): 1.0
     - intercept (b): 4.0
     - total distance from regression line: 3.5527136788e-15
     - mean distance from regression line : 1.7763568394e-15
     - score: 1.14603667058e-16

    data: [14.0, 11.0, 25.0, 4.0, 4.0, 4.0, 3.0, 4.0, 2.0, 3.0, 4.0, 4.0]
     - mean value: 6.83333333333
     - slope (a): -1.18181818182
     - intercept (b): 14.5151515152
     - total distance from regression line: 42.303030303
     - mean distance from regression line : 3.52525252525
     - score: 0.515890613452

    data: [7.0, 9.0, 7.0, 3.0, 2.0, 2.0, None, 13.0, 5.0, 60.0, 25.0, 25.0]
     - mean value: 14.3636363636
     - slope (a): 2.69171974522
     - intercept (b): -3.0101910828
     - total distance from regression line: 103.946496815
     - mean distance from regression line : 9.44968152866
     - score: 0.65788922035

    data: [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0]
     - mean value: 6.5
     - slope (a): 1.0
     - intercept (b): -1.46316339947e-15
     - total distance from regression line: 3.17523785043e-14
     - mean distance from regression line : 2.64603154202e-15
     - score: 4.07081775696e-16

    data: [1.0, 2.0, 1.0, 2.0, 1.0, 2.0, 1.0, 2.0, 1.0, 2.0, 1.0, 2.0]
     - mean value: 1.5
     - slope (a): 0.020979020979
     - intercept (b): 1.36363636364
     - total distance from regression line: 5.87412587413
     - mean distance from regression line : 0.48951048951
     - score: 0.32634032634

    data: [11.0, 12.0, 11.0, 12.0, 11.0, 12.0, 11.0, 12.0, 11.0, 12.0, 11.0, 12.0]
     - mean value: 11.5
     - slope (a): 0.020979020979
     - intercept (b): 11.3636363636
     - total distance from regression line: 5.87412587413
     - mean distance from regression line : 0.48951048951
     - score: 0.042566129522
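The last two series show why the distance is divided by the mean value. Both have the same mean distance from their regression line, but relative to the amount consumed the variation is much smaller for the client with the higher consumption:

$$
\frac{0.48951}{1.5} \approx 0.3263 \qquad\text{vs.}\qquad \frac{0.48951}{11.5} \approx 0.0426
$$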