Dan_Patterson

Working with blocks of data...

Blog Post created by Dan_Patterson Champion on Sep 1, 2016

This post seemed interesting for several reasons:   Iterate and Add Values from many fields

  • it is generally the type of thing that we know how to do in a spreadsheet easily
  • it is a different thought process that one uses to translate into code
  • I got to experiment with block based functions that weren't related to raster or image data

 

Reflecting before problem solving

As a preamble, let's examine the year in different ways.

Shaping up your year in various ways....

the conventional year, by day

 

[[  1,   2,   3, ..., 363, 364, 365]] 

the year by average month

      [[  1,   2,   3, ...,  28,  29,  30], 

       [ 31,  32,  33, ...,  58,  59,  60],

       [ 61,  62,  63, ...,  88,  89,  90],

       ...,   ... snip ....

       [271, 272, 273, ..., 298, 299, 300],

       [301, 302, 303, ..., 328, 329, 330],

       [331, 332, 333, ..., 358, 359, 360]]

the year by weeks

      [[  1,   2,   3, ...,   5,   6,   7], 

       [  8,   9,  10, ...,  12,  13,  14],

       [ 15,  16,  17, ...,  19,  20,  21],

       ...,  ... snip ....

       [344, 345, 346, ..., 348, 349, 350],

       [351, 352, 353, ..., 355, 356, 357],

       [358, 359, 360, ..., 362, 363, 364]]

how about the average lunar cycle?

      [[  1,   2,   3, ...,  26,  27,  28],

       [ 29,  30,  31, ...,  54,  55,  56],

       [ 57,  58,  59, ...,  82,  83,  84],

       ...,

       [281, 282, 283, ..., 306, 307, 308],

       [309, 310, 311, ..., 334, 335, 336],

       [337, 338, 339, ..., 362, 363, 364]])

 

Your year can be shaped in different ways... but notice that the year is never always equal to 365 (not talking about leap years here).

The year is divisible by chunks... the fiddly-bits get truncated.  Notice the different year lengths in the above (365, 364, 360, 364).

Did you ever wonder why your birthday is on a different sequentially changing day every year?

 

On to the problem

So here is some data.  It is constructed so that you can do the 'math' yourself.  The human brain is pretty good at picking out the patterns...it is just less adept at formulating a plan to implement it.  So here goes..

 

The task is to take a set of data and determine some properties on a block by block basis.  In this example, the maximum needs to be determined for blocks of 4 values for each block in a row.  Coincidently ... (aka, to simplify the problem) this yields 5 blocks of 4 in each row.  This type of block is denoted as a 1x4 block... one row by 4 columns, if you wish.

 

1.  Input data.... subdivide the data into 1x4 blocks

[[  1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20]

[ 21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40]

[ 41  42  43  44  45  46  47  48  49  50  51  52  53  54  55  56  57  58  59  60]

[ 61  62  63  64  65  66  67  68  69  70  71  72  73  74  75  76  77  78  79  80]

[ 81  82  83  84  85  86  87  88  89  90  91  92  93  94  95  96  97  98  99 100]]

 

2.  Imagine that the data are split into 5 blocks of 1x4 values by 5 rows which means that you would have 25 1x4 blocks of values.   Here is an example [1, 2, 3,4].  Cleverly we can detect that the maximum as being 4 which most of you know how to derive in some programming language using some interface, whether it be the field calculator or a tool in arctoolbox.

 

3.  Using your minds eye, or a spreadsheet, determine the maximum for each block of 4, within each row, for all rows.

The maximum of each 1x4 block in each row, would yield.

[[  4   8  12  16  20]

[ 24  28  32  36  40]

[ 44  48  52  56  60]

[ 64  68  72  76  80]

[ 84  88  92  96 100]]

 

4.  The final stage of the cited problem was to determine the maximum of each of the 1x4 blocks within each row.  That is, the maximum of the values produced in step 2.

The results on a row basis is as follows:

 

[ 20  40  60  80 100]

 

: ---- Example 2 ----
5.  Let's do the sum of each 1x4 block and determine the row max

     Again, I won't show the intermediate step since it is visually confusing unless you are interested in it.

 

6. sums of each 1x4 block in each row

[[[[ 10  26  42  58  74]

  [[ [ 90 106 122 138 154]

  [[ [170 186 202 218 234]

  [[ [250 266 282 298 314]

  [[ [330 346 362 378 394]]

 

7. maximum of (6) by row        [ 74 154 234 314 394]

 

: ---- Example 3 ----

8.  Do the sum but with a masked array. The mask locations should

:    not be included in calculations

[[1 2 3 4 5 -- 7 8 9 10 11 -- 13 14 15 16 17 -- 19 20]

[[ [21 22 23 -- 25 26 27 28 29 -- 31 32 33 34 35 -- 37 38 39 40]

[[[41 -- 43 44 45 46 47 -- 49 50 51 52 53 -- 55 56 57 58 59 --]

[[ [61 62 63 64 65 -- 67 68 69 70 71 -- 73 74 75 76 77 -- 79 80]

[[[81 82 83 -- 85 86 87 88 89 -- 91 92 93 94 95 -- 97 98 99 100]]

 

9. now with masked values

 

10. sum of the 1x 4 blocks accounting for the mask

[[10 20 30 58 56]

[[ [66 106 92 102 154]

[[ [128 138 202 164 174]

[[ [250 200 210 298 236]

[[ [246 346 272 282 394]]

 

11. maximum of 10 by row      [58 154 202 298 394]

# -------------------------------------------------------------------------------------------------------------------------

Now, this can obviously be extended to look at longer data lists.  The following example looks at how to determine the maximum monthly value and maximum sum over a 5 year period.  To simplify the example and to facilitate mental math, a year has 360 days, and 1 month has 30 days.

---- the trick ----

a = int_range((5,360),1,1)  # rows, columns, start, step
b = block_reshape(a,block=(1,30))

 

:---- Example 4 ----

1.  Input data.... subdivide the data into 1x30 blocks... each 360 day year, by 30 days per month

[[   1    2    3         ...,  358  359  360]

[ 361  362  363    ...,  718  719  720]

[ 721  722  723    ..., 1078 1079 1080]

[1081 1082 1083 ..., 1438 1439 1440]

[1441 1442 1443 ..., 1798 1799 1800]]

 

2.  Now imagine what that would look like if subdivided

3.  maximum of each 1x30 block in each row

[[  30   60   90  120  150  180  210  240  270  300  330  360]

[ 390  420  450  480  510  540  570  600  630  660  690  720]

[ 750  780  810  840  870  900  930  960  990 1020 1050 1080]

[1110 1140 1170 1200 1230 1260 1290 1320 1350 1380 1410 1440]

[1470 1500 1530 1560 1590 1620 1650 1680 1710 1740 1770 1800]]

 

4.  maximum of (3) by row

[ 360  720 1080 1440 1800]

 

: ---- Example 5 ----

5.  Let's do the sum of each 1x30 block and determine the row max

6. sums of each 1x30 block in each row

[[  465  1365  2265  3165  4065  4965  5865  6765  7665  8565  9465 10365]

[11265 12165 13065 13965 14865 15765 16665 17565 18465 19365 20265 21165]

[22065 22965 23865 24765 25665 26565 27465 28365 29265 30165 31065 31965]

[32865 33765 34665 35565 36465 37365 38265 39165 40065 40965 41865 42765]

[43665 44565 45465 46365 47265 48165 49065 49965 50865 51765 52665 53565]]

7. maximum of (6) by row

[10365 21165 31965 42765 53565]

 

Summary:

Still not convinced?  Well what is the cumulative sum of 1 to 30 inclusive:

[  1,   3,   6,  10,  15,  21,  28,  36,  45,  55,  66,  78,  91, 105, 120, 136,
    153, 171, 190, 210, 231, 253, 276, 300, 325, 351, 378, 406, 435, 465 ]

How about the next 30 days:

[  31,   63,   96,  130,  165,  201,  238,  276,  315,  355,  396,  438,  481,  525,  570,

        616,  663,  711,  760,  810,  861,  913,  966, 1020, 1075, 1131, 1188, 1246, 1305, 1365]

Now notice the first 2 entries in (6) above... 465 and 1365

 

---- What is the magic??? -----

>>> a.shape   # 5 years, 360 days in a year

(5, 360)

>>> b.shape   # 5 years, 12 months, 30 days per 1 month

(5, 12, 1, 30)

>>> c.shape   # reshape by month

(5, 12)

>>> d.shape   # number of years

(5,)

>>> c1.shape  # reshape the 5 years by month and sum each month, take the max

(5, 12)

>>> d1.shape  # sums for the 5 years, take the yearly max

(5,)

>>>

 

And finally in code

 

let's make up 5 years of data where it rains 1mm per day, day in-day out
>>> # here we go
>>> x = np.ones((5, 360), dtype='int32')
>>> x = np.ones((5, 360), dtype='int32')
>>> x.shape
(5, 360)
>>> x.min(), x.max()
(1, 1)
>>> # so far so good
>>>
>>> x1 = block_reshape(x, block=(1, 30)) # produce the monthly blocks
>>> x2 = ((x1.T).max(axis=0))            # start reorganizing
>>> x2 = (x2.T).squeeze()                # more reorganizing
>>> x3 = x2.max(axis=1)                  # get the maximum per year
>>> #
>>> x4 =((x1.T).sum(axis=0))             # do the sums
>>> x4 = (x4.T).squeeze()                # finish reorganizing
>>> x5 = x4.max(axis=1)                  # and finish up with the maximums
>>>
>>> x   # the daily precipitation, for 360 days for 5 years
array([[1, 1, 1, ..., 1, 1, 1],
       [1, 1, 1, ..., 1, 1, 1],
       [1, 1, 1, ..., 1, 1, 1],
       [1, 1, 1, ..., 1, 1, 1],
       [1, 1, 1, ..., 1, 1, 1]])
>>> x2  # The monthly maximum for 5 years (12 months for 5 years)
array([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])
>>> x3  # the yearly summary for the maximums for 5 years
array([1, 1, 1, 1, 1])
>>> x4  # the monthly sums  12 months for 5 years
array([[30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30],
       [30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30],
       [30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30],
       [30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30],
       [30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30]])
>>> x5  # the annual montly maximum for each year
array([30, 30, 30, 30, 30])
>>>

 

That's all for now

Outcomes