Dan_Patterson

Patterns, sequences, occurrence and position

Blog Post created by Dan_Patterson Champion on May 24, 2018

Patterns....

Wonder where the .... it has rained for 10 days straight, the longest stretch since 1954 .... comes from?  Do you picture some poor sod flipping through pages or reeling through spreadsheets?  Unlikely.  Many questions deal with 'sequences' or 'patterns' in the data.

 

I have put together a toolset, and one of its tools allows you to identify a sequence and report its value, its start and end locations, and its count/frequency.  By dealing with a complete list of data, you can see whether the pattern is unique or has repeated over time.

 

The principle is fairly simple.  Provide the list, determine the sequential difference of your choice, split the input and then summarize.  For numbers, you can use numpy's diff function as shown below.

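import numpy as np

stepsize = 0  # a sequential difference of zero, i.e. identical neighbours

a = [1, 1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5, 5, 4, 4, 3, 3, 3, 2, 1]

# split wherever the difference between neighbours is not the step size
seqs = np.split(a, np.where(np.diff(a) != stepsize)[0] + 1)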

seqs

[array([1, 1]),
array([2, 2]),
array([3, 3, 3]),
array([4, 4]),
array([5, 5, 5, 5]),
array([4, 4]),
array([3, 3, 3]),
array([2]),
array([1])]

 

For text data, the process is the same, but you compare each element directly with the one before it.

a = np.array(['B', 'B', 'B', 'B', 'A', 'B', 'B', 'A', 'A', 'A', 'A', 'B', 'A', 'B', 'B'], dtype='<U5')

seqs = np.split(a, np.where(a[1:] != a[:-1])[0] + 1)

seqs

[array(['B', 'B', 'B', 'B'], dtype='<U5'),
array(['A'], dtype='<U5'),
array(['B', 'B'], dtype='<U5'),
array(['A', 'A', 'A', 'A'], dtype='<U5'),
array(['B'], dtype='<U5'),
array(['A'], dtype='<U5'),
array(['B', 'B'], dtype='<U5')]

 

From that point on it is simply a matter of summarizing the result.  That is the purpose of the rest of the tool.
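
The summarizing step can be sketched along the following lines. This is not the toolset's own code: the function name `summarize_runs` and the tuple layout are assumptions made for illustration. It reports each run's value, start, end and count, with the end index exclusive in the same way as Python slicing.

```python
import numpy as np


def summarize_runs(a):
    """Return (value, start, end, count) for each run of identical values.

    `end` is exclusive, so a[start:end] is the run itself.
    Hypothetical helper, not part of the toolset described in the post.
    """
    a = np.asarray(a)
    if a.size == 0:
        return []
    # positions where an element differs from the one before it
    breaks = np.where(a[1:] != a[:-1])[0] + 1
    starts = np.concatenate(([0], breaks))
    ends = np.concatenate((breaks, [a.size]))
    return [(a[s].item(), int(s), int(e), int(e - s))
            for s, e in zip(starts, ends)]


a = np.array(['B', 'B', 'B', 'B', 'A', 'B', 'B', 'A', 'A', 'A', 'A',
              'B', 'A', 'B', 'B'])
for row in summarize_runs(a):
    print(row)
# first rows printed:
# ('B', 0, 4, 4)
# ('A', 4, 5, 1)
# ('B', 5, 7, 2)
# ('A', 7, 11, 4)
```

Because the comparison is `a[1:] != a[:-1]` rather than `np.diff`, the same sketch works for both text and numeric input.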

 

The dialog is fairly simple.  Specify an input table, the field to query, and a 'step size', which is the spacing between the values that you want to use to identify the sequence.  The simplest case is to identify observations that are identical, that is, their sequential difference is zero.
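
To see what a non-zero step size does, here is a small sketch that reuses the `np.split`/`np.diff` expression from earlier in the post. The wrapper name `split_by_step` and the sample values are made up for illustration, and how the tool itself treats the step size is assumed from the description above. With a step size of 1, runs of consecutive integers are grouped together.

```python
import numpy as np


def split_by_step(a, stepsize=0):
    """Split `a` wherever neighbouring values do not differ by `stepsize`.

    Hypothetical wrapper around the expression used in the post.
    """
    a = np.asarray(a)
    return np.split(a, np.where(np.diff(a) != stepsize)[0] + 1)


b = [1, 2, 3, 7, 8, 10, 11, 12, 13]   # illustrative values only

runs = split_by_step(b, stepsize=1)
print([r.tolist() for r in runs])
# [[1, 2, 3], [7, 8], [10, 11, 12, 13]]
```

With the default step size of 0, the same list contains no identical neighbours, so every value ends up in a run of its own.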

An optional output table can be created to permit further analysis.

 

The input and output tables for the Sequences field are shown to the right.  The first sequence is four values of 2, beginning at ID number 0 and extending up to, but not including, ID number 4.

 

Subsequent lines in the output table represent the different sequences.  In this case, a sequence of length 1 (value = 1) is followed by a sequence of length 2 (value = 2) and another sequence of length 4 (value = 1).

 

Some answers to questions.

1  What is the longest sequence of value = 1?

2  What is the median?

 

A simple Select By Attributes followed by a Statistics on the Count field and you are done.
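
Outside of the table view, the same two steps can be sketched in NumPy. The `value` and `count` arrays below are a hypothetical stand-in for the output table, holding only the four runs described above, and the median is assumed to mean the median run length for value = 1.

```python
import numpy as np

# Hypothetical output table, one entry per run (the four runs described above)
value = np.array([2, 1, 2, 1])
count = np.array([4, 1, 2, 4])

sel = value == 1                 # the "Select By Attributes" step
longest = count[sel].max()       # statistics on the Count field
median = np.median(count[sel])

print(longest, median)
# 4 2.5
```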

Alternately, you can Summarize on the Count field based on the Value field.
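
A rough NumPy equivalent of that summarize step is sketched below. The arrays are the same hypothetical stand-in for the output table (only the four runs described above), and the choice of statistics is an assumption, not what the Summarize tool necessarily reports.

```python
import numpy as np

# Hypothetical output table, one entry per run
value = np.array([2, 1, 2, 1])
count = np.array([4, 1, 2, 4])

# group the Count field by each unique entry in the Value field
summary = {}
for v in np.unique(value):
    c = count[value == v]
    summary[int(v)] = {"runs": int(c.size),
                       "min": int(c.min()),
                       "max": int(c.max()),
                       "mean": float(c.mean())}

for v, stats in summary.items():
    print(v, stats)
# 1 {'runs': 2, 'min': 1, 'max': 4, 'mean': 2.5}
# 2 {'runs': 2, 'min': 2, 'max': 4, 'mean': 3.0}
```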

 

 

So examining sequences and patterns, and their patterns and/or sequences, comes up a lot in analysis.  I hope these tools will get you thinking.

 

NOTE: the URL to the toolset will appear here.... when ArcGIS Pro Beta 2.2 is complete.

Tools will also be available to analyze sequences (or duplicates) for text data.

 

That's all for now.
