
Finding irregular patterns in data : numpy snippets

10-16-2024 03:37 PM
DanPatterson
MVP Esteemed Contributor

From my previous incarnation...

Patterns, sequences, occurrence and position - Esri Community

In that document, most of the information being sought was sequential in nature. But what if the pattern being sought isn't?

As an example, I needed to identify a particular pattern in a 1d data set representing intersections on the boundary of a polygon. The 1 values represented line segments entering the polygon, the -1 values were segments exiting the polygon, and the 0 values were intersection points on the boundary. Of course, some of the segments followed the polygon boundary, so there was no set entry/exit sequence in the data.

# -- sample data... a really, really small sample
a = np.array([-1,  0, -1,  0, -1,  1,  0,  0, -1,  0,  0,  0])
#
# -- my sequence of interest, (amongst others)
b = [0, -1, 0]

So the problem is simple-ish: find 0, -1, 0 as "it" runs along the sequence (or down the column).

This brought to mind a sliding window view of the data. Sliding windows are akin to what is used to calculate running means or other running calculations. Of course, numpy has a particularly sleek way of implementing sliding views of data in 1d, 2d, 3d and beyond. Since it is a view of the data, it is memory efficient.
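As a minimal sketch of both claims, the snippet below uses the same sample array to compute a running mean over windows of 3, and checks that the windowed array shares memory with the original instead of copying it. The variable names (`w`, `running_mean`, `shares`) are mine, chosen for illustration only.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view as swv

a = np.array([-1, 0, -1, 0, -1, 1, 0, 0, -1, 0, 0, 0])

# -- windows of 3 along the array: 12 values give 12 - 3 + 1 = 10 windows
w = swv(a, 3)

# -- a running calculation is just a reduction over the last axis
running_mean = w.mean(axis=-1)

# -- the windows are a view: no data is copied, and the view is
#    read-only by default
shares = np.shares_memory(a, w)
read_only = not w.flags.writeable
```

Because the result is a read-only view, reductions and comparisons on it are cheap to set up, but writing into it would need `writeable=True` and some care, since the windows overlap.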

The array a viewed as a running sequence of 3 is shown below.

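# -- the imports
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view as swv
# -- the array/data set
a = np.array([-1,  0, -1,  0, -1,  1,  0,  0, -1,  0,  0,  0])
# -- the 3 point sequence of sliding values
b = [0, -1, 0]
n = len(b)
# -- the start index of every window that matches b
idx = np.nonzero((swv(a, (n,)) == b).all(-1))[0]
# -- the positions of each matched sequence in the data
seqs = np.asarray([np.arange(i, i + n) for i in idx]).reshape(-1, n)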
# -- the sliding window sequence
swv(a, (n,))
array([[-1,  0, -1],
       [ 0, -1,  0],  # first found
       [-1,  0, -1],
       [ 0, -1,  1],
       [-1,  1,  0],
       [ 1,  0,  0],
       [ 0,  0, -1],
       [ 0, -1,  0],  # the second
       [-1,  0,  0],
       [ 0,  0,  0]])
# -- the index result
idx
array([1, 7], dtype=int64)
# their positional values in the data
seqs
array([[1, 2, 3],
       [7, 8, 9]], dtype=int64)

How about segments that exit the boundary at a common point?

b = [-1,  0, -1]
n = len(b)
idx = np.nonzero((swv(a, (n,)) == b).all(-1))[0]
seqs = np.asarray([np.arange(i, i + n) for i in idx]).reshape(-1, n)
# results
idx
array([0, 2], dtype=int64)

seqs
array([[0, 1, 2],
       [2, 3, 4]], dtype=int64)
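Since the same three lines keep getting repeated, they could be wrapped in a small helper. The sketch below is one way to do it, not something from the original workflow: the name `find_pattern` is hypothetical, it assumes 1d input, and it swaps the list comprehension for broadcasting to build `seqs`. It also guards against a pattern that is empty or longer than the data, since `sliding_window_view` raises an error when the window is larger than the array.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view as swv


def find_pattern(a, b):
    """Return (idx, seqs) for every occurrence of 1d pattern b in 1d array a.

    idx  : start index of each match
    seqs : positions covered by each match, shape (len(idx), len(b))
    """
    a = np.asarray(a)
    b = np.asarray(b)
    n = b.size
    # -- nothing to find if the pattern is empty or longer than the data
    if n == 0 or n > a.size:
        return np.empty(0, dtype=np.intp), np.empty((0, n), dtype=np.intp)
    # -- compare every window to the pattern, keep windows where all match
    idx = np.nonzero((swv(a, n) == b).all(-1))[0]
    # -- broadcasting: each start index plus the offsets 0..n-1
    seqs = idx[:, None] + np.arange(n)
    return idx, seqs


a = np.array([-1, 0, -1, 0, -1, 1, 0, 0, -1, 0, 0, 0])
idx, seqs = find_pattern(a, [-1, 0, -1])
```

Note that matches are allowed to overlap: the two occurrences of -1, 0, -1 above share position 2.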

There are lots of other examples and it even works with text data.

a = ['a', 'b', 'c', 'b', 'c', 'a', 'a', 'c']
b = ['b', 'c', 'b']
a = np.asarray(a)
b = np.asarray(b)
n = len(b)
idx = np.nonzero((swv(a, (n,)) == b).all(-1))[0]

idx
array([1], dtype=int64)

swv(a, (n,))
array([['a', 'b', 'c'],
       ['b', 'c', 'b'],
       ['c', 'b', 'c'],
       ['b', 'c', 'a'],
       ['c', 'a', 'a'],
       ['a', 'a', 'c']], dtype='<U1')

By the way, columns of data can be converted to numpy arrays using arcpy's TableToNumPyArray (see my link above).
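To show how a table column would feed into the search, here is a rough sketch that uses a hand-built structured array as a stand-in for what TableToNumPyArray returns, so it runs without arcpy. The field names `OBJECTID` and `crossing`, and the table path mentioned in the comment, are assumptions for illustration only.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view as swv

# -- stand-in for the structured array returned from a table.
#    With arcpy available it might come from something like
#      tbl = arcpy.da.TableToNumPyArray(in_table, ["OBJECTID", "crossing"])
#    where in_table and both field names are hypothetical.
a = np.array([-1, 0, -1, 0, -1, 1, 0, 0, -1, 0, 0, 0])
tbl = np.empty(a.size, dtype=[("OBJECTID", "<i4"), ("crossing", "<i4")])
tbl["OBJECTID"] = np.arange(1, a.size + 1)
tbl["crossing"] = a

# -- pull the column out by field name as a plain 1d array
col = np.ascontiguousarray(tbl["crossing"])

# -- same search as before
b = [0, -1, 0]
n = len(b)
idx = np.nonzero((swv(col, n) == b).all(-1))[0]

# -- map the match positions back to the table's row identifiers
matched_ids = tbl["OBJECTID"][idx]
```

The start positions in `idx` are row positions in the array, so indexing the identifier field with them gives the table rows where each match begins.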

That's all for now from numpy snippets.

 

About the Author
Retired Geomatics Instructor (also DanPatterson_Retired). Currently working on geometry projects (various) as they relate to GIS and spatial analysis. I use NumPy, python and kin and interface with ArcGIS Pro.