Fun with structured arrays

DanPatterson · 02-15-2023

A quick import of numpy's recfunctions module to see what is available in the structured/record array arsenal.

import numpy as np
import numpy.lib.recfunctions as rfn

dir(rfn)
['MaskedArray', 'MaskedRecords',...,
 'append_fields', 'apply_along_fields', 'assign_fields_by_name', 
 'drop_fields', 'find_duplicates', 'flatten_descr', 'get_fieldstructure',
 'get_names',  'get_names_flat', ..., 'join_by', ... 'merge_arrays',...
 'rec_append_fields', 'rec_drop_fields', 'rec_join', 'recursive_fill_fields',
 'rename_fields', 'repack_fields', 'require_fields', 'stack_arrays',
 'structured_to_unstructured', 'unstructured_to_structured']
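Two of the listed functions, `structured_to_unstructured` and `unstructured_to_structured`, convert between a structured array and a plain 2D array, which is handy for math across several fields at once. A minimal sketch, assuming a hypothetical point table `pts` with float fields 'x' and 'y':

```python
import numpy as np
import numpy.lib.recfunctions as rfn

# Hypothetical point table with two float fields
pts = np.array([(1.0, 2.0), (3.0, 4.0), (5.0, 6.0)],
               dtype=[('x', '<f8'), ('y', '<f8')])

# Structured -> plain (3, 2) float array, one column per field
xy = rfn.structured_to_unstructured(pts)

# Vectorized math across both fields at once, e.g. shift every point
xy_shifted = xy + 10.0

# Plain array -> structured again, reusing the original names and types
back = rfn.unstructured_to_structured(xy_shifted, dtype=pts.dtype)
print(back['x'].tolist())  # [11.0, 13.0, 15.0]
```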

How about some different joins?

# ---- table `a`, with an integer field 'f0' and a text field 'f1'

a = np.array([(0, 'A'), (1, 'B'), (2, 'C'), (3, 'D'), (4, 'E'),
              (5, 'F'), (6, 'G'), (7, 'H'), (8, 'I'), (9, 'J')],
      dtype=[('f0', '<i4'), ('f1', 'U2')])

# ---- table `b` has the common 'key' field, 'f0', and another text field, 'f1'

b = np.array([(0, 'a'), (1, 'b'), (8, 'c'), (9, 'd')],
      dtype=[('f0', '<i4'), ('f1', 'U2')])

"""  ---- How about some joins?
'inner', returns the elements common to both `a` and `b`.
'outer', returns the common elements as well as the elements of `a` not in
         `b` and the elements of `b` not in `a`.
'leftouter', returns the common elements and the elements of `a` not in `b`.
'N/' is nodata: numpy's default text fill value 'N/A', cut to fit the 'U2' field
"""
inner_ = rfn.join_by('f0', a, b, 'inner', usemask=False)
outer_ = rfn.join_by('f0', a, b, 'outer', usemask=False)
lft_outer_ = rfn.join_by('f0', a, b, 'leftouter', usemask=False)

inner_
array([(0, 'A', 'a'), (1, 'B', 'b'), (8, 'I', 'c'), (9, 'J', 'd')],
      dtype=[('f0', '<i4'), ('f11', '<U2'), ('f12', '<U2')])

outer_
array([(0, 'A', 'a'), (1, 'B', 'b'), (2, 'C', 'N/'), (3, 'D', 'N/'),
       (4, 'E', 'N/'), (5, 'F', 'N/'), (6, 'G', 'N/'), (7, 'H', 'N/'),
       (8, 'I', 'c'), (9, 'J', 'd')],
      dtype=[('f0', '<i4'), ('f11', '<U2'), ('f12', '<U2')])

lft_outer_
array([(0, 'A', 'a'), (1, 'B', 'b'), (2, 'C', 'N/'), (3, 'D', 'N/'),
       (4, 'E', 'N/'), (5, 'F', 'N/'), (6, 'G', 'N/'), (7, 'H', 'N/'),
       (8, 'I', 'c'), (9, 'J', 'd')],
      dtype=[('f0', '<i4'), ('f11', '<U2'), ('f12', '<U2')])
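In the outputs above, `outer_` and `lft_outer_` are identical because every key in `b` also appears in `a`. A minimal sketch with a hypothetical table `b2`, whose key 12 has no match in `a`, shows where the two join types part ways:

```python
import numpy as np
import numpy.lib.recfunctions as rfn

a = np.array([(0, 'A'), (1, 'B'), (2, 'C')],
             dtype=[('f0', '<i4'), ('f1', 'U2')])
# Hypothetical table: key 1 matches a row in `a`, key 12 does not
b2 = np.array([(1, 'b'), (12, 'x')],
              dtype=[('f0', '<i4'), ('f1', 'U2')])

outer = rfn.join_by('f0', a, b2, 'outer', usemask=False)
left = rfn.join_by('f0', a, b2, 'leftouter', usemask=False)

print(outer['f0'].tolist())  # [0, 1, 2, 12]  key 12 is kept from `b2`
print(left['f0'].tolist())   # [0, 1, 2]      only keys found in `a`
```

The 'outer' result also carries the unmatched row's `b2` text ('x') in field 'f12', while its 'f11' value is nodata.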

Quickly add some fields and data ...

c = np.copy(a)  # -- keep the original

c = rfn.append_fields(c,
        ['SomeFloats', 'MoreInts'], # -- the field names
        [np.arange(0, 10)*10.,
         np.arange(0, 20, 2)],      # -- data for each field
        usemask=False)              # -- return a plain ndarray, not a masked array

# -- the field type is inferred from the input data or can be specified with `dtypes`
c
array([(0, 'A',  0.,  0), (1, 'B', 10.,  2), (2, 'C', 20.,  4),
       (3, 'D', 30.,  6), (4, 'E', 40.,  8), (5, 'F', 50., 10),
       (6, 'G', 60., 12), (7, 'H', 70., 14), (8, 'I', 80., 16),
       (9, 'J', 90., 18)],
      dtype=[('f0', '<i4'), ('f1', '<U2'),
             ('SomeFloats', '<f8'), ('MoreInts', '<i4')])
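The `dtypes` argument of `append_fields` sets a field's type explicitly instead of letting it be inferred. A small sketch with a hypothetical field 'Ratio', which also shows that the input array is left as it was:

```python
import numpy as np
import numpy.lib.recfunctions as rfn

a = np.array([(0, 'A'), (1, 'B'), (2, 'C')],
             dtype=[('f0', '<i4'), ('f1', 'U2')])

# 'Ratio' is a hypothetical field; its type is set to 32-bit float
ratio = np.array([0.5, 1.5, 2.5], dtype='<f4')
c2 = rfn.append_fields(a, 'Ratio', ratio, dtypes='<f4', usemask=False)

print(c2.dtype.names)  # ('f0', 'f1', 'Ratio')
print(a.dtype.names)   # ('f0', 'f1')  the input array is unchanged
```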

How about combining arrays, even with missing data? I will throw in a recalculation at the same time.

# -- a quick preview of appending array `c` to array `a`
#    you can do `field mapping` if needed, but we don't

rfn.stack_arrays((a, c), usemask=False)
array([(0, 'A', 1.e+20, 999999), (1, 'B', 1.e+20, 999999),
       (2, 'C', 1.e+20, 999999), (3, 'D', 1.e+20, 999999),
       ... snip
       (8, 'I', 8.e+01,     16), (9, 'J', 9.e+01,     18)],
      dtype=[('f0', '<i4'), ('f1', '<U2'),
             ('SomeFloats', '<f8'), ('MoreInts', '<i4')])
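The 1.e+20 and 999999 values above are numpy's default fill values for floats and integers, used for the fields that `a` lacks. `stack_arrays` also accepts a `defaults` dict mapping field names to your own fill values. A minimal sketch with small, hypothetical stand-ins for `a` and `c`:

```python
import numpy as np
import numpy.lib.recfunctions as rfn

a = np.array([(0, 'A'), (1, 'B')], dtype=[('f0', '<i4'), ('f1', 'U2')])
c = np.array([(2, 'C', 20.0, 4)],
             dtype=[('f0', '<i4'), ('f1', 'U2'),
                    ('SomeFloats', '<f8'), ('MoreInts', '<i4')])

# Rows from `a` have no 'SomeFloats' or 'MoreInts'; fill them with nan and -1
d = rfn.stack_arrays((a, c),
                     defaults={'SomeFloats': np.nan, 'MoreInts': -1},
                     usemask=False)
print(d['MoreInts'].tolist())  # [-1, -1, 4]
```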

# -- now let's do the combining and recalculate the index field `f0`
d = rfn.stack_arrays((a, c), usemask=False)
d['f0'] = np.arange(0, d.shape[0])
d
array([( 0, 'A', 1.e+20, 999999), ( 1, 'B', 1.e+20, 999999),
       ( 2, 'C', 1.e+20, 999999), ( 3, 'D', 1.e+20, 999999),
       ( 4, 'E', 1.e+20, 999999), ( 5, 'F', 1.e+20, 999999),
       ( 6, 'G', 1.e+20, 999999), ( 7, 'H', 1.e+20, 999999),
       ( 8, 'I', 1.e+20, 999999), ( 9, 'J', 1.e+20, 999999),
       (10, 'A', 0.e+00,      0), (11, 'B', 1.e+01,      2),
       (12, 'C', 2.e+01,      4), (13, 'D', 3.e+01,      6),
       (14, 'E', 4.e+01,      8), (15, 'F', 5.e+01,     10),
       (16, 'G', 6.e+01,     12), (17, 'H', 7.e+01,     14),
       (18, 'I', 8.e+01,     16), (19, 'J', 9.e+01,     18)],
      dtype=[('f0', '<i4'), ('f1', '<U2'),
             ('SomeFloats', '<f8'), ('MoreInts', '<i4')])

Now, rename some fields (using a dictionary that maps old names to new ones) and merge two arrays.

# -- rename array `a` fields and assign to `e`
e = rfn.rename_fields(a, {'f0':'f0_new', 'f1':'f1_new'})

# -- merge `e` and `c`
rfn.merge_arrays((e, c), flatten=True, usemask=False)
array([(0, 'A', 0, 'A',  0.,  0), (1, 'B', 1, 'B', 10.,  2),
       (2, 'C', 2, 'C', 20.,  4), (3, 'D', 3, 'D', 30.,  6),
       (4, 'E', 4, 'E', 40.,  8), (5, 'F', 5, 'F', 50., 10),
       (6, 'G', 6, 'G', 60., 12), (7, 'H', 7, 'H', 70., 14),
       (8, 'I', 8, 'I', 80., 16), (9, 'J', 9, 'J', 90., 18)],
      dtype=[('f0_new', '<i4'), ('f1_new', '<U2'),
             ('f0', '<i4'), ('f1', '<U2'),
             ('SomeFloats', '<f8'), ('MoreInts', '<i4')])
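Renaming first is what lets this merge work: with `flatten=True`, merging `a` and `c` directly would repeat the field names 'f0' and 'f1', which a numpy dtype does not allow. Note too that `rename_fields` returns a view, so `e` shares memory with `a`. A small sketch with hypothetical stand-in data:

```python
import numpy as np
import numpy.lib.recfunctions as rfn

# Small stand-ins for the article's `a` and `c`
a = np.array([(0, 'A'), (1, 'B')], dtype=[('f0', '<i4'), ('f1', 'U2')])
c = rfn.append_fields(a, 'SomeFloats', np.array([0.0, 10.0]), usemask=False)

# Flattening `a` and `c` as-is repeats 'f0' and 'f1' -> ValueError
try:
    rfn.merge_arrays((a, c), flatten=True, usemask=False)
    clash = False
except ValueError:
    clash = True

# Renaming the fields of `a` first avoids the clash
e = rfn.rename_fields(a, {'f0': 'f0_new', 'f1': 'f1_new'})
m = rfn.merge_arrays((e, c), flatten=True, usemask=False)

# `e` is a view of `a`; use .copy() if you need an independent array
shared = np.shares_memory(e, a)
print(clash, m.dtype.names, shared)
```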

The possibilities go on. You can bring your work back into an existing table using

arcpy.da.ExtendTable (ExtendTable, ArcGIS Pro documentation)

or save it as a new table using

arcpy.da.NumPyArrayToTable (NumPyArrayToTable, ArcGIS Pro documentation)

Enough for now.