NumPy and Arcpy play nice: part 2

DanPatterson · 11-28-2021

Why not use <null>?

The solution to <null> in tables - Esri Community

When working with arcpy and NumPy, a handful of imports are useful. Here are some. You don't need all of them for what is posted here; your homework is to figure out which ones you need.

# -- Basic imports
#
import numpy as np
from arcpy import Array, Exists, Multipoint, Point, Polygon, Polyline

from arcpy.da import (
    Describe, InsertCursor, SearchCursor, FeatureClassToNumPyArray,
    TableToNumPyArray)  # ExtendTable, NumPyArrayToTable,  UpdateCursor

from arcpy.geoprocessing import env
from arcpy.management import (
    AddField, CopyFeatures, CreateFeatureclass, Delete)  # DeleteFeatures

Change useless blanks (aka <null>) in geodatabase tables to useful, and generally safe, nulls.

The function below is called by several scripts. Read the docstring; there are options beyond the old Fortran variants on -999. My favorite is -2147483648, since it is guaranteed to foul up any calculation or count that you intend to make.
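
As a quick aside on the choice of sentinel, here is a minimal sketch of why an extreme integer null is hard to overlook. The `counts` array and its values are made up for illustration; they do not come from any table.

```python
import numpy as np

# Integer arrays cannot hold nan, so a sentinel has to stand in for <null>.
int_null = np.iinfo(np.int32).min   # -2147483648

# Hypothetical integer column where the third record was <null>.
counts = np.array([10, 20, int_null, 40], dtype=np.int32)

# A naive mean is wildly negative, which is hard to miss.
naive_mean = counts.mean()

# Mask the sentinel out before doing any arithmetic.
valid = counts != int_null
clean_mean = counts[valid].mean()

print(naive_mean)
print(clean_mean)
```

With -999 the naive mean could pass for a plausible number in some datasets; with the int32 minimum it cannot.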

# -- see Imports above and pick what you need

# Change FC <null> to a useful nodata value
def make_nulls(in_fc, include_oid=True, int_null=-999):
    """Return a null value for each editable field, plus the field names.

    This excludes objectid and geometry related fields.
    Throw in whatever else you want.

    Parameters
    ----------
    in_fc : featureclass or featureclass table
        Uses arcpy.da.Describe to get a list of featureclass/table fields.
    include_oid : boolean
        Include the `object id` field to denote unique records and geometry
        in featureclasses or geodatabase tables.  This is recommended, if you
        wish to join attributes back to geometry.
    int_null : integer
        A default to use for integer nulls since there is no ``nan``
        equivalent.  Other options include

    >>> np.iinfo(np.int32).min # -2147483648
    >>> np.iinfo(np.int16).min # -32768
    >>> np.iinfo(np.int8).min  # -128

    >>> [i for i in cur.__iter__()]
    >>> [[j if j else -999 for j in i] for i in cur.__iter__() ]

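    Returns
    -------
    null_dict : dictionary
        Field names as keys, the null value for each field's type as values.
    fld_names : list
        The field names, with the objectid field first if it was included.

    Notes
    -----
    ``(None, None)`` is returned if `in_fc` is not a featureclass or table.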
    """
    nulls = {'Double': np.nan, 'Single': np.nan, 'Float': np.nan,
             'Short': int_null, 'SmallInteger': int_null, 'Long': int_null,
             'Integer': int_null, 'String': str(None), 'Text': str(None),
             'Date': np.datetime64('NaT'), 'Geometry': np.nan}
    #
    from arcpy.da import Describe
    desc = Describe(in_fc)
    if desc['dataType'] not in ('FeatureClass', 'Table'):
        print("Only Featureclasses and tables are supported")
        return None, None
    in_flds = desc['fields']
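    # -- keep editable, non-geometry fields whose type has a null defined;
    #    any other field type would raise a KeyError when building null_dict
    good = [f for f in in_flds
            if f.editable and f.type != 'Geometry' and f.type in nulls]
    # good = [f for f in in_flds if f.type != 'Geometry']
    fld_dict = {f.name: f.type for f in good}
    fld_names = list(fld_dict.keys())
    null_dict = {f: nulls[fld_dict[f]] for f in fld_names}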
    # -- insert the OBJECTID field
    if include_oid and desc['hasOID']:
        oid_name = desc['OIDFieldName']
        oi = {oid_name: -999}
        null_dict = dict(list(oi.items()) + list(null_dict.items()))
        fld_names.insert(0, oid_name)
    return null_dict, fld_names
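
To follow the lookup logic without an arcpy session, here is a minimal sketch that uses plain `(name, type)` tuples in place of arcpy Field objects. The field names, the `fields` list and the `OBJECTID` name are all assumptions made for illustration; in the function above they come from `Describe`.

```python
import numpy as np

int_null = -999
# Same idea as the `nulls` lookup in make_nulls, trimmed to a few types.
nulls = {'Double': np.nan, 'Integer': int_null, 'SmallInteger': int_null,
         'String': str(None), 'Date': np.datetime64('NaT')}

# Hypothetical (name, type) pairs standing in for arcpy Field objects.
fields = [('Pop', 'Integer'), ('Area', 'Double'), ('Name', 'String')]

# One null value per field, keyed by field name.
null_dict = {name: nulls[ftype] for name, ftype in fields}
fld_names = [name for name, _ in fields]

# Put the object id first, as make_nulls does when include_oid is True.
oid_name = 'OBJECTID'   # assumed name, Describe reports the real one
null_dict = {oid_name: -999, **null_dict}
fld_names.insert(0, oid_name)

print(fld_names)
print(null_dict)
```

Note that string nulls become the text 'None', not the Python object `None`.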

One of the scripts that calls it is this one, which cleans up your table's data and makes an array out of it.

# Featureclass table attribute data
def tbl_data(in_tbl, int_null=-999):
    """Pull all editable attributes from a featureclass table.

    During the process, <null> values are changed to an appropriate type.

    Parameters
    ----------
    in_tbl : text
        Path to the input featureclass or table.
    int_null : integer
        The value to use for integer nulls.  See `make_nulls`.

    Notes
    -----
    When the field list begins with the `OID@` token, the output objectid
    and geometry fields are renamed to `OID_`, `X_cent`, `Y_cent`, where the
    latter two are the centroid values.
    """
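    flds = ['OID@']
    null_dict, fld_names = make_nulls(in_tbl, include_oid=True,
                                      int_null=int_null)
    if null_dict is None:  # -- not a featureclass or table
        return None
    # -- default: use the field names exactly as make_nulls returned them.
    #    The original test, `flds not in fld_names`, compared a list against
    #    a list of strings, so it was always True.
    new_names = out_flds = fld_names
    # -- this branch assumes the OID@ token is followed by two centroid fields
    if fld_names[0] == 'OID@':
        out_flds = flds + fld_names[1:]
        new_names = ['OID_', 'X_cent', 'Y_cent'] + out_flds[3:]
    a = TableToNumPyArray(
        in_tbl, out_flds, skip_nulls=False, null_value=null_dict
    )
    a.dtype.names = new_names
    return np.asarray(a)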

The last line produces your array and you are off to analyze your data. Tables and featureclass tables return structured arrays (arrays with named fields). You can extract the columns/fields you want and produce new tables or join the results back.
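
Here is a minimal sketch of working with such a structured array. The array `a` is a hand-built, hypothetical stand-in for what `tbl_data` returns, with -999 and nan playing the role of the nulls.

```python
import numpy as np

# Hypothetical stand-in for the array returned by tbl_data.
dt = [('OBJECTID', '<i4'), ('Pop', '<i4'), ('Area', '<f8'), ('Name', 'U10')]
a = np.array([(1, 100, 2.5, 'A'),
              (2, -999, np.nan, 'None'),
              (3, 300, 4.0, 'C')], dtype=dt)

pop = a['Pop']                      # one field, a plain 1D array
sub = a[['OBJECTID', 'Pop']]        # several fields, still structured
keep = a[a['Pop'] != -999]          # records with a real Pop value
mean_area = np.nanmean(a['Area'])   # nan-aware functions skip float nulls

print(sub.dtype.names)
print(keep['OBJECTID'])
print(mean_area)
```

Keeping the objectid field in every subset is what lets you join the results back to the original table.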

I should point out that a lot of people use Pandas, but xarray and other modules also work with arrays. Everything of use supports the array protocol, and many are directly compatible.
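
As a rough illustration of what supporting the array protocol can look like, here is a minimal sketch using `__array__`, one of the hooks NumPy looks for when converting an object. The `Column` class is hypothetical and only stands in for a third-party container; real libraries implement this in their own ways.

```python
import numpy as np


class Column:
    """Hypothetical container that exposes its values through __array__."""

    def __init__(self, values):
        self._values = list(values)

    def __array__(self, dtype=None, copy=None):
        # NumPy calls this when asked to convert the object to an array.
        return np.array(self._values, dtype=dtype)


col = Column([1.0, 2.0, np.nan])
arr = np.asarray(col)      # explicit conversion
total = np.nansum(col)     # NumPy functions convert implicitly

print(arr)
print(total)
```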

Consortium for Python Data API Standards (github.com)

https://github.com/data-apis/array-api

More to come