Part 5 ... The attributes attached to the geometry

 

---- As before, the inputs ----

The polygons that I will be using are shown to the right.

  1. A square, 5 points, first and last duplicates
  2. A lake on an island in a lake...
  3. A multipart with two differently shaped donut holes
  4. The letter 'C'
  5. The letter 'V'

Each part is labelled at the labelpoint rather than the centroid, hence each part gets labelled.

---- An Alternate Geometry Reconstructor ----
---- Arrays to Poly* Features ----

 

Never use <null> in a table.  Too many posts on the forum ask how to trap them, find them, replace them.

Always put in a value to represent all conditions.  Too many people use None <null> as the catchall category.  In reality, all observations need to be classified exactly; there really is no such thing as <null>.  You either made an observation or you didn't.  If you didn't, your classification scheme should provide a key indicating that.

 

If an observation was made but the phenomenon/parameter/whatever was actually not there, there should be a key for that.  Similarly for 'I forgot', 'Wasn't my job' or whatever other excuses exist.

 

  • For observations recorded as floating point numbers that truly yielded 'nothing/None/nadda/zilch', I use 'nan' (np.nan) since it is a recognized floating point value.
  • For text/string observations, I use None, since None the object translates to 'None' the text easily in most tables.
  • For time, there is now 'not a time' (NaT), but I don't work with time, preferring to use the string incarnations of those observations.
  • For integers... sadly there is no 'Nint'.  You have to provide an actual integer value to represent 'no value observed', even though you desperately tried.  You can use the old school -999, or even 2**8, 2**16 etcetera.
  • For all of the above, anything that truly doesn't represent an observation, you will have to provide alternatives... a small sketch follows.
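Pulling those choices together, a minimal sketch of the sentinels I lean on (the dictionary keys here are just labels... pick values that suit your data):

import numpy as np

sentinels = {'float': np.nan,                # a recognized float 'no value'
             'text': str(None),              # 'None' as text
             'time': np.datetime64('NaT'),   # not-a-time
             'integer': -999}                # no integer nan... pick a sentinel
np.iinfo(np.int32).min   # ---- -2147483648 ... another integer option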

 

Making none/null/real nothingness

 

You can add to, or remove from, the list below.  These are some that I use.  I will draw your attention to the NumPy incarnations for integers.  Equivalent values exist for floats.  Any value that ensures you will take a second look if a calculation looks weird is good.  However, if your table contains <nulls> even after my lecture above, this will help mitigate your... stupidity is such a harsh word... but you get my drift.

import numpy as np
import arcpy


def _make_nulls_(in_fc, int_null=-999):
    """Return null values for the fields in a featureclass, excluding objectid
    and geometry related fields.  Throw in whatever else you want.

    Parameters
    ----------
    in_fc : text
        Path to a featureclass.  Its fields are pulled with arcpy.da.Describe.
    int_null : integer
        A default to use for integer nulls since there is no ``nan`` equivalent.
        Other options include:

    >>> np.iinfo(np.int32).min # -2147483648
    >>> np.iinfo(np.int16).min # -32768
    >>> np.iinfo(np.int8).min  # -128

    >>> [i for i in cur.__iter__()]
    >>> [[j if j is not None else -999 for j in i] for i in cur.__iter__()]
    """
    nulls = {'Double': np.nan, 'Single': np.nan, 'Float': np.nan,
             'Short': int_null, 'SmallInteger': int_null, 'Long': int_null,
             'Integer': int_null, 'String': str(None), 'Text': str(None),
             'Date': np.datetime64('NaT')}
    #
    desc = arcpy.da.Describe(in_fc)
    if desc['dataType'] != 'FeatureClass':
        print("Only Featureclasses are supported")
        return None, None
    in_flds = desc['fields']
    shp = desc['shapeFieldName']
    good = [f for f in in_flds if f.editable and f.name != shp]
    fld_dict = {f.name: f.type for f in good}
    fld_names = list(fld_dict.keys())
    null_dict = {f: nulls[fld_dict[f]] for f in fld_names}
    # ---- the OBJECTID field could be inserted here if you need it
    return null_dict, fld_names
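A minimal check of what it hands back... assuming 'in_fc' points to a featureclass with, say, a Long and a Text field (hypothetical field names):

null_dict, fld_names = _make_nulls_(in_fc, int_null=-999)
null_dict   # ---- e.g. {'Long_1': -999, 'Text_1': 'None'} ... your fields will vary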

My favorite way of getting just the attributes

 

Such nice functions, FeatureClassToNumPyArray, TableToNumPyArray, and back the other way. 

I am sure many of you have explored where it all comes from, only to find it all buried in a *.pyd file.

import arcpy.da as apd
apd.__file__ # ---- 'C:\\...install path...\\Resources\\ArcPy\\arcpy\\da.py'

# ---- which actually just imports arcgisscripting
# ---- so import it directly

import arcgisscripting as ags

ags.__file__
# ---- 'C:\\...install path...\\bin\\Python\\envs\\arcgispro-py3\\lib\\
# site-packages\\arcgisscripting.pyd'

 

No bother, since you can pull out data for the attributes nicely accounting for <null> records.

def fc_data(in_fc, verbose=False):
    """Pull all editable attributes from a featureclass table.  During the
    process, <null> values are changed to an appropriate type.

    Parameters
    ----------
    in_fc : text
        Path to the input featureclass.
    verbose : boolean
        Requires ``'prn_rec' in locals().keys()`` in order to set to ``True``.
        ``prn_rec`` is imported from arraytools.frmts.
    """
    flds = ['OID@', 'SHAPE@X', 'SHAPE@Y']
    null_dict, fld_names = _make_nulls_(in_fc, int_null=-999)
    fld_names = flds + fld_names
    new_names = ['OID_orig', 'X_centroid', 'Y_centroid']
    a = arcpy.da.FeatureClassToNumPyArray(in_fc, fld_names, skip_nulls=False,
                                          null_value=null_dict)
    a.dtype.names = new_names + fld_names[3:]
    if verbose:
        try:
            prn_rec(a)  # ---- prn_rec imported from arraytools.frmts
        except Exception:
            print(a)
    return np.asarray(a)
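A quick call, assuming 'in_fc' is the polygon featureclass used throughout... the first three names are the renamed OID and centroid columns:

a = fc_data(in_fc)
a.dtype.names   # ---- ('OID_orig', 'X_centroid', 'Y_centroid', ... your fields)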

The explorers amongst us may have discovered a few searchcursor shortcuts

fld_names = ['OBJECTID', 'Long_1', 'Short_1', 'Float_1', 'Double_1', 'Text_1']

cur = arcpy.da.SearchCursor(in_fc, fld_names, explode_to_points=False)

z = cur._as_narray()

Traceback (most recent call last):
  File "<ipython-input-220-b4e724f0982b>", line 1, in <module>
    z = cur._as_narray()
TypeError: int() argument must be a string, a bytes-like object or a number,
                 not 'NoneType'

Sadly, the integer fields with <nulls> bring the whole shortcut down.

The searchcursor actually has enough information in it to create a structured/recarray. 

If you have a clean table with no nulls, the actual calls to _dtype and fields show that you can clearly link cursors and NumPy arrays.  Too bad the whole integer fix isn't incorporated, but _as_narray, FeatureClassToNumPyArray and TableToNumPyArray yield the same results on a 'clean' dataset.

dir(cur)

[...snip... '_as_narray', '_dtype', 'fields', 'next', 'reset']

 

--------------------------------------------------------------------------

Coming soon

 

  • Work with the geometry... and/or … work with the attributes, then put it all back together.

 

Geometry in NumPy... # 1 

Geometry ... ArcPy and NumPy... # 2 

Geometry ... Deconstructing poly* features  # 3 

Geometry ... Reconstructing Poly Features # 4 

Part 3... Alternate geometry deconstructors

---- On to the examples ----

The polygons that I will be using are shown to the right.

  1. A square, 5 points, first and last duplicates
  2. Donut with a Timbit inside
  3. A multipart with a donut hole in each
  4. The letter 'C'
  5. The letter 'V'

Each part is labelled at the labelpoint rather than the centroid, hence each part gets labelled.

 

 

 

I will be omitting examples that rely on json representation or the __geo_interface__ method since they don't add much to the functionality of constructing and deconstructing poly* type features.

 

---- An Alternate Geometry Deconstructor ----

---- Poly* Features to Points ----

This one may seem a bit odd, so I will explain it a bit.

  • Line 13 gets the spatial reference for the featureclass.
  • A null point is created on line 16... none of this None stuff, or a 'not null' point like:
    • not_null_point = arcpy.Point()
      not_null_point.X   #  0.0
      not_null_point.Y   #  0.0
  • Line 17: your familiar SearchCursor is used to pull the geometry object out for further work.  (I use an enumerator since I use 'i' in a different incarnation of the script... more in a different blog.)
  • Line 21 pulls out the arcpy Point coordinates unless 'None' is found.  If it is, it is replaced with my null_pnt.
  • Line 24 records the number of points in the shape.  This value is used later for poly* reconstruction and is returned if 'as_obj' is False.
  • Line 26 produces a numpy array from the coordinates and null_pnts.
  • Line 27 does a cumulative sum of the points in each shape.  This is key... 'c' allows us to split a 2D array of points into subgroups (a tiny demonstration follows this list).  A subgroup may contain a null_pnt, which further allows one to split the points into parts, whether they be interior rings and/or array parts.
  • And the final lines.  If 'as_obj' is True, a list of arrays is returned.  The arrays may be conventional 2D arrays, 3D arrays or object arrays.  Not for you to worry though.  If 'as_obj' is False, a 2D array of points is returned, as well as the 'c' array should you need to split it later.
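To make the role of 'c' concrete, a tiny toy demonstration of the cumulative sum/split idea (toy numbers, not from the featureclass):

import numpy as np

pnts = np.arange(20).reshape(10, 2)   # ---- 10 toy points
c = np.cumsum([4, 6])                 # ---- two shapes of 4 and 6 points... array([ 4, 10])
np.split(pnts, c)[:-1]                # ---- two sub-arrays; the trailing empty slice is dropped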

Why the difference?

  • With the 2D array of points, it is blooming easy to determine parameters for the whole dataset without having to repeat things... means and standard deviations, for example.
  • The 2D array allows you to do things that you can't do easily otherwise.  Ever tried to rotate polygons about an origin point?  How about a tiny shift in the coordinates?  A quick densification, but you only have a Standard license?  Think of something that you can't do easily with the plain arcpy geometry objects... they are easy in array format.
  • With the object array option, you can pick and choose which geometry objects to operate on, and you don't need to reconstruct them prior to sending them back to arcpy geometries.

====================================================================================

The Script

 

def poly_pnts(in_fc, as_obj=False):
    """Poly features to points while maintaining the location of the points in
    the input features. null points are replaces with their numpy equivalent.

    Parameters
    ----------
    in_fc : text
        Full path to the featureclass.
    as_obj : boolean
        True returns a list of arrays of the shapes.  The array types may be
        mixed.  False, returns a 2D array of points and an array of indices.
    """

    SR = getSR(in_fc)   
    out = []
    c = []
    null_pnt = (np.nan, np.nan)  #  inf_pnt = [np.PINF, np.PINF]
    with arcpy.da.SearchCursor(in_fc, 'SHAPE@', spatial_reference=SR) as cur:
        for i, row in enumerate(cur):
            sub = []
            poly = row[0]
            pnts = [[pt.X, pt.Y] if pt else null_pnt
                     for arr in poly for pt in arr]
            sub.extend(pnts)  # removed from next line  sub.append(inf_pnt)
            c.append(len(pnts))
            out.extend(sub)
    out = np.asarray(out)
    c = np.cumsum(c)
    if as_obj:
        return np.split(out, c)[:-1]  # cut off the last empty slice
    return out, c

---- Results ....

a, c = poly_pnts(in_fc, as_obj=False)

a.shape  # (58, 2) ... 58 points of X,Y coordinates

a
array([[ 300010., 5000000.],
       [ 300000., 5000000.],
       [ 300000., 5000010.],
       [ 300010., 5000010.],
       [ 300010., 5000000.],  # ---- 5th point, last of the first polygon
       [ 300010., 5000020.],
       [ 300010., 5000010.],
       [ 300000., 5000010.],
       [ 300000., 5000020.],
       [ 300010., 5000020.],
       [     nan,      nan],  # ---- end of the first part of the second shape
    .... big snip

c
array([ 5, 21, 41, 50, 58], dtype=int32)  # note '5'

 

--------------------------------------------------------------------------

Coming soon

The next blog post will go over reconstructing the geometries after processing their array representation.

 

Geometry in NumPy... # 1 

Geometry ... ArcPy and NumPy... # 2 

Part 2... Deconstructing Geometry

---- On to the examples ----

The polygons that I will be using are shown to the right.

  1. A square, 5 points, first and last duplicates
  2. Donut with a Timbit inside
  3. A multipart with a donut hole in each
  4. The letter 'C'
  5. The letter 'V'

Each part is labelled at the labelpoint rather than the centroid, hence each part gets labelled.

 

 

 

I will be omitting examples that rely on json representation or the __geo_interface__ method since they don't add much to the functionality of constructing and deconstructing poly* type features.

 

---- Getting Geometry from FeatureClasses ----

---- The SearchCursor Approach ----

The functions getSR and _view_ are described at the end.  They are helper functions used to derive the spatial reference and to reshape the coordinates.  The key operative in this approach is on line 8, _as_narray, which does the conversion behind the scenes.

def cur_xy(in_fc, to_pnts=True):
    """Convert featureclass geometry (in_fc) to a simple 2D structured array
    with ID, X, Y values. Optionally convert to points, otherwise centroid.
    """

    SR = getSR(in_fc)
    flds = ['SHAPE@X', 'SHAPE@Y']
    cur = arcpy.da.SearchCursor(in_fc, flds, spatial_reference=SR,
                                explode_to_points=to_pnts)
    a = cur._as_narray()
    a = _view_(a)
    return a

---- Results ....

Essentially you have a simplified list of X, Y coordinates, since the 'shp' was defined as 'SHAPE@X' and 'SHAPE@Y' with 'explode_to_points' set to True (False returns centroids).  Sadly, you can't reconstruct the polygons unless you deal with which points belong to which polygon.

array([[ 300020., 5000000.],
       [ 300010., 5000000.],
       [ 300010., 5000010.],
       ...,
       [ 300002., 5000002.],
       [ 300008., 5000002.],
       [ 300005., 5000008.]])

---- The FeatureClassToNumPyArray Approach ----

This produces the same results as above and requires the same sort of inputs.  Timing shows that they are strongly related, especially since _as_narray has fields and dtype properties.

def fc_xy(in_fc):
    """Return the x,y coordinates for points in a featureclass (in_fc) using a
    data access searchcursor.
    """

    SR = getSR(in_fc)
    a = arcpy.da.FeatureClassToNumPyArray(in_fc, ['SHAPE@X', 'SHAPE@Y'],
                                          spatial_reference=SR,
                                          explode_to_points=True)
    a = _view_(a)
    return a

---- Retrieving shape objects ----

If you want to use arcpy directly because of the builtin methods, you need the contents of the 'shape' field using 'SHAPE@', rather than just extracting the X and Y coordinates as in the previous example.

def fc_shapes(in_fc, as_array=True):
    """Derive, arcpy geometry objects from a featureClass searchcursor.

    Parameters
    ----------
    in_fc : text
        Path to the input featureclass
    as_array: boolean
        True, return an object array of arcpy polygon objects.  False, returns
        a list.
    """

    SR = getSR(in_fc)
    with arcpy.da.SearchCursor(in_fc, 'SHAPE@', None, SR) as cursor:
        a = [row[0] for row in cursor]
    if as_array:
        return np.asarray(a)
    return a

---- Put it to work ----

polys = fc_shapes(in_fc, as_array=True)

polys
array([<Polygon object at 0x197f1bcebe0[0x197f0199968]>,
       <Polygon object at 0x197f1bcec18[0x197e9b48da0]>,
       <Polygon object at 0x197f1bceb70[0x197f005b738]>,
       <Polygon object at 0x197f1bceb38[0x197f005b5a8]>,
       <Polygon object at 0x197f1bceac8[0x197f005b760]>], dtype=object)

---- Geometry as a structured array ----

Nothing fancy, but there is an integer ID field indicating which feature a point belongs to and the coordinates.

A simpler version of the above... just an ID field and coordinates.  

def fc_xyID(in_fc, to_pnts=True):
    """Convert featureclass geometry (in_fc) to a simple 2D structured array
    with ID, X, Y values. Optionally convert to points, otherwise centroid.
    """

    SR = getSR(in_fc)
    flds = ['OID@', 'SHAPE@X', 'SHAPE@Y']
    cur = arcpy.da.SearchCursor(in_fc, flds, spatial_reference=SR,
                                explode_to_points=to_pnts)
    a = cur._as_narray()
    a.dtype = [('IDs', '<i4'), ('X_s', '<f8'), ('Y_s', '<f8')]
    return a

---- The results ----

a = fc_xyID(in_fc, to_pnts=True)

a
array([(1, 300010., 5000000.), (1, 300000., 5000000.), (1, 300000., 5000010.),
       (1, 300010., 5000010.),(1, 300010., 5000000.),
... snip
       (5, 300020., 5000010.),(5, 300022., 5000010.), (5, 300025., 5000002.),
       (5, 300028., 5000010.), (5, 300030., 5000010.),(5, 300026., 5000000.),
       (5, 300024., 5000000.), (5, 300020., 5000010.)],
      dtype=[('IDs', '<i4'), ('X_s', '<f8'), ('Y_s', '<f8')])

The polygon ID that each point belongs to is retained; however, it is replicated many times, and the null points separating polygon parts are removed.  A small sketch of regrouping the points by ID follows.
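Since the IDs are retained, regrouping the exploded points by feature is short work... a sketch using the array 'a' from above:

ids = a['IDs']
parts = [a[ids == i] for i in np.unique(ids)]   # ---- one structured sub-array per polygon
# or split on the boundaries where the id changes, preserving point order
splits = np.split(a, np.where(np.diff(ids) != 0)[0] + 1)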

 

--------------------------------------------------------------------------

Helper functions

_view_  .... view a structured array as an unstructured array
getSR   .... spatial reference for a featureclass

def _view_(a):
    """Return a view of the structured array as an unstructured array.

    Notes
    -----
    This is a quick function.  The expectation is that the array contains a
    uniform dtype (e.g. 'f8').  For example, coordinate values in the form
    ``dtype([('X', '<f8'), ('Y', '<f8')])``, maybe with a Z.

    References
    ----------
    ``structured_to_unstructured`` in np.lib.recfunctions and its imports.
    `<https://github.com/numpy/numpy/blob/master/numpy/lib/recfunctions.py>`_.
    """
    v = np.version.version.split('.')[1]  # ---- minor version check
    if int(v) >= 16:
        from numpy.lib.recfunctions import structured_to_unstructured as stu
        return stu(a)
    else:
        names = a.dtype.names
        z = np.zeros((a.shape[0], 2), dtype=np.float64)
        z[:, 0] = a[names[0]]
        z[:, 1] = a[names[1]]
        return z

Result...

a = np.array([(300015.  , 5000005.  ),
              (300005.  , 5000015.  ),
              (300010.49, 5000010.59)],
              dtype=[('SHAPE@X', '<f8'), ('SHAPE@Y', '<f8')])

_view_(a)

array([[ 300015.  , 5000005.  ],
       [ 300005.  , 5000015.  ],
       [ 300010.49, 5000010.59]])

getSR ....

def getSR(in_fc):
    """Return the spatial reference of a featureclass"""
    desc = arcpy.da.Describe(in_fc)
    SR = desc['spatialReference']
    return SR

 

Which you can use for basic spatial reference facts ( dir(SR) for a full list )

SR.type, SR.name, SR.factoryCode, SR.centralMeridian, SR.falseEasting

('Projected', 'NAD_1983_CSRS_MTM_9', 2951, -76.5, 304800.0)

 

Geometry in NumPy... # 1 

Arrays

The case against <null> 

 

The scenario...

Some data.  Simple, but a combination of singlepart and multipart shapes, a couple of holes, and tabular data with nulls.

Time for the SearchCursor

Easy couple of steps, when you know what you want.  We will just blow up the polygons to points and get all the attributes in the table.

in_fc = r"C:\My_spaceless_path_to_my\file_geodatabase.gdb\Polygons"
flds = arcpy.ListFields(in_fc, "*")
desc = arcpy.da.Describe(in_fc)          # create the describe object
SR = desc['spatialReference']            # spatial reference always needed
flds = [i.name for i in desc['fields']]  # field names are good

# time to make some points from the polygons
cur = arcpy.da.SearchCursor(in_fc, flds, spatial_reference=SR,
                            explode_to_points=True)

Any questions?

Pretty straightforward.

 

But cursors have some interesting properties.

cur.fields
  ('OBJECTID', 'Shape', 'Id', 'Long_1', 'Short_1', 'Float_1', 'Double_1', 'Text_1',
   'Shape_Length', 'Shape_Area', 'Date_time', 'DT_str')

cur._dtype

dtype([('OBJECTID', '<i4'), ('Shape', '<f8', (2,)), ('Id', '<i4'), ('Long_1', '<i4'),
       ('Short_1', '<i4'), ('Float_1', '<f4'), ('Double_1', '<f8'), ('Text_1', '<U10'),
       ('Shape_Length', '<f8'), ('Shape_Area', '<f8'), ('Date_time', '<M8[us]'), ('DT_str', '<U20')])

Wow!  Almost looks numpy array-like.  There is even an _as_narray ... I wonder.

a = cur._as_narray()  # ---- give it whirl and see what happens

Traceback (most recent call last):
  File "<ipython-input-174-371bed78ca5c>", line 1, in <module>

    a = cur._as_narray()

TypeError: int() argument must be a string, a bytes-like object or a number,
                 not 'NoneType'

# ---- dismal failure... translation?

 

Well, that didn't go well!  Translation?  It looks like an attempt was made to convert some bit of data to an integer, and it couldn't be done because of those pesky <nulls> ...aka... None in the table!

Yes, those things that you have to exclude from calculations, that you have to query for, and for which you have to remember that it isn't good to test whether something is equal to None, but whether something IS None.

Hence, Nulls are evil.  They are a lazy way out of not recording a value in a table that represents the fact that you have no observation for that field.  

 

You have to get rid of nulls, and if you were too lazy to do it at the beginning during data preparation and Q/A work, there is a way out.  If you ignore the verbosity in the code below, there are a few tips.

  • For a particular data type, pick an appropriate null
    • for floats/doubles, python and numpy have NaN (not a number... which is a double 'not a number'... with me?)
    • text/string, easy ... choose an appropriate null value, anything that is in text form.  Some suggestions:
      • "NoneType", "None_the_string", "Nadda", "Missed_that_one", "not_mine" ... 
    • Integers.... sadly there is no Nint (Not an integer), so you have to get creative: old school -9, -99, -999 etc., or specify a minimum or maximum based on the integer dtype (see lines 14-16 in the code below for examples)
    • Time... NaT... Not a Time... got one, use it instead of <null>

 

import numpy as np
import arcpy


def _make_nulls_(in_flds, int_null=-999):
    """Return null values for a list of field objects, excluding objectid
    and geometry related fields.  Throw in whatever else you want.

    Parameters
    ----------
    in_flds : list of field objects
        arcpy field objects, use arcpy.ListFields to get a list of featureclass
        fields.
    int_null : integer
        A default to use for integer nulls since there is no ``nan`` equivalent.
        Other options include:

    >>> np.iinfo(np.int32).min # -2147483648
    >>> np.iinfo(np.int16).min # -32768
    >>> np.iinfo(np.int8).min  # -128
    """
    if not isinstance(in_flds, (list, tuple)):
        in_flds = [in_flds]
    nulls = {'Double': np.nan, 'Single': np.nan, 'Float': np.nan,
             'Short': int_null, 'SmallInteger': int_null, 'Long': int_null,
             'Integer': int_null, 'String': str(None), 'Text': str(None),
             'Date': np.datetime64('NaT')}
    bad_types = ['Guid', 'OID', 'Geometry', 'Raster', 'Blob']  # field types to skip
    bad_names = ['FID', 'OBJECTID', 'Shape', 'Shape_Length', 'Shape_Area']  # and names
    good = [f for f in in_flds
            if f.editable and f.type not in bad_types and f.name not in bad_names]
    fld_dict = {f.name: f.type for f in good}
    fld_names = list(fld_dict.keys())
    null_dict = {f: nulls[fld_dict[f]] for f in fld_names}
    return null_dict, fld_names
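Hooked up to arcpy.ListFields, as the docstring suggests (assuming 'in_fc' is your featureclass path):

flds = arcpy.ListFields(in_fc)
null_dict, fld_names = _make_nulls_(flds, int_null=-999)
null_dict   # ---- {'Id': -999, 'Long_1': -999, ... 'Text_1': 'None', ...}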

 

So what happens when you try to use the shortcut _as_narray?

Row values in a list format...
with arcpy.da.SearchCursor(in_fc, '*', None, SR) as cursor:
    a = [row for row in cursor]
   
a
[(1, (300015.0, 5000005.0), 1, 1, 4, 1.0, 100.0, 'A 10 chars', 40.0, 100.0, datetime.datetime(2019, 3, 28, 0, 0), '2019/03/28 00:00:00'),
(2, (300005.0, 5000015.0), 2, None, None, None, None, None, 64.0, 64.0, None, None),
(3, (300010.4945054945, 5000010.593406593), 3, 3, 6, 3.0, 300.0, 'C not null', 99.41640786499875, 182.0, datetime.datetime(2019, 3, 30, 0, 0), '2019/03/30 00:00:00')]

 

Notice the integer columns have nulls in them as do the other columns.  Columns should have one data type.  We get too complacent because of spreadsheets.

 

To the rescue...

b = arcpy.da.FeatureClassToNumPyArray(
        in_fc, "*", "", spatial_reference=SR, explode_to_points=False,
        skip_nulls=False, null_value=null_dict)

b
array([(1, [ 300015.   , 5000005.   ], 1,    1,    4,  1., 100., 'A 10 chars', 40.   , 100., '2019-03-28T00:00:00.000000', '2019/03/28 00:00:00'),
       (2, [ 300005.   , 5000015.   ], 2, -999, -999, nan,  nan, 'None', 64.   ,  64., '2019-03-28T00:00:00.000000', 'None'),
       (3, [ 300010.495, 5000010.593], 3,    3,    6,  3., 300., 'C not null', 99.416, 182., '2019-03-30T00:00:00.000000', '2019/03/30 00:00:00')],
dtype=[('OBJECTID', '<i4'), ('Shape', '<f8', (2,)), ('Id', '<i4'), ('Long_1', '<i4'),
       ('Short_1', '<i4'), ('Float_1', '<f4'), ('Double_1', '<f8'), ('Text_1', '<U10'),
       ('Shape_Length', '<f8'), ('Shape_Area', '<f8'), ('Date_time', '<M8[us]'), ('DT_str', '<U20')])

Do the scrolly thing and you will notice that the null_dict I created previously now converts integer <null> values to -999, floating point values to true NaNs, and text to 'None'.

 

Lesson... don't use <null>; define a true null value that represents what is meant by it.  There can be more than one type of value to represent conditions like no data collected, versus no data available, versus no observation possible... don't lump them into one category at the beginning.  You can aggregate AFTER data collection; you shouldn't do it before.

Short one

 

Came up in a question.  I sadly suggested a spreadsheet.  To correct this, here is the numpy solution.

Normalizing data...

Here are the input and output tables.

names = ['a', 'b', 'c', 'd']
a = arcpy.da.TableToNumPyArray(out_tbl, names)
a0 = a.view('f8').reshape(a.shape[0], len(names))
dt = [('a1', 'f8'), ('b1', 'f8'), ('c1', 'f8'), ('d1', 'f8')]
n = normalize(a0)   # ---- normalize is defined below... define it first when scripting
new_names = ['a1', 'b1', 'c1', 'd1']
out = np.zeros((n.shape[0],), dtype=dt)
for i, name in enumerate(new_names):
    out[name] = n[:, i]
arcpy.da.NumPyArrayToTable(out, out_tbl + "norm")

def normalize(a):
    # a is an (n x dimension) np.array
    tmp = a - np.min(a, axis=0)
    out = tmp / np.ptp(tmp, axis=0)
    return out

 

 

Lines 1 and 2 read the table from ArcGIS Pro.

Line 3 'views' the array as floating point numbers.

Line 4 creates an output data type for sending it back.

Line 5 normalizes the data.

Lines 6 to 10... the bumpf to send it back to Pro as a table.

 

Normalize... hope I got it right... take the array, subtract the min, then divide by the range.  np.ptp is the 'peak-to-peak' function, which gives the range.

 

Normalize by row, column or overall

 

Now, let's assume that an input dataset could be data arranged by row, by column or as a raster...  We need to change the normalize equation just a bit to see the results.

# ---- Adding an axis parameter ----

def normalize(a, axis=None):
    # a is a (n x dimension) np.array
    tmp = a - np.min(a, axis=axis)
    out = tmp / np.ptp(tmp, axis=axis)
    return out

a = np.arange(25).reshape(5,5)   # ---- some data

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24]])

normalize(a, axis=0)     # ---- normalize by column

array([[0.  , 0.  , 0.  , 0.  , 0.  ],
       [0.25, 0.25, 0.25, 0.25, 0.25],
       [0.5 , 0.5 , 0.5 , 0.5 , 0.5 ],
       [0.75, 0.75, 0.75, 0.75, 0.75],
       [1.  , 1.  , 1.  , 1.  , 1.  ]])

normalize(a, axis=1)     # ---- normalize by row

array([[ 0.  , -0.25, -0.5 , -0.75, -1.  ],
       [ 0.31,  0.06, -0.19, -0.44, -0.69],
       [ 0.62,  0.38,  0.12, -0.12, -0.38],
       [ 0.94,  0.69,  0.44,  0.19, -0.06],
       [ 1.25,  1.  ,  0.75,  0.5 ,  0.25]])

normalize(a, axis=None)  # ---- normalize overall

array([[0.  , 0.04, 0.08, 0.12, 0.17],
       [0.21, 0.25, 0.29, 0.33, 0.38],
       [0.42, 0.46, 0.5 , 0.54, 0.58],
       [0.62, 0.67, 0.71, 0.75, 0.79],
       [0.83, 0.88, 0.92, 0.96, 1.  ]])
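One caveat I should flag... the axis=1 output above is not confined to 0-1 because the subtraction broadcasts against the wrong axis.  A keepdims variant (my suggested fix, not in the original snippet) keeps each row properly in range:

def normalize(a, axis=None):
    # keepdims retains the reduced axis so broadcasting lines up by row or column
    tmp = a - np.min(a, axis=axis, keepdims=True)
    return tmp / np.ptp(tmp, axis=axis, keepdims=True)

normalize(a, axis=1)   # ---- each row now runs from 0. to 1.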

 

Lots of stuff you can do


Geometry in NumPy... # 1

Posted by Dan_Patterson Champion Mar 16, 2019

Geometry 

 

By example

 

Start with a few points. The table and the graph below show what the arrangement is.

 

How was the table generated from the coordinates?

Every sequence needs an Id field, some x and y coordinates.  The rest of the columns were calculated.

I will exclude the id column from the example, since all the other fields are decimal numbers.

 

  1. Make an empty array to hold the results
  2. Add the x and y coordinates (xs, ys)
  3. Calculate the sequential differences in the coordinates (dx, dy)
  4. Determine the segment lengths between the points (leng)
  5. Calculate the cumulative length of the path formed by the points
  6.  Magic happens....

 

Sample Data Points...

 id  xs     ys     dx      dy     leng   cumleng   steps   deltaX   deltaY 
----------------------------------------------------------------------------
 000   0.00   0.00    0.00   0.00   0.00      0.00    0.00     0.00     0.00
 001   0.00  10.00    0.00  10.00  10.00     10.00    2.50     0.00     4.00
 002  10.00  10.00   10.00   0.00  10.00     20.00    2.50     4.00     0.00
 003  10.00   8.00    0.00  -2.00   2.00     22.00    0.50     0.00    -4.00
 004   2.00   8.00   -8.00   0.00   8.00     30.00    2.00    -4.00     0.00
 005   2.00   2.00    0.00  -6.00   6.00     36.00    1.50     0.00    -4.00
 006  10.00   2.00    8.00   0.00   8.00     44.00    2.00     4.00     0.00
 007  10.00   0.00    0.00  -2.00   2.00     46.00    0.50     0.00    -4.00
 008   0.00   0.00  -10.00   0.00  10.00     56.00    2.50    -4.00     0.00
# ---- Procedure... start with an array 'a'

a.shape  # (9, 2) the input array, it has 9 rows and 2 columns (ie x, y)

z = np.zeros((a.shape[0],9))                 # make the empty array
z[:, :2] = a                                 # xys : x, y coordinates
z[1:, 2:4] = dxdy = z[1:, :2] - z[:-1, :2]   # sequential differences
z[1:, 4] = np.sqrt(np.einsum('ij,ij->i', dxdy, dxdy))  # calculate distance
z[:, 5] = np.cumsum(z[:,4])                  # cumulative distance

 

The code so far

  1. Start with a basic input of 9 points consisting of x and y values... it has a 'shape' property reflecting this.
  2. Make a 'container' to hold our results in line 5... we will call it 'z'.  To facilitate this, I made a numpy array filled with zeros.  I would show it, but it is just an array with 9 rows ('a' is the points, and there are 9 of them) and 9 columns.
  3. The 6th line fills columns 0 and 1 with the values in 'a'... in other words, the x and y coordinates.
  4. Stick with me... the 7th line fills in rows 1 to the end with two columns of data representing the sequential differences in the coordinates.  z[1: ... means from the first row on; ... 2:4] means fill the 2nd and 3rd columns (aka, the 'from, up to but not including' syntax).
  5. Calculate the segment distances/lengths.  I use einsum (Einstein summation).  It's a long story; I have blogged about it elsewhere.  Trust me, it is fast, works in multiple dimensions, and gives you way more tools than distance calculations (a quick equivalence check follows this list).
  6. Line 9 just does a cumulative summation of the distances.
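For the curious, the einsum call is just the distance formula in disguise... for the 2-column dxdy array, these three lines return the same values:

np.sqrt(np.einsum('ij,ij->i', dxdy, dxdy))   # ---- per-segment lengths
np.sqrt((dxdy**2).sum(axis=1))               # ---- same result, textbook form
np.sqrt(dxdy[:, 0]**2 + dxdy[:, 1]**2)       # ---- sqrt(dx**2 + dy**2)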

 

 

What to do now?

Lots of things. 

 

Basic geometric operations

shift, rotate, scale, skew

Derived geometric properties

interpoint distance (done), angles/directions/bearing, bounding boxes and way more

More later

 

But today, let's

 

Densify the points

To add to the basic code, we are going to take a leap

The 7th column in the table above was labelled 'steps'.  That column was derived because I wanted to 'densify' the number of points between the input points.  In this example, we will use a 'dist' of 4 units.  This is line 1 below.

 

The 8th and 9th columns need some explaining.  I slipped that calculation into line 7 of the initial code.... dxdy

Between the first 2 points (0, 0) and (0, 10), there is a 10 unit distance.  Divide that by 4 and the first densification will have 2.5 steps in distance terms.  In coordinate terms, we have a vertical line (right? check! so you understand).  Our increment is obviously not going to be the same in the X and Y directions, is it?  The X coordinate will increment by 0 and the Y coordinate will increment by 4 (look at the 2nd row's deltaX, deltaY columns).  This calculation is done in line 2 below.

 

Densify by distance
z[1:, 6] = steps = z[1:, 4] / dist           # steps to create
z[1:, 7:] = deltas = dxdy/(steps.reshape(-1, 1))
#
# ---- above, calculating the steps and densification
# ---- below, for later... calculating the new points
#
N = len(a) - 1  # number of segments
pnts = np.empty((N,), dtype='O')
for i in range(N):              # cycle through the segments and make
    num = np.arange(steps[i])   # the new points
    pnts[i] = np.array((num, num)).T * deltas[i] + a[i]
a0 = a[-1].reshape(1,-1)
final = np.concatenate((*pnts, a0), axis=0)

 

So what does this produce?  Our focus now turns to lines 7-13 above.

Sadly, we will be constructing arrays with 'jagged' shapes, object arrays, like geometry objects that don't have the same number of shapes, points or parts.  (more in another blog).  Basically line 8 above creates the container to save the results.  Each 'row' can contain an indeterminate number of objects and each row's objects need not be the same length.

 

Line 10 produces the number of steps from our previous calculation.  The np.arange function will truncate the decimal portion, resulting in integer steps (ie steps = 2.5 results in...

 

np.arange(2.5)  # ---- array([0., 1., 2.])

 

Line 11 produces the points.  Essentially, line 11 is nothing more than the equation to calculate points along a line, given a point and the line slope (or equivalent).

 

The zero in our arange, will be used to reproduce the first point, the 1 will be used to determine the position of the point 4 units up the Y axis and the 2 will be for the point up 8 units.  That is.... (0, 0), (0, 4), and (0, 8).  The last point between (0, 0) and (0, 10) will be created during the next sequence.
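Plugging the first segment's numbers in (deltaX, deltaY of 0. and 4., start point (0, 0)... values straight from the table above):

num = np.arange(2.5)                          # ---- array([0., 1., 2.])
np.array((num, num)).T * [0., 4.] + [0., 0.]  # ---- deltas[0] and a[0] hard-coded
# array([[0., 0.],
#        [0., 4.],
#        [0., 8.]])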

 

The points look like the following.

 

First sequence, 2nd, 3rd, 4th ... and so on
step (0) ...points
[[0. 0.]
 [0. 4.]
 [0. 8.]]

step (1) ...points
[[ 0. 10.]
 [ 4. 10.]
 [ 8. 10.]]

step (2) ...points
[[10. 10.]]
step (3) ...points
[[10.  8.]
 [ 6.  8.]]

 

The first 4 sequences show how the points are produced. 

We have already established that the first sequence should produce 3 points: the start point and two intermediate points, but excluding the last.

The 2nd sequence is similar, and the end point of the first sequence is also the first point of the second.

The 3rd sequence... it is too short for any intervening points (have a look, it has a length/distance of 2), so only the first point is produced.

... carry on if you need it using the following.

Produce your own results
for i, pnt in enumerate(pnts):
    print("\nstep ({}) ...points\n{}".format(i, pnt))

 

The final result is a sequence of points which represent the densified input pattern.

Points ... transposed for viewing.
final.T
array([[ 0.,  0.,  0.,  0.,  4.,  8., 10., 10.,  6.,  2.,  2.,  2.,  6., 10., 10.,  6.,  2.,  0.],
       [ 0.,  4.,  8., 10., 10., 10., 10.,  8.,  8.,  8.,  4.,  2.,  2.,  2.,  0.,  0.,  0.,  0.]])

 

Pretty much it.

Next steps... just produce the points, produce a polyline using the points, subdivide the points into segments of fixed and/or variable lengths, add a Z dimension to the results... whatever..


Measuring Distance in 3D

Posted by Dan_Patterson Champion Feb 21, 2019

3D  

 

Math check

A bumpy ride along the x-axis

x       [ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10.],
y       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
z       [ 0.,  1.,  0.,  1.,  0.,  1.,  0.,  1.,  0.,  1.,  0.]
total length/distance travelled   14.142...
segment length/distance travelled [1.41, 1.41, 1.41, 1.41, 1.41, 1.41, 1.41, 1.41, 1.41, 1.41]
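A quick reproduction of that math check with plain numpy (a sketch... the blog's own numbers come from e_leng in arraytools):

import numpy as np

x = np.arange(11.)
y = np.zeros(11)
z = np.array([0., 1.] * 5 + [0.])
xyz = np.stack((x, y, z), axis=1)   # ---- the bumpy path as an (11, 3) array
diff = xyz[1:] - xyz[:-1]           # ---- sequential differences
seg = np.sqrt(np.einsum('ij,ij->i', diff, diff))
seg.round(2)   # ---- array([1.41, 1.41, ... 1.41]) ... ten segments
seg.sum()      # ---- 14.142...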

Conventional application

  • Take a DEM
  • Construct a 3D polyline (Z-enabled) representing a profile path along it
  • Densify it at some increment if you want to sample more elevation points along the line, but are too lazy to do it as you go (or you forgot).  This is optional, but there are builtin tools and/or 3rd party tools to do this (including some of mine)
  • Feature vertices to points, to get the path as points
  • Extract values to these points.
    • Extract Values to Points in the Spatial Analyst
    •  If you don't have SA, there are ways to get these data, but that is for another blog
  • Add Geometry Attributes to get the x, y and z values for the points.
  • Off to numpy

 

The Map and the Profile

Start low, go high, or vice versa.  Vertical exaggeration on the Z values 


Calculations

Now for some distance calculations.

 

Points

array([[ 300012.56, 5001013.9 ,       3.  ],
       [ 300025.53, 5001010.2 ,       6.  ],
       [ 300037.26, 5001003.4 ,       9.  ],
             ...,
       [ 300972.11, 5000037.68,     198.  ],
       [ 300993.72, 5000021.01,     201.  ],
       [ 301014.1 , 5000008.04,     204.  ]])

 

2D  Total length, sequential lengths
 -     1505.25,

 -     array([13.49, 13.56, 14.4 , 18.42, 17.69, 21.99, 17.66, 17.49, 21.29, 20.08,
                ... snip
                37.54, 42.86, 32.03, 33.18, 20.75, 25.46, 24.84, 18.03, 27.29, 24.15]))

3D  Total, sequential

 -      1519.95,

 -      array([13.82, 13.88, 14.71, 18.66, 17.94, 22.19, 17.91, 17.74, 21.5 , 20.31,
                ... snip

                 37.66, 42.96, 32.17, 33.32, 20.96, 25.64, 25.02, 18.28, 27.46, 24.34]))

 

Quick calculation

np.sqrt(204**2 + 1505.25**2)  => 1519.01  Not unexpected, given the line's shape.

 

Calculations made using ... e_leng ... from arraytools.  The gist link allows people to experiment.

 

 

Unconventional application

 

How far did the turkey vulture travel before it landed on its food?

 

2D start-end point distance  - 56.39

 

2D  Total length, sequential lengths

 -  1590.97

 -  array([0.16, 0.16, 0.17, 0.18,  ...,  8.76, 8.79, 8.81, 8.84]))

 

3D  Total, sequential

 -  1658.26

 -  array([1.01, 1.01, 1.01, 1.02,  ..., 8.82, 8.85, 8.87, 8.89]))

 

Archimedes spiral in 3D

Task...

Identify the 2 highest elevation points in a series of polygons.

 

Purpose...

To win a bet amongst neighbors... to locate something like a tower... find observation points for visibility analysis... lots of reasons

 

Conceptualization...

  • Intersect the points with the polygons, retaining all attributes... output type... point
  • Add an X and Y field to the result so you have the point coordinates for later use (if you don't have them)
  • Delete any 'fluff' fields that you don't need, but keep the ID, KEY, OBJECTID fields from each. They will have a unique name for identification.
  • Split the result into groupings using the key fields associated with the polygons (see code)
  • Sort the groupings on a numeric field, like elevation (ascending/descending)     (code)
  • Slice the number that you need from each grouping.   (code)
  • Send it back out to a featureclass if you need it.    (code)

 

For Picture People...

A sample table resulting from the intersection of points with polygons.

 

The points (in red) and the polygons (a grid pattern) with the two points with the highest 'Norm' value in each polygon

 

An upclose look of the result

 

For the Pythonistas....

Fire up Spyder or your favorite python IDE

 

Table to NumPy Array....

>>>  a = arcpy.da.TableToNumPyArray("path to the featureclass or table", field_names="*")  # you now have an array

 

with me so far?  now you have an array

 

Make a script... call it something (splitter.py for example)

 

The work code... 

import numpy as np
import arcpy


def split_sort_slice(a, split_fld=None, val_fld=None, ascend=True, num=1):
    """Does stuff  Dan_Patterson@carleton.ca 2019-01-28
    """

    def _split_(a, fld):
        """split an unsorted array into a list of subarrays, one per key"""
        out = []
        uni = np.unique(a[fld])
        for j in uni:
            key = (a[fld] == j)
            out.append(a[key])
        return out
    #
    err_0 = "The split_field {} isn't present in the array"
    if split_fld not in a.dtype.names:
        print(err_0.format(split_fld))
        return a
    subs = _split_(a, split_fld)
    ordered = []
    for sub in subs:
        r = sub[np.argsort(sub, order=val_fld)]
        if not ascend:
            r = r[::-1]
        n = min(num, r.size)   # ---- don't let a small group truncate the rest
        ordered.extend(r[:n])
    out = np.asarray(ordered)
    return out

# ---- Do this ----

in_fc = r"C:\path_to_your\Geodatabase.gdb\intersect_featureclass_name"

a = arcpy.da.TableToNumPyArray(in_fc, "*")

out = split_sort_slice(a, split_fld='Grid_codes', val_fld='Norm', ascend=False, num=2)

out_fc = r"C:\path_to_your\Geodatabase.gdb\output_featureclass_name"

arcpy.da.NumPyArrayToFeatureClass(out, out_fc, ['Xs', 'Ys'], '2951')

 

Lines 40 - 42

NumPyArrayToFeatureClass—Data Access module | ArcGIS Desktop 

 

Open ArcGIS Pro, refresh your database and add the result to your map.

 

I will put this or a variant into...

 

Point Tools …


So that you can click away, for those that don't like to type.

Lines

 

Different incarnations and names

Pretty easy to form the origin-destination pairs.

Start at a point.

Throw in horizontal and/or vertical offsets.

A dash of an azimuth/bearing.

A tad of NumPy

A bit of Arcpy and....

A good way to spend some time, so you write it down because you will forget and reinvent it later.

Almost forgot...

 

There is always one student that thinks outside the box. 

Hmmmm could be a bonus here... I wonder if any of mine can replicate the compass with 10 degree increments?

In the attached code, I made these changes

    rads = np.deg2rad(bearing)
    dx = np.sin(rads) * dist
    dy = np.cos(rads) * dist
    #
    n = len(bearing)
    N = [N, n][n > 1]  # ---- use the number of bearings if more than one is given

 

And used this

b = np.arange(0, 361, 22.5)
a, data = transect_lines(N=1, orig=[some x, some y],
                         dist=100, x_offset=0, y_offset=0,
                         bearing=b, as_ndarray=True)

You can't have it both ways, in a manner of speaking.  By limiting N to the number of bearings, you use numpy to generate the desired angles.  There is no x or y offset since the origin is now fixed.

 

How to use the attached...

""---- use these as your inputs, with edits of course

# ---- make the x, y coordinate table
SR = 2951  # a projected coordinate system preferably

a, data = transect_lines(N=10, orig=[299000, 5000000], dist=100,
                         x_offset=10, y_offset=0, bearing=-10, as_ndarray=True)
p0 = r"C:\Your_path\Your.gdb\a_tbl"

arcpy.da.NumPyArrayToTable(a, p0)

# ---- now for the lines
p1 = r"C:\Your_path\Your.gdb\some_lines"

arcpy.XYToLine_management(p0, p1,
                          'X_from', 'Y_from',
                          'X_to', 'Y_to',
                          spatial_reference=SR)

"""

 

PS

The python/numpy part is quite speedy, using variants of

%timeit transect_lines(N=10, orig=[0,0], dist=1, x_offset=0, y_offset=0, bearing=0, as_ndarray=True)

That is microseconds for the speed geeks.  I couldn't see a use case to test for larger arrays.

   N     Time
  10     36.0 µs ± 309 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
  50     39.3 µs ± 3.4 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
 100     42.9 µs ± 6.57 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
 500     46.5 µs ± 502 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
1000     54.9 µs ± 1.39 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

I didn't bother to test the featureclass creation since I have no control over that.

Point Tools ....

Posted by Dan_Patterson Champion Dec 29, 2018

Points

 

The foundation of most geometry.  Lots of tools for working with them, sort of.

This is a collection of tools for working with point data sets, creating point data sets or using points for other purposes.

I have lots of blogs on individual aspects of points.  This is just a link and an update notice for any additions to the point tools toolbox.

 

 

A fairly simple list.

The code access...

code sharing site

point tools on my github

 

Simple to use.  Just unzip the zip file from the code sharing site.  Keep the toolbox and the scripts folder in the same location so that the tbx and the scripts are kept 'relative' to one another.

GitHub is more for the terminally curious who just want to examine the code for their own purposes.

 

Some blog links as I think of them....

Distance Explorations... Trees are cool 

Concave Hulls ... the elusive container 

Standard Distance, inter-point distance:  ... the "Special Analyst" to the rescue 

Geometry... Stuff to do with points 

 

Anything you think should be added, fire me off an email.

Clone or not to clone

 

NOTE:     I will be updating with a new guide for ArcGIS Pro 2.4 when it is released (Beta 2 is now complete).

  I will provide the link here.

 

Why Clone?

  • If you want to install a package or upgrade an existing package.
    • You should be able to update or install packages distributed by esri within the existing environment... but you can't. I know it is confusing, but it may mean that the distributed packages haven't been fully tested (ie packaged for AI or machine learning)
  • If you aren't the guardian of your machine... you need to clone.
    • You are in the Star Wars clone category.

Why Not Clone?

  • You are the master of your machine and are familiar with software installation and 'conda' stuff.  If this is you, then you can install packages into the base ArcGIS Pro environment because you will have full admin rights.
  • TIP
    • Keep the *.exe download and/or the *.msi and *.cab files if you 'toast' something and need to do a reinstall.  This need hasn't happened with any of my cohort.

 

My approach

  • I completely uninstalled previous versions of Arc-Anything-and-Everything.
    •  Alternately, good time to buy a really good upgraded machine
  • Do a fresh new install of ArcGIS Pro.
  • Don't let the installation software decide where it wants to install.  Make a folder ahead of time (ie arc_pro or similar) and install there... C:\Program Files.... is a really long path with spaces and I hate installing anything there that doesn't need to be.
  • Make a clone as described below the dashed ===== line if you aren't in control of your machine

 

Result.... I couldn't install any packages through Pro's package manager, and when I installed Spyder via conda in my clone, it couldn't import arcpy

 

 

Solution...

  • I installed spyder via conda into the arcgispro-py3 env and now I have spyder working.
  •  I also installed other packages into that environment without issue.
  •  When you download, use save as to download the *.exe to a folder ... you want to keep this.
  • Run the *.exe you downloaded so you have the installer *.msi and *.cab files
  • Double-click on the *.msi file to begin the installation
  • Specify the folder where you want Pro to be installed
  • Run 'conda' via proenv.bat ( the python command prompt) and make sure your arcgispro-py3 is active and install away
  • Alternately, create your clone and try to get it working with your packages and arcpy

 

Packages updates....

So far I have made upgrades to

  • python 3.6.8 the last of the 3.6 line
  • numpy 1.16.2 the latest of the 1.16 line
  • I upgraded other packages as well

 

Testing without installing

You can check the effect of package updates without actually installing them.  I still recommend using this first, then checking the list for possible conflict issues.  Launch conda, then ...

 

conda install <your package name>  --dry-run

 

This is the dashed line... below is for cloning... the package installation is the same for both

==================================================

Clone... If you have to do it, here is a guide.  This guide is only for people who have actual control over their computers.

The Clone Guide

 

Access proenv.bat

 

You can launch proenv.bat via your windows start options under the guise of the Python Command Prompt.

 

I prefer to make a desktop shortcut as shown below.

 

Your environments can be controlled within ArcGIS Pro's package manager or via 'conda' accessed through proenv.bat.


Cloning from within Pro

It is slower and you don't get a lot of information, but they are improving it as they go along.  Activate the environment, close Pro, then restart with the new environment.

 

 

Working with conda

The shortcut brings up the command prompt in your active environment.  To obtain information on your environments, just run conda info --envs

 

Installing packages

You can add a package from within the package manager or via conda.  Since I prefer the --dry-run option in conda, I will illustrate it here.  You can leave out the --dry-run option to perform the actual install once you are sure it won't cause any unforeseen issues.

 

 

Upgrading packages

You can upgrade a package either from the package manager in ArcGIS Pro or via conda.  The package manager seems to take longer and you don't get much feedback during the process.

Again, I prefer to examine an upgrade using the --dry-run option first, prior to committing.

 

 

You don't need this section

 

Proenv.bat window

Ok... love that blue?  Making conda package installs more fun... 

 

Anaconda Navigator

Now not everyone needs this nor can everyone do this, but with a patch on a single file, you can add an alternate package manager and access to a load of documentation links.

 

 

application launcher

 

 

the catch

In order to get the above, you have to edit a few lines in the 'conda-api.py' file, which will be located in your clone path

C:\ArcGIS\bin\Python\envs\dan\Lib\site-packages\anaconda_navigator\api

 

The patch given by 

Patch Anaconda Navigator to use conda executable instead of root python wrapper to conda · GitHub 

entails altering a couple of lines in the conda-api.py file.  I made a copy of the original and made fixes to the other in case I needed to undo the changes quickly.  Not ideal, but worth it if you need to provide documentation and application shortcuts to users with diverse computing backgrounds.

Like I said... you don't need it, but it is a definite 'nice'.

Arcpy

 

You need to create a Point object

 

import arcpy

pnt = arcpy.Point(300000, 5025000)

print(pnt)

300000 5025000 NaN NaN

 

Simple... But what does that... import arcpy ...line do?

Let's see

 

(1) ---- Direct import

 ----------------------------------------------------------------------
| dir(arcpy) ...
|    <module 'arcpy' from 'C:\\ArcGISPro\\Resources\\ArcPy\\arcpy\\__init__.py'>
-------
  (001)    ASCII3DToFeatureClass_3d   ASCIIToRaster_conversion AcceptConnections                                       
  (004)    AddAngleDirectedLayout_un  . . . SNIP . . .

  (1174)    stats                     stpm                     sys                                                     
  (1177)    td                        time                     toolbox                                                 
  (1180)    topographic               un                       utils                                                   
  (1183)    warnings                  winreg                   wmx   

 

That's correct... about 1200 names added to python's namespace.

 

(2) ---- Alternatives?

 

arcgisscripting

# ---- arcgisscripting is located in the folder
# C:\{your_install_path}\bin\Python\envs\arcgispro-py3\Lib\site-packages\arcgisscripting
# In there, there is an arcgisscripting.pyd file

import arcgisscripting as ags

dir(ags)
['ClearCredentials', 'ExecuteAbort', 'ExecuteError', 'ExecuteWarning', 'Extent',
'ImportCredentials', 'NumPyArrayToRaster', 'Raster', 'RasterToNumPyArray',
'SignInToPortal', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__',
'__name__', '__package__', '__path__', '__spec__', '_addTimeInterval', '_analyzeForSD',
'_arcgisscripting', '_attachLocator', '_chart', '_convertWebMapToMapDocument',
'_createGISServerConnectionFile', '_createGeocodeSDDraft', '_createMapSDDraft',
'_createimageservicesddraft', '_getImageEXIFProperties', '_getRasterKeyMetadata', '_getUTMFromLocation', '_hasLocalFunctionRasterImplementation', '_hgp', '_ia',
'_listDateTimeStringFormats', '_listStyleItems', '_listTimeZones', '_mapping',
'_reserved', '_setRasterKeyMetadata', '_sharing', '_ss', '_wrapLocalFunctionRaster',
'_wrapToolRaster', 'create', 'da', 'getmytoolboxespath', 'getsystemtoolboxespath',
'getsystemtoolboxespaths', 'na', 'un']

# ---- how about the *data access* stuff??

dir(ags.da)
['ContingentFieldValue', 'ContingentValue', 'DatabaseSequence', 'Describe', 'Domain',
'Editor', 'ExtendTable', 'FeatureClassToNumPyArray', 'InsertCursor',
'ListContingentValues', 'ListDatabaseSequences', 'ListDomains',
'ListFieldConflictFilters', 'ListReplicas', 'ListSubtypes', 'ListVersions',
'NumPyArrayToFeatureClass', 'NumPyArrayToTable', 'Replica', 'SearchCursor',
'TableToNumPyArray', 'UpdateCursor', 'Version', 'Walk', '__doc__', '__loader__',
'__name__', '__package__', '__spec__', '_internal_eq', '_internal_sd', '_internal_vb']

 

arcobjects

from arcpy.arcobjects import Point

 

dirr(Point)
----------------------------------------------------------------------
| dir(Point) ...
|    <class 'arcpy.arcobjects.arcobjects.Point'>
-------
  (001)    ID                M                 X                 Y                
  (005)    Z                 __class__         __cmp__           __delattr__      
  (009)    __dict__          __dir__           __doc__           __eq__           
  (013)    __format__        __ge__            __getattribute__  __gt__           
  (017)    __hash__          __init__          __init_subclass__ __le__           
  (021)    __lt__            __module__        __ne__            __new__          
  (025)    __reduce__        __reduce_ex__     __repr__          __setattr__      
  (029)    __sizeof__        __str__           __subclasshook__  __weakref__      
  (033)    _arc_object       _go               clone             contains         
  (037)    crosses           disjoint          equals                             
  (041)    overlaps          touches           within  

(3) ---- The data access module

Get less fluff when working with tables and featureclasses.  Compare to the arcgisscripting import... any differences?

dirr(arcpy.da, cols=3)
----------------------------------------------------------------------
| dir(arcpy.da) ...
|    <module 'arcpy.da' from 'C:\\ArcGISPro\\Resources\\ArcPy\\arcpy\\da.py'>
-------
  (001)    Describe                 Domain                   Editor                  
  (004)    ExtendTable              FeatureClassToNumPyArray InsertCursor            
  (007)    ListDomains              ListFieldConflictFilters ListReplicas            
  (010)    ListSubtypes             ListVersions             NumPyArrayToFeatureClass
  (013)    NumPyArrayToTable        Replica                  SearchCursor            
  (016)    TableToNumPyArray        UpdateCursor             Version                 
  (019)    Walk                     __all__                  __builtins__            
  (022)    __cached__               __doc__                  __file__                
  (025)    __loader__               __name__                 __package__             
  (028)    __spec__                 _internal_eq                                     
  (031)    _internal_sd             _internal_vb   

(4) ---- Arcobjects geometry

dirr(geo)
----------------------------------------------------------------------
| dir(arcpy.arcobjects.geometries) ...
|    <module 'arcpy.arcobjects.geometries' from 'C:\\ArcGISPro\\Resources\\ArcPy\\arcpy\\arcobjects\\geometries.py'>
-------
  (001)    Annotation    AsShape       Dimension     Geometry     
  (005)    Multipatch    Multipoint    PointGeometry Polygon      
  (009)    Polyline      __all__       __builtins__  __cached__   
  (013)    __doc__       __file__      __loader__    __name__     
  (017)    __package__   __spec__      basestring                 
  (021)    gp            operator      xrange    
but it imports gp as well.

(5) ---- Arcobjects

import arcpy.arcobjects as arco
dirr(arco, cols=3)
----------------------------------------------------------------------
| dir(arcpy.arcobjects.arcobjects) ...
|    <module 'arcpy.arcobjects.arcobjects' from 'C:\\ArcGISPro\\Resources\\ArcPy\\arcpy\\arcobjects\\arcobjects.py'>
-------
  (001)    ArcSDESQLExecute               Array                          Cursor                        
  (004)    Extent                         FeatureSet                     Field                         
  (007)    FieldInfo                      FieldMap                       FieldMappings                 
  (010)    Filter                         GeoProcessor                   Geometry                      
  (013)    Index                          NetCDFFileProperties           Parameter                     
  (016)    Point                          RandomNumberGenerator          RecordSet                     
  (019)    Result                         Row                            Schema                        
  (022)    SpatialReference               Value                          ValueTable                    
  (025)    _BaseArcObject                 __builtins__                   __cached__                    
  (028)    __doc__                        __file__                       __loader__                    
  (031)    __name__                       __package__                    __spec__                      
  (034)    convertArcObjectToPythonObject mixins                         passthrough_attr 

(6) ---- environments perhaps?

from arcpy.geoprocessing import env
dirr(env, cols=3)
----------------------------------------------------------------------
| dir(<class 'arcpy.geoprocessing._base.GPEnvironments.<locals>.GPEnvironment'>) ...
|    np version
-------
  (001)    MDomain                        MResolution                    MTolerance                    
  (004)    S100FeatureCatalogueFile       XYDomain                       XYResolution                  
  (007)    XYTolerance                    ZDomain                        ZResolution                   
  (010)    ZTolerance                     __class__                      __delattr__                   
  (013)    __delitem__                    __dict__                       __dir__                       
  (016)    __doc__                        __eq__                         __format__                    
  (019)    __ge__                         __getattribute__               __getitem__                   
  (022)    __gt__                         __hash__                       __init__                      
  (025)    __init_subclass__              __iter__                       __le__                        
  (028)    __lt__                         __module__                     __ne__                        
  (031)    __new__                        __reduce__                     __reduce_ex__                 
  (034)    __repr__                       __setattr__                    __setitem__                   
  (037)    __sizeof__                     __str__                        __subclasshook__              
  (040)    __weakref__                    _environments                  _gp                           
  (043)    _refresh                       addOutputsToMap                autoCancelling                
  (046)    autoCommit                     baDataSource                   buildStatsAndRATForTempRaster 
  (049)    cartographicCoordinateSystem   cartographicPartitions         cellSize                      
  (052)    coincidentPoints               compression                    configKeyword                 
  (055)    extent                         geographicTransformations      isCancelled                   
  (058)    items                          iteritems                      keys                          
  (061)    maintainAttachments            maintainSpatialIndex           mask                          
  (064)    nodata                         outputCoordinateSystem         outputMFlag                   
  (067)    outputZFlag                    outputZValue                   overwriteOutput               
  (070)    packageWorkspace               parallelProcessingFactor       preserveGlobalIds             
  (073)    processingServer               processingServerPassword       processingServerUser          
  (076)    pyramid                        qualifiedFieldNames            randomGenerator               
  (079)    rasterStatistics               referenceScale                 resamplingMethod              
  (082)    scratchFolder                  scratchGDB                     scratchWorkspace              
  (085)    scriptWorkspace                snapRaster                     terrainMemoryUsage            
  (088)    tileSize                       tinSaveVersion                 transferDomains               
  (091)    transferGDBAttributeProperties values                         workspace   
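These are the same environment settings you normally reach as arcpy.env, so the usual assignments apply.  A minimal sketch, with hypothetical values:

import arcpy
from arcpy.geoprocessing import env

env.workspace = r"C:\Your_path\test.gdb"
env.overwriteOutput = True
env.outputCoordinateSystem = arcpy.SpatialReference(2951)  # pick your own code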

(7) ---- Know your imports.

More to come

The xlrd and openpyxl modules packaged with ArcGIS Pro are pretty cool for working with Excel files... if you are stuck and have to use them.  Pandas depends on them, so you could just use pandas to do the data conversion, but numpy can be called into play to do a pretty good and quick conversion, while at the same time cleaning up the data on its way into a table in ArcGIS Pro.

------------------------------------------------------

(1) ---- Spreadsheets gone bad

Here is a spreadsheet with some flubs built in.

 

Column A contains integers, but Excel treats them as just floating point numbers without a decimal place.

 

Column D is just text, but with leading/trailing spaces, cells with spaces, empty cells and just about anything else that can go wrong when working with text.

 

Column E has floats, but two of the cells are empty or, worse... contain just a space.

 

All these conditions need to be fixed.

 

As for fixing blank rows, missing column headers, data not in the upper left quadrant of a sheet, or data that share the page with other stuff (like graphs etc.)… it's not going to happen here.  This discussion assumes that you have an understanding of how you should work with spreadsheet data if you have to.

 

------------------------------------------------------

(2) ---- Spreadsheets to array

So with a tad of code, the spreadsheet can be converted to a numpy structured/recarray.

During the process, numeric fields which are obviously integer get cast to the correct format.

 

Malformed text fields/columns are cleaned up.  Leading/trailing spaces are removed, and empty cells and/or those with nothing but spaces in them are replaced by 'None'.

 

Empty cells in numeric floating point fields are replaced with 'nan' (not a number).  Sadly there isn't an equivalent for integers, so you will either have to upcast your integer data or provide a null value yourself.

Best approach... provide your own null/nodata values
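To make that concrete, here is a minimal sketch of the kind of cleanup described above... the function names are mine, not from the excel_np code:

import numpy as np

def clean_text(col):
    """Strip spaces... empty or all-space cells become the string 'None'."""
    out = [str(c).strip() for c in col]
    return np.array([c if c else 'None' for c in out])

def clean_floats(col):
    """Anything that won't convert to float becomes np.nan, keeping the column float."""
    def _f(c):
        try:
            return float(c)
        except (TypeError, ValueError):
            return np.nan
    return np.array([_f(c) for c in col])

clean_text([' a leading space', '', '  ', 'b'])
# array(['a leading space', 'None', 'None', 'b'], dtype='<U15')

clean_floats(['1.5', ' ', '', '2'])
# array([1.5, nan, nan, 2. ])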

 

------------------------------------------------------

(3) ---- Spreadsheets to array to geodatabase table

Now... The array can be converted to a geodatabase table using NumPyArrayToTable.

 

arcpy.da.NumPyArrayToTable(array, path)

 

where

`array` is the structured array from the previous step 

`path` is the full path and name of the geodatabase table.

 

It comes in as expected: the dtypes are correct and the text column widths are as expected.  Note that the field length is twice the Unicode dtype width (ie U20 becomes a field length of 40 characters).

 

------------------------------------------------------

(4) ---- Spreadsheets to geodatabase table via arcpy

Excel to Table does a good job on the data types, but it takes some liberty with the field length.  This may be by design, or useful, or a pain, depending on what you intend to do with the data subsequently.

 

You can even combine xlrd with arcpy to batch read multiple sheets at once, using the code snippet in the reference link below.
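A hedged sketch of that idea (not the snippet from the link... the paths here are hypothetical):

import xlrd
import arcpy

xls = r"C:\Your_path\workbook.xlsx"
gdb = r"C:\Your_path\test.gdb"
book = xlrd.open_workbook(xls)
for name in book.sheet_names():
    out_tbl = "{}\\{}".format(gdb, arcpy.ValidateTableName(name, gdb))
    arcpy.conversion.ExcelToTable(xls, out_tbl, name)   # one table per sheet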

 


(5) ---- Spreadsheets to Pandas to table

Yes, pandas does it via xlrd: the data goes to numpy arrays, then to series and dataframes, then out to geodatabase tables.  So you can skip pandas altogether if you want.

 

The data types can be a bit unexpected, however, and the cleaning up of text fields isn't carried out completely: blank/empty cells are translated to 'nan' (not a number?) but a cell containing just a space remains as such.

The data type for the text column is an 'object' dtype, which is usually reserved for ragged arrays (ie mixed length or data type).
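For reference, a minimal sketch of how such a dataframe might be produced... the path and sheet name are hypothetical:

import pandas as pd

df = pd.read_excel(r"C:\Your_path\workbook.xlsx", sheet_name="Sheet1")
df.dtypes   # the text column shows up as dtype 'object'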

 

df['Text_null'].tolist()
[' a leading space', 'b', 'a trailing space ', nan, ' ', 'back_to_good', '    four_leading', 'b', 'a', 'done']

 

 

(6) ---- The code

I put the code in a `gist` on GitHub, in case the code formatting on this site isn't fixed.

 

excel_np.py .... convert excel to a structured array

 

There are some things you can do to ensure a proper data type.  The following demonstrates how one little blank cell can foul up a whole column or row of data.

import numpy as np

def isfloat(a):
    """Return `a` as a float, or np.nan if it cannot be converted."""
    try:
        return float(a)
    except ValueError:
        return np.nan
   
# ---- Take some numbers... but you forgot a value so the cell is empty ie ''
#
vals = [6, 9, 1, 3, '', 2, 7, 6, 6, 9]

# ---- convert it to an array... we will keep it a surprise for now
#
ar = np.asarray(vals)

# ---- use the helper function `isfloat` to see if there are numbers there
#
np.array([isfloat(i) for i in ar])

# ---- yup! they were all numbers except for the blank
#
array([ 6.,  9.,  1.,  3., nan,  2.,  7.,  6.,  6.,  9.])

# ---- if we hadn't checked we would have ended up with strings
#
array(['6', '9', '1', '3', '', '2', '7', '6', '6', '9'], dtype='<U11')

If you really need to conserve the integer data type, then you will have to do some hurdle jumping to check for `nan` (aka not-a-number).

# ---- our list of integers with a blank resulted in a float array
#
ar = np.array([isfloat(i) for i in ar])
np.isnan(ar)  # ---- conversion resulted in one `nan`
array([False, False, False, False,  True, False, False, False, False, False])

# ---- assign an appropriate integer nodata value
#
ar[np.isnan(ar)] = -999

# ---- cast the array to integer and you are now done
#
ar = ar.astype('int')
array([   6,    9,    1,    3, -999,    2,    7,    6,    6,    9])

 

(7) ---- End notes...

So the next time you need to work with spreadsheets, don't just hope that the magic of xlrd, openpyxl or pandas (which uses both) can solve all your problems... take the time to look at your data carefully and decide whether it is truly in the format you want BEFORE you bring it into ArcGIS Pro as a table.

 

arr = excel_np("c:/temp/x.xlsx")

arr

array([('aaa', '01', 1), ('bbb', '02', 2)],
      dtype=[('field1', '<U4'), ('field2', '<U4'), ('field3', '<i4')])

import arcpy
arcpy.da.NumPyArrayToTable(arr, r"C:\Your_spaceless_path_to\a.gdb\x_arr")

An example for a very simple table

 

 

If you have any use cases where the standard conversion methods aren't good enough, let me know.

 

 

References:

 

excel to table ...

xlrd on GitHub …

openpyxl on bitbucket... and openpyxl docs page...

Recently stumbled upon…          Using conda in spyder

 

I will just leave this as a thought.  It is one of those IPython %magic things

 

Update...       Spyder 3.3.6 is out  2019-07-16

 

Battle cry.... Install Spyder, Jupyter console and Jupyter notebook for ArcGIS Pro by default 

                      Make it so.... now on to the content

 

Spyder... install once, use in multiple environments.  New... 2018-08-30

 

Table of contents:  (use browser back to return here)

 

 

:--------- Latest Version

    Version 3.3.6: installed 2019-07-16

    Use proenv.bat and just run..... conda update spyder

 

:--------- Installing Spyder in ArcGIS Pro

arcgis-pro-your-conda-environments

Clone... ArcGIS Pro ... for non-administrators 

 

:--------- Some other links

Script documenting ... It is all in the docs - links to spyder help pane for user-created help.

Spyder on GitHub ... If you want to keep abreast of issues and/or enhancement requests

Spyder Documentation …. On GitHub or Spyder Documentation

Spyder-notebook ... Jupyter notebook inside Spyder...

 

 

Spyder in pictures

:---- The Icon 

:---- The whole interface

... which can be arranged to suit

 

:---- Keep track of project files

 

:---- Need a little help?

 

:---- Fancy documentation with minimal effort

 

 

:---- Help for IPython?

 

 

:---- Help for other Modules?

 

 

:---- Check out your variables

 

 

:---- Some graphs? 

Yes, from within Spyder you can use Matplotlib or any other installed graphics pack (e.g. seaborn, orange, etc.).


:---- See your scripts in outline mode

Quickly navigate within them like a table of contents, or use the outline to get a basic shell listing of your script.


 

The trick to the outline is to use # ---- Your comment here ... four dashes seem to be the key for some reason.
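For example, a script laid out like this gives you clickable imports/helpers/main entries in the Outline pane:

# ---- imports ----
import numpy as np

# ---- helpers ----
def helper(a):
    """Do something useful."""
    return np.asarray(a)

# ---- main ----
if __name__ == "__main__":
    print(helper([1, 2, 3]))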

 

 

:---- Don't like something? 

Change it

 

:---- Set Spyder as your IDE for PRO

 

 

 

:---- Fear not...

You can even pretty up the interface

 

Making conda package installs more fun... 

 

 

 

More later

: --------

KD trees

 

Table of contents:  (use browser back to return here)

Distance stuff, lots of tree types.  Your distribution of Python comes with scipy, which has a couple of implementations.

This is just a quick demo of its potential use.  And an homage to Canadian university students, for whom KD will always have a special meaning as a primary food group: Kraft Dinner, for the non-student.

 

Start with some points.  You then want to calculate the closest 2 points to each, to form point origin-destination pairs... because it can be done.

Steps.

  • Just deal with the coordinates first, leave the attribute baggage to the side for now.
  • Decide on the number of points you want to find the 'closest' of.  Don't get ridiculous and ask for an origin-destination matrix with a couple of thousand points.  Go back to the project design stage, or look at scipy.spatial.distance.cdist and a new computer.
  • Sorting the points by X, then Y coordinate is useful in some situations.  An option to do so is provided.
  • Building the KDTree is fairly straightforward using scipy.
    • decide on the number of points to find
    • the returned list of indices will include the origin point itself, so if you want the closest 2 points, set your query to N = 3 (see the short sketch after this list).  This can be exploited to suck up the x,y values to form origin-destination pairs if you want to form lines and/or polygons.
  • Decide if you want to just pull out the indices of the closest pairs with their distance.
  • Optionally, you can produce a structured array, which you can then bring into ArcGIS Pro as a table for use with a couple of ArcToolbox tools to create geometry.
  • You are done.  Do the join thing if you really need the attributes.
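A minimal sketch of the query step described above, assuming `pnts` is an (n, 2) array of x,y values:

import numpy as np
from scipy.spatial import cKDTree

pnts = np.random.rand(10, 2) * 100    # hypothetical points
t = cKDTree(pnts)
dists, indices = t.query(pnts, 3)     # N = 3... each point plus its 2 nearest
# indices[:, 0] is the point itself... indices[:, 1:] are the true neighbours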

 

The picture:

 

The code:

So this function just requires a point array of x,y pairs, the number of closest points (N), whether you want to do an x,y sort first and finally, whether you want an output table suitable for use in ArcGIS Pro.

From there, you simply use arcpy.da.NumPyArrayToTable to produce a gdb table. 

You can then use... XY to Line … to produce line segments, connecting the various origins and destinations as you see fit, or just bask in the creation of an... XY event layer.

 

Note:  the function below uses... cKDTree ...in place of... KDTree ...since it is the faster choice when you just need Euclidean distance calculations.

import numpy as np

def nn_kdtree(a, N=3, sorted=True, to_tbl=True):
    """Produce the N closest neighbours array with their distances using
    scipy.spatial.cKDTree as an alternative to einsum.

    Parameters:
    -----------
    a : array
        Assumed to be an array of point objects for which `nearest` is needed.
    N : integer
        Number of neighbours to return.  Note: the point counts as 1, so N=3
        returns the closest 2 points, plus itself.
    sorted : boolean
        A nice option to facilitate things.  See `xy_sort`.  Its mini-version
        is included in this function.
    to_tbl : boolean
        If True, return a structured array suitable for a geodatabase table,
        otherwise return the indices and the original sort order.

    References:
    -----------
    `<https://stackoverflow.com/questions/52366421/how-to-do-n-d-distance-
    and-nearest-neighbor-calculations-on-numpy-arrays/52366706#52366706>`_.
   
    `<https://stackoverflow.com/questions/6931209/difference-between-scipy-
    spatial-kdtree-and-scipy-spatial-ckdtree/6931317#6931317>`_.
    """
    def _xy_sort_(a):
        """Mini xy_sort... lexicographic sort on X, then Y."""
        a_view = a.view(a.dtype.descr * a.shape[1])
        idx = np.argsort(a_view, axis=0, order=(a_view.dtype.names)).ravel()
        a = np.ascontiguousarray(a[idx])
        return a, idx
    #
    def xy_dist_headers(N):
        """Construct headers for the optional table output."""
        names = ['X_{}', 'Y_{}']*N + ['d_{}']*(N-1)
        vals = (np.repeat(np.arange(N), 2)).tolist() + [i for i in range(1, N)]
        n = [names[i].format(vals[i]) for i in range(len(vals))]
        f = ['<f8']*N*2 + ['<f8']*(N-1)
        return list(zip(n, f))
    #   
    from scipy.spatial import cKDTree
    #
    idx_orig = []
    if sorted:
        a, idx_orig = _xy_sort_(a)
    # ---- query the tree for the N nearest neighbours and their distances
    t = cKDTree(a)
    dists, indices = t.query(a, N)
    if to_tbl:
        dt = xy_dist_headers(N)  # ---- format a structured array header
        xys = a[indices]
        new_shp = (xys.shape[0], np.prod(xys.shape[1:]))
        xys = xys.reshape(new_shp)
        ds = dists[:, 1:]        # ---- drop the 0.0 distance to itself
        arr = np.concatenate((xys, ds), axis=1)
        arr = arr.view(dtype=dt).squeeze()
        return arr
    return np.array(indices), idx_orig

 

The output

Just slightly better formatting than you can get by default, using one of my numpy functions... obviating the need for pandas for table niceness.

 id  X_0    Y_0    X_1    Y_1    X_2    Y_2    d_1     d_2    
--------------------------------------------------------------
000   3.00  98.00  10.00  94.00  23.00  94.00    8.06   20.40
001  10.00  94.00   3.00  98.00  23.00  94.00    8.06   13.00
002  13.00  18.00  19.00  22.00  34.00  16.00    7.21   21.10
003  19.00  22.00  13.00  18.00  34.00  16.00    7.21   16.16
004  23.00  94.00  10.00  94.00   3.00  98.00   13.00   20.40
005  34.00  16.00  19.00  22.00  43.00   1.00   16.16   17.49
006  37.00  64.00  43.00  89.00  56.00  84.00   25.71   27.59
007  43.00   1.00  34.00  16.00  66.00   6.00   17.49   23.54
008  43.00  89.00  56.00  84.00  61.00  87.00   13.93   18.11
009  56.00  84.00  61.00  87.00  43.00  89.00    5.83   13.93
010  61.00  87.00  56.00  84.00  43.00  89.00    5.83   18.11
011  66.00   6.00  76.00  20.00  43.00   1.00   17.20   23.54
012  67.00  41.00  78.00  50.00  76.00  20.00   14.21   22.85
013  76.00  20.00  66.00   6.00  67.00  41.00   17.20   22.85
014  78.00  50.00  67.00  41.00  80.00  67.00   14.21   17.12
015  80.00  67.00  91.00  66.00  78.00  50.00   11.05   17.12
016  82.00  91.00  94.00  95.00  61.00  87.00   12.65   21.38
017  91.00  66.00  80.00  67.00  78.00  50.00   11.05   20.62
018  94.00  95.00  82.00  91.00  91.00  66.00   12.65   29.15
019  96.00  40.00  78.00  50.00  91.00  66.00   20.59   26.48

 

Summary

Behind the scenes, there should be some 'spatial cleanup'.  Specifically, if you look at the image, some point pairs are connected by a single segment: that is because each is the other's closest point.  Rather than duplicating the segment in opposing directions, you can 'prune' the indices and remove the duplicates prior to producing the geometry.
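A hedged sketch of that pruning idea... treat each origin-destination pair as unordered and keep only its first occurrence (`prune_pairs` is my name for it, not an existing function):

import numpy as np

def prune_pairs(indices):
    """`indices` is the (n, N) array returned by the KDTree query."""
    pairs = np.sort(indices[:, :2], axis=1)   # origin and its nearest neighbour
    _, keep = np.unique(pairs, axis=0, return_index=True)
    return pairs[np.sort(keep)]               # one segment per point pair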

 

There are lots of tools with which you can produce and show geometric relationships.  Use them to provide answers to your questions.  This implementation will appear on the code sharing site soon; I will provide a link.