Get to the Points... arcpy, numpy, pandas

Blog Post created by Dan_Patterson Champion on Oct 15, 2017

Let's get to the point... in this case, the points of a square.


We can convert it to a structured array using nothing but arcpy methods, a mix of arcpy and numpy and more recently, and extension into Pandas.


Here is what they look like using various functions.


1.  Rudimentary steps

in_fc = r"C:\Git_Dan\a_Data\testdata.gdb\square"
flds = ['SHAPE@X', 'SHAPE@Y']
cur = arcpy.da.SearchCursor(in_fc, flds, None, None, True, (None, None))
a = cur._as_narray()

# ----- the reveal ----
array([(342000.0, 5022000.0), (342000.0, 5023000.0), (343000.0, 5023000.0),
       (343000.0, 5022000.0),  (342000.0, 5022000.0)],
      dtype=[('SHAPE@X', '<f8'), ('SHAPE@Y', '<f8')])


We can also do it directly using another arcpy.da interface and it yields the same results perhaps there is a direct link between the cursor's _as_narray and FeatureClassToNumPyArray, details are buried within a *.pyd.  Nonetheless, the journey reaches the same destination.

flds = ['SHAPE@X', 'SHAPE@Y']
arcpy.da.FeatureClassToNumPyArray(in_fc, field_names=flds, explode_to_points=True)

array([(342000.0, 5022000.0), (342000.0, 5023000.0), (343000.0, 5023000.0),
       (343000.0, 5022000.0), (342000.00000000186, 5022000.0)],
      dtype=[('SHAPE@X', '<f8'), ('SHAPE@Y', '<f8')])


In its absolute simplest form, the searchcursor simply needs a list of fields and a flag to convert a polygon to points.  The _as_narray method handles conversion to a numpy recarray with a specified dtype (2 floating point numbers) and a shape ( (5,) ) indicating that there are 5 pairs of numbers forming the polygon



2.  To an ndarray with just the coordinates

The _xy yields a simpler form...


    array([[  342000.,  5022000.],
           [  342000.,  5023000.],
           [  343000.,  5023000.],
           [  343000.,  5022000.],
           [  342000.,  5022000.]])
... because it uses arcpy's searchcursor and the _as_narray method to speedily get the geometry (lines 5-8).  At this point, the array's data type is converted (as a 'view') to 64-bit floating point numbers and reshaped to 5 rows of 2 points each.
def _xy(in_fc):
    """Convert featureclass geometry (in_fc) to a simple 2D point array.
    :  See _xyID if you need id values.
    flds = ['SHAPE@X', 'SHAPE@Y']
    args = [in_fc, flds, None, None, True, (None, None)]
    cur = arcpy.da.SearchCursor(*args)
    a = cur._as_narray()
    N = len(a)
    a = a.view(dtype='float64')
    a = a.reshape(N, 2)
    del cur
    return a


Working with either form of the returned array has its advantages and disadvantages.  I won't cover those here, but it is worthy to note, that I regularly use both forms of array.



3.  Throw in the IDs

Now, we can go back to working with a structured array if we need to maintain different data types in our columns.  This can be accomplished using _xyID which returns the same point values with the addition of the identification number of the polygon that it came with (ie. polygon with objectid 1).  Rather than the cumbersome datatype field descriptions that we got with _xy, I decided to provide simpler names by specifying a dtype whose names were different.  This can be done as long as you don't try to alter the underlying data type (more on this later).


def _xyID(in_fc, to_pnts=True):
    """Convert featureclass geometry (in_fc) to a simple 2D structured array
    :  with ID, X, Y values. Optionally convert to points, otherwise centroid.
    flds = ['OID@', 'SHAPE@X', 'SHAPE@Y']
    args = [in_fc, flds, None, None, to_pnts, (None, None)]
    cur = arcpy.da.SearchCursor(*args)
    a = cur._as_narray()
    a.dtype = [('IDs', '<i4'), ('Xs', '<f8'), ('Ys', '<f8')]
    del cur
    return a

# ---- yields ----
array([(1, 342000.0, 5022000.0), (1, 342000.0, 5023000.0), (1, 343000.0, 5023000.0),
       (1, 343000.0, 5022000.0), (1, 342000.0, 5022000.0)],
      dtype=[('IDs', '<i4'), ('Xs', '<f8'), ('Ys', '<f8')])




4.  How about some attributes

Yes, but what about the attributes.  Lets get fancy and use _ndarray which will pull in all the data from the table as well.


def _ndarray(in_fc, to_pnts=True, flds=None, SR=None):
    """Convert featureclass geometry (in_fc) to a structured ndarray including
    :  options to select fields and specify a spatial reference.
    : in_fc - input featureclass
    : to_pnts - True, convert the shape to points. False, centroid returned.
    : flds - '*' for all, others: 'Shape',  ['SHAPE@X', 'SHAPE@Y'], or specify
    if flds is None:
        flds = "*"
    if SR is None:
        desc = arcpy.da.Describe(in_fc)
        SR = desc['spatialReference']
    args = [in_fc, flds, None, SR, to_pnts, (None, None)]
    cur = arcpy.da.SearchCursor(*args)
    a = cur._as_narray()
    del cur
    return a

array([ (1, [342000.0, 5022000.0], 342000.0, 5022000.0, 343000.0, 5023000.0, 4000.0, 1000000.0),
       (1, [342000.0, 5023000.0], 342000.0, 5022000.0, 343000.0, 5023000.0, 4000.0, 1000000.0),
       (1, [343000.0, 5023000.0], 342000.0, 5022000.0, 343000.0, 5023000.0, 4000.0, 1000000.0),
       (1, [343000.0, 5022000.0], 342000.0, 5022000.0, 343000.0, 5023000.0, 4000.0, 1000000.0),
       (1, [342000.0, 5022000.0], 342000.0, 5022000.0, 343000.0, 5023000.0, 4000.0, 1000000.0)],
      dtype=[('OBJECTID', '<i4'), ('Shape', '<f8', (2,)), ('EXT_MIN_X', '<f8'),
             ('EXT_MIN_Y', '<f8'), ('EXT_MAX_X', '<f8'), ('EXT_MAX_Y', '<f8'),
             ('Shape_Length', '<f8'), ('Shape_Area', '<f8')])


I suppose that you cleverly noticed that the other attributes are replicated during the process of converting the polygon to points.  This can also be accomplished with arcpy.da.FeatureClassToNumPyArray simply by using '*' to use all fields rather than prescribed ones. 



5.  Got to have both, but not together

Working with an array which has attributes and point geometry AND the attributes are replicated is a tad of a pain, so it is time to split the attributes from the geometry.


def _two_arrays(in_fc, both=True, split=True):
    """Send to a numpy structured/array and split it into a geometry array
    :  and an attribute array.  They can be joined back later if needed.
    a = _xyID(in_fc, to_pnts=True)  # just the geometry
    shp_fld, oid_fld, SR, shp_type = fc_info(in_fc)
    dt_a = [('IDs', '<i4'), ('Xs', '<f8'), ('Ys', '<f8')]  # option 1
    dt_b = [('IDs', '<i4'), ('Xc', '<f8'), ('Yc', '<f8')]
    a.dtype = dt_a
    b = None
    if split:
        ids = np.unique(a['IDs'])
        w = np.where(np.diff(a['IDs']))[0] + 1
        a = np.split(a, w)
        a = np.array([[ids[i], a[i][['Xs', 'Ys']]] for i in range(len(ids))])
    if both:
        b = _ndarray(in_fc, to_pnts=False)
        b.dtype = dt_b
    return a, b

(array([[1, array([(342000.0, 5022000.0), (342000.0, 5023000.0),
          (343000.0, 5023000.0), (343000.0, 5022000.0),
          (342000.0, 5022000.0)],
        dtype=[('Xs', '<f8'), ('Ys', '<f8')])]], dtype=object),
array([ (1, 342500.0, 5022500.0, 342000.0, 5022000.0, 343000.0, 5023000.0,
            4000.0, 1000000.0)],
        dtype=[('IDs', '<i4'), ('Xc', '<f8'), ('Yc', '<f8'), ('EXT_MIN_X', '<f8'),
                  ('EXT_MIN_Y', '<f8'), ('EXT_MAX_X', '<f8'), ('EXT_MAX_Y', '<f8'),
                  ('Shape_Length', '<f8'), ('Shape_Area', '<f8')]))

This incarnation of getting at the data produces an 'object array' which is an array that can contain records of different size and/or composition.  Why?  Simple!  If you are converting polygons to point objects, then it is highly unlikely that you will have the same number of points per polygon.  You have two options then... collapse and rearrange the structure of the outputs so that all the coordinates are arranged in their respective columns or keep the inherent structure the data.  This is exemplified by the outputs returned by FeatureClassToNumPyArray and the above function.  Lines 24-27 represent the object array which contains an index number then an array of point coordinates with a specified dtype.  Lines 28-32 is the attribute array for that polygon.  As a nice side bonus, the point coordinates have been reduced to the polygon centroid value and not returned as individual points with repetitive attributes as noted previously.  Two for the price of one.



6.  What's this new thing


Then there is the new SpatialDataFrame and the tie in to Pandas rather than numpy.  It offers some advantages in terms of analysis but in some other areas not so much.  To get the simple polygon out to points you can do the following.


from arcgis import SpatialDataFrame as SDF

a = SDF.from_featureclass(in_fc)

p = a.SHAPE.tolist()[0]

[[[342000, 5022000], [342000, 5023000], [343000, 5023000], [343000, 5022000], [342000, 5022000]]]


There are many options available to get to the root geometry of your geometry and the ability to work with arcpy, numpy and now Pandas opens a variety of opportunities for improving your analytical arsenal.


Next blog will focus on the analysis side and how the options can be used to your advantage.