# Point in... analysis

629
0
09-30-2017 05:54 PM

## Point in... analysis

Some basic exploration into the generic point in... style questions.

A recent comment about Select By Location taking a long time when point sets were large got me thinking.  The background on this topic generally leave one staring a reams of code with little to no explanation on the logic.  I will continue with that trend .

First the code.  I use python and numpy to do a lot of the work and we will use arcpy to communicate with ArcMap or ArcGIS Pro.

I created a featureclass in a file geodatabase of points, and a polygon representing a selection polygon.  The following script was used to generate the data prior to testing the code.  The sample was small for now, 10,000 points and one polygon (square).  More on scaling at a later time.

Explore.

``# -*- coding: UTF-8 -*-""":Script:   point_in_rect:Author:   Dan.Patterson@carleton.ca:Purpose::  To determine whether points are within the extents of polygons.::References::  http://stackoverflow.com/questions/30481577/:       assign-numpy-array-of-points-to-a-2d-square-grid:  http://stackoverflow.com/questions/33051244/:       numpy-filter-points-within-bounding-box:  https://stackoverflow.com/questions/33051244/:       numpy-filter-points-within-bounding-box/33051576#33051576:"""# ----10| ------20| ------30| ------40| ------50| ------60| ------70| ------80|import numpy as npimport arcpynp.set_printoptions(edgeitems=5, linewidth=80, precision=4,                    suppress=True, threshold=20)def contains(pnts, ext, in_out=True):    """Performs a logical_and to find points within a box/ext    :Requires:    :--------    :  pnts - an array of points    :  ext - the extent of the rectangle being tested as an array of the    :        left bottom (LB) and upper right (RT) coordinates    :  in_out - boolean, whether to return inside and outside points    :    :Notes:    :-----    :  comp - np.logical_and( great-eq LB, less RT)  # logic    :  inside - np.where(np.prod(comp, axis=1) == 1) # logic    :  case - comp returns [True, False] so you take the product    :  idx_in - indices derived using where since case will be 0 or 1    :  inside - slice the pnts using idx_in    """    outside = None    LB, RT = ext    comp = np.logical_and((LB <= pnts), (pnts <= RT))    case = comp[:, 0] * comp[:, 1]    idx_in = np.where(case)    inside = pnts[idx_in]    if in_out:        idx_out = np.where(~case)  # invert case        outside = pnts[idx_out]    return inside, outsideif __name__ == "__main__":    """make some points for testing, create and extent,    """    ext = np.array([[342000, 5022000], [343000, 5023000]])    in_fc = r'C:\Git_Dan\a_Data\testdata.gdb\xy_10k'    SR = arcpy.SpatialReference(2951)    a = arcpy.da.FeatureClassToNumPyArray(in_fc,                                          ['SHAPE@X', 'SHAPE@Y'],                                          spatial_reference=SR)    a0 = a.view(dtype=np.float).reshape(len(a), 2)    arcpy.env.overwriteOutput = True    arcpy.env.workspace = r'C:\Git_Dan\a_Data\testdata.gdb'    arcpy.MakeFeatureLayer_management('xy_10k', 'xy_lyr')    #arcpy.MakeFeatureLayer_management('subpoly', 'subpoly_lyr')    arcpy.management.SelectLayerByLocation("xy_lyr", "INTERSECT", "subpoly",                                           None, "NEW_SELECTION", "NOT_INVERT")    matchcount = int(arcpy.GetCount_management('xy_lyr'))‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍``

The results are shown in the figure below. In this example, the script was run as it was, then the IPython magic %timeit was used to time just the portions that didn't require data preparation.

In both cases, 401 points were returned from a featureclass of 10,000 points.  The timing results are below.

Timing Results

(1)  Numpy
Timing contains with the array created from the points.
``%timeit contains(a0, ext, in_out=False)‍195 µs ± 11.1 µs per loop (mean Â± std. dev. of 7 runs, 1000 loops each)‍‍‍‍‍‍‍‍‍‍‍‍``

(2)  ArcGIS Pro and SelectByLocation
Just timing the final SelectLayerByLocation
``%timeit arcpy.management.SelectLayerByLocation("xy_lyr", "INTERSECT", "subpoly", None, "NEW_SELECTION", "NOT_INVERT")‍308 ms ± 11.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)‍‍‍‍‍‍‍‍‍‍``
(3)  arcpy in ArcGIS Pro
In this incarnation, I use the same files and extract the geometry from them and perform a point 'within' polygon.  The following shows the appropriate additions.
``# imports as in previous scriptdef src_geom(pnt_fc, poly_fc):    """ a search cursor approach    """    pnts = [p for p in arcpy.da.SearchCursor(pnt_fc, "SHAPE@")]    polys = [pl for pl in arcpy.da.SearchCursor(poly_fc, "SHAPE@")]    poly = polys    is_in = [pnt.within(poly) for pnt in pnts]    return pnts, poly, is_inif __name__ == "__main__":    """searchcursor, arcpy point within polygon approach    """    arcpy.env.overwriteOutput = True    arcpy.env.workspace = r'C:\Git_Dan\a_Data\testdata.gdb'    pnt_fc = r'C:\Git_Dan\a_Data\testdata.gdb\xy_10k'    # the point layer as before    poly_fc = r'C:\Git_Dan\a_Data\testdata.gdb\subpoly'  # a single polygon    # ----     pnts, poly, is_in = src_geom(pnt_fc, poly_fc)    pnts_in = sum(is_in)    # ----‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍``

The result yielded the same number of points ( ie pnts_in = 401)

``%timeit src_geom(pnt_fc, poly_fc)472 ms ± 2.57 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)‍‍‍‍‍‍‍‍‍‍``

(4)  MatPlotLib and Path.contains_point

MatPlotLib's 'path' module can be used to implement point in polygon testing.  Since it is available within the Anaconda suite, then it is included here for comparison purposes.

``from matplotlib.path import Pathpip = Path.contains_point  # ---- short form of method ----def _extent(ext):    """construct the extent rectangle from the extent points which are the    :  lower left and upper right points [LB, RT]    """    LB, RT = ext    L, B = LB    R, T = RT    box = [LB, [L, T], RT, [R, B], LB]    ext_rect = np.array(box)    return ext_rectdef containsmpl(pnts, ext):    """matplotlib incarnation    """    poly = _extent(ext)    poly = matplotlib.path.Path(poly)    is_in = [pip(poly, p) for p in pnts]    return np.array(is_in)‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍``

Using the numpy array objects for the points and the polygon, as inputs, the results are as follows:

``%timeit containsmpl(a, ext)24.7 ms ± 761 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)‍‍``

I think it uses the 'crossing number' method of testing which would be better for general polygons, something which has to be added to the pure numpy approach, after the selection of the points in the extent has been completed.

To come...

(4)  pandas in ArcGIS
(5)  Other methods to assess 'containment'

Results so far...   In all cases, the result was 401 points within the one polygon.

Numpy                        0.195 ms  (195 microseconds)

MatPlotLib                24.7 ms

SelectByLocation    308 ms

arcpy                       472 ms

Version history
Revision #:
1 of 1
Last update:
‎09-30-2017 05:54 PM
Updated by: Contributors