SelectLayerByLocation Alternatives?

JamesCrandall · ‎09-29-2017

Being a bit greedy here, I know but has anyone come up with more efficient/faster alternatives to arcpy.SelectLayerByLocation_management?

I'd say for the vast majority (ie. 99%) implementing this is entirely acceptable. However, I've run into a couple of instances lately that require several of these calls to be made in order to build a list/set of results that ultimately return an expected response.

...or in another instance, there are a large amount of point features (85 million) that need to be identified as COMPLETELY_WITHIN a simple single polygon feature. In most cases it's sub 30 seconds to return the expected result.

However if these processes are to be published as synchronous GP services, the processing times can be excessive in that context.

numpy? dictionaries? pandas? Anything at all that you may have tried to exceed the performance of arcpy.SelectLayerByLocation_management???

Case / scenario: select by location on points within a simple polygon feature and return a yes or no if the count is > 0

JamesCrandall · ‎10-02-2017

Great work Dan -- I'll take what I can and implement something now. Following the document too.

DanPatterson_Retired · ‎10-02-2017

Not going to bother with the Pandas option since the 'arcgis' module, which uses pandas in its spatialdataframe construct, essentially uses searchcursors (at some level). I can't imagine geopandas being any faster.

I think the problem with working with coordinates is the ability to vectorize the read/write process. cursors don't as your know.

As a side note, I tried to speed up the point-in-extent using numba and found no meaningful speed improvement until you got into the millions of points.... but a percentage of something fast isn't worth pursuing either

I have played around with full point in polygon searches (several standard approaches exist), just got to find my notes. In some cases (not many), the point-in-extent seemed to simply/prune the candidate tree before doing a full check on any particular point in polygon. But in your case, it may be useful to cut out the useless locations first, then do Sel By Att on the remainders. Report if you try it. However, going by my rule... 1 million of anything done in under a second ... is good enough, that's why there is coffee

DanPatterson_Retired · ‎10-01-2017

Recent document I have started and will add to Point in .... Analysis

JamesCrandall · ‎10-03-2017

It's not clear to me how I will efficiently set the "pnts" array from the point feature class containing the 85million features. arcpy.da.FeatureClassToNumPyArray is very slow.

DanPatterson_Retired · ‎10-03-2017

Ahhh two tricks

dump attributes when working with geometry.
If you work a lot with the geometry at least save the featureclass geometry to a *.npy file if you are going to use it time and again. Loading native numpy binary data is zippy zippy.

If you examine how the 'arcgis' module handles getting fc's out to arrays (panda, spatialdataframe or whatever) it is basically a cursor... so until that is speeded up files that large are just going to be slow. I haven't been able to understand the difference since arcpy (and its cloaked dll's) are a black box.

If you dump the attributes, things get faster (as long as you keep track of an index like pandas), so read only the geometry... do the stuff... then reconnect with the attributes after.

ChrisSnyder · ‎10-02-2017

Depending on what you are going to do with the results, 'Generate Near Table' might be a faster solution or even 'Intersect'. Both of these of course write an output dataset, so....

Another approach might be to break the geometry of your polygon into several simpler geometries. For example, a large rectangle and then several smaller triangles.

As I recall, the ESRI algorithm uses this search hierarchy:

1. Bounding rectangle

2. Convex hull

3. Full geometry

JamesCrandall · ‎10-03-2017

Thank you for the idea!

The Generate Near Table was not as efficient compared to the SelectByLocation method.

ChrisSnyder · ‎10-03-2017

Like bixb0012 said, anything that you can do to get all the base datasets on the local machine (and off a network connection) would certainly be a big boost. Might there be z values in your point layer? If so, I bet dropping the z values would speed the read performance a bit. Hands down, all these web services are way faster with local data that is stripped down to the bare minimum.

A casual armchair observation that .shp format generally remains faster to read/write that FGDB...

I hate to say it, by most of ESRI's spatial search (select by location, etc. ) and overlay algorithms (union, etc.) are in fact "the best in the biz". Not all though...

JamesCrandall · ‎10-03-2017

Ultimately, the FGDB is local to the ArcGIS Server that it is registered to. I will check on the z-values.

Thanks for your input!

JamesCrandall · ‎10-03-2017

Edit: added clip_analysis to generate the final polygons to use in SelectLayerByLocation process.

I think the best solution I've tested so far is to break up the polygon into individual polygons and then iterating over each polygon to perform the SelectbyLocation on the point features, checking to see if any features fall inside, then break out of that loop if that's the case.

Pretty straight forward:

1. CreateFishnet_management of the polygon feature.

2. FeatureToPolygon_management of the output Fishnet

3. Search cursor loop over each polygon in the fishnet

4. SelectLayerByLocation and perform test to count result.

5. If count > 0 then break and return with Boolean TRUE

What's interesting and the challenge is that the combination of rows/cols for the fishnet vs. the overall size of the input polygon geometry will determine the efficiency of the overall process. That is, if it's a large input polygon and too many rows/cols are created for the fishnet, then it will take longer to loop, but it also takes less time to SelectLayerByLocation.

There's a direct correlation between the content of the fishnet vs. the processing time of the SelectLayerByLocation. That is, if the faster it finds points within one of the fishnet polygons, the faster the overall process.

This beats any straight-up SelectLayerByLocation call against the full extent of the original point / polygon feature classes!

        arcpy.MakeFeatureLayer_management(pnt_fc, "PointFeatures")
        extent = polygon.extent

        originXY = extent.lowerLeft
        yaxisXY = extent.upperLeft
        opp_corner =extent.upperRight

        print "origin: " + str(originXY)
        print "yaxis: " + str(yaxisXY)
        print "opposite corner: " + str(opp_corner)

        #keep the cell width/height set to zero as it will use the opp_corner to figure it out
        cellWidth = 0
        cellHeight = 0

        #plug in the #rows/#cols you want the output grids to contain
        nrows = 4
        ncols = 4
        outFC = r'in_memory\outsections_line'
        outFishnet = r'in_memory\outsections' 
        outClip = r'in_memory\clipfeatures'
        arcpy.CreateFishnet_management(outFC, str(originXY), str(yaxisXY), cellWidth, cellHeight, nrows, ncols, str(opp_corner))
        arcpy.FeatureToPolygon_management(outFC, outFishnet)
        arcpy.Clip_analysis(outFishnet, polygon, outClip)

        with arcpy.da.SearchCursor(outClip, ['SHAPE@']) as outcur:
            for outrow in outcur:

                tshape = outrow[0]

                arcpy.SelectLayerByLocation_management("PointFeatures", "INTERSECT", tshape)
                rowcounter = int(arcpy.GetCount_management("PointFeatures").getOutput(0))

                if rowcounter > 0:
                    print "rowcounter: " + str(rowcounter)
                    return True
                    break

                else:
                    print "rowcounter: " + str(rowcounter)‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍