Iterative Select by Location: Return the record with the most features in radius

JohnDye · ‎04-14-2014

I need to figure out how to determine which feature in a dataset has the most features within a defined radius.

I'm thinking I need to loop through the entire dataset, perform a Select Layer by Locaiton function, load the result ID and number of features into a list, and continue iterating only updating the list if another feature comes along with a higher number of features.

I'm open to ideas on how to best accomplish this.

RichardFairhurst · ‎04-14-2014

I need to figure out how to determine which feature in a dataset has the most features within a defined radius.

I'm thinking I need to loop through the entire dataset, perform a Select Layer by Locaiton function, load the result ID and number of features into a list, and continue iterating only updating the list if another feature comes along with a higher number of features.

I'm open to ideas on how to best accomplish this.

A geoprocessing approach is to:

1. Buffer the features (if the ObjectID is your unique ID field, make sure the buffer preserved that field value or calculate your own copy of the ObjectID value into a long field before processing the Buffer).

2. Spatial Join the other features to the buffers using the One to One option.

3. Get the Max Frequency field value for the Spatial Join output using the Summary Statistics tool.

4. Join the Frequency of the Spatial Join to the Summary Statistics Max_Frequency field and select for the records where the ObjectID of the joined summary is not Null.

5. One or more features will be selected with the maximum count.

6. You can decide what you want to do if more than one feature holds the same highest value. If you want all of the records with the same maximum count, export the selection and use it to join back to the original features on the preserved unique ID in the buffers. If you want the min or max ID feature only, resummarize the selection of step 5 for the min or max original ObjectID value and join that output back to the original data. If all you want is a list, the export or resummary is the list you want.

7. Optionally reselect whatever you joined in step 6 to your original data by selecting all records where the ObjectID of the joined table is not Null. Transfer data or do any other calculation you want on the records with the maximum count of the other features.

Spatial Join will be much faster than Select by Locations done one feature at a time.

AmyKlug · ‎04-14-2014

you can also use the Generate Near Table tool and use the "count" statistic with summary statistic

http://resources.arcgis.com/en/help/main/10.1/index.html#//00080000001n000000

JohnDye · ‎04-15-2014

Hi guys,
All good suggestions. I should have made it clear in my prior post that I know how to do this from a standard geoprocessing standpoint. I'm trying to figure out a way to do in a much much faster way.

In my smallest market, there are over 400 retailers that have to be analyzed. In order to get the retailer with the largest number of features within it's search radius, I have to analyze each feature in the dataset on its own.

From there, once I know which retailer has the most retailers within its search radius, I have to copy all of those selected retailers to a new dataset, pretty simple, delete them from the original dataset, also simple, then iterate the process all over again to determine which retailer remaining in the dataset returns the next highest amount of retailers in its search radius. Then keep iterating until there are no more retailers left in the original dataset.

Here's what I'm doing currently just to try and get the first retailer with the most retailers in its search radius.

>>> inFeatureLayer = "Retailers"
... ScoreField = "PTS"
... SearchRadius = ".5 Miles"
... # Determine what the highest set of values is in the FeatureLayer
... maxValue = arcpy.SearchCursor(inFeatureLayer, "", "", "", ScoreField + " D").next().getValue(ScoreField)
... # Select the records which have the max value in the dataset
... arcpy.SelectLayerByAttribute_management(inFeatureLayer, "NEW_SELECTION", '"' + ScoreField + '" = ' + str(maxValue))
... # Get a list of the StoreIDs of the currently selected records
... StoreList = list(r[0] for r in arcpy.da.SearchCursor(inFeatureLayer, "STOREID"))
... # Establish a variable to hold the StoreID of the record with the most features intersecting its radius
... MaxRecord = ""
... HighestSelected = 0
... # Iterate through the StoreList
... for Store in StoreList:
...     print "Evaluating Store " + str(Store) + "..."
...     # Create a new selection, selecting the current store in the iteration
...     arcpy.SelectLayerByAttribute_management(inFeatureLayer, "NEW_SELECTION", '"STOREID" = ' + "'" + str(Store) + "'")
...     # Select the Stores within search radius of the current Store
...     arcpy.SelectLayerByLocation_management(inFeatureLayer, "WITHIN_A_DISTANCE", inFeatureLayer, SearchRadius, "NEW_SELECTION")
...     # Create a Describe Object of the Stores Layer
...     desc = arcpy.Describe(inFeatureLayer)
...     # Get the Number of Selected Records
...     SelectCount = len(desc.FIDSet)
...     # If the SelectCount is greater than the current HighestSelected Count
...     if SelectCount > HighestSelected:
...         print "Returned " + str(SelectCount) + " stores within " + SearchRadius + " of Store " + str(Store) + " which is greater than the current Selection Count of " + str(HighestSelected) + " for Store " + str(MaxRecord)
...         HighestSelected = SelectCount
...         MaxRecord = Store
...     # Otherwise
...     else:
...         print "Returned " + str(SelectCount) + " stores within " + SearchRadius + " of Store " + str(Store) + " which is less than the current Selection Count of " + str(HighestSelected) + " for Store " + str(MaxRecord)
...         # Do Nothing and Continue
...         pass
... print "StoreID: " + str(MaxRecord) + " was the record with the most features within " + SearchRadius + " with a total count of " + str(HighestSelected) + "."

It takes a really long time though because it has to analyze each feature seperately. I'm looking for a way to make this move much quicker.

RichardFairhurst · ‎04-15-2014

Search By Location is a very expensive operation as far as processing time and should be avoided in favor of the Spatial Join or Near Table tool approach. I'd iterate through the Spatial Join output instead of doing thousands of unneeded Search by Location operations, since one Spatial Join encompasses all of the Search By Locations and Count operations in a single pass of the tool.

You could set the cursor up with a sort expression so that the features of the Spatial Join are sorted on the Frequency field (descending) and unique ID (ascending). That way you can accomplish the equivalent of performing the entire set of Search by Location operations you need without repetition using just one cursor pass through the Spatial Join table. You just have to search by Attribute on the Unique ID in the Spatial Join back to the original to do the export (much less expensive than a Select By Location).

If you are going to do this with cursors, employ a dictionary to do the same data collection operation done by the Spatial Join to avoid multiple cursor passes on the same data. Duplicate cursor passes on the same data is extremely expensive, since each pass hits the hard disk, compared to a dictionary which just accesses memory after the initial cursor read operation. Optimization tricks like these are key to making these kinds of operations far out perform scripts that essentially just replicate a typical model builder iterator approach.

ChrisSnyder · ‎04-15-2014

I like your idea, but Richard is correct: the GenerateNearTable tool should be faster than a recursive SelectByLocation. But... If you were to do it as a SelectByLocation it might look something like this (untested code!):

storesFC = r"C:\temp\stores.shp"
trackingDict = {}
maxCount = 0
maxCountStoreId = 0
arcpy.MakeFeatureLayer_managment(storesFC, "stores_fl")
searchRows = arcpy.da.SearchCursor(storesFC, ["OID@", "SHAPE@"])
for searchRow in searchRows:
   storeOidValue, storeGeomObj = searchRow[0:2]
   arcpy.SelectLayerByLocation_management(storeGeomObj, "WITHIN_A_DISTANCE", "stores_fl", "10 MILES", "NEW_SELECTION")
   selectedStoresOidList = [r[0] for r in arcpy.da.SearchCursor("stores_fl", ["OID@"])]
   trackingDict[storeOidValue] = selectedStoresOidList
   if len(trackingDict[storeOidValue]) > maxCount:
      maxCount = len(trackingDict[storeOidValue]) 
      maxCountStoreId = storeOidValue    
del searchRow, searchRows
print "Store OID with the largest number of other stores close by is: " + str(maxCountStoreId) + " (" + str(maxCount) + " other stores close by)"