Calculate max number of non-overlapping polygons

05-13-2017 05:52 AM
JeffThomasILM
Occasional Contributor II

There is a proposed ordinance that would allow an activity on a property so long as it is a specified minimum distance (parcel edge to parcel edge) from another property with the activity (let's say 500 ft). I need to figure out the theoretical maximum number of properties that could allow this activity. I have all the eligible parcels selected (which is tens of thousands). I could put a half-distance (250 ft) buffer on every one of them, which of course would result in massive overlap. But if there's a way to strategically eliminate buffers to get maximum coverage with no overlap, that would be exactly what I need! Or is there a better, totally different approach to this? Thank you for any insight!
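In graph terms, this is a maximum independent set problem: each eligible parcel is a node, and any two parcels closer than the minimum spacing are connected. A toy sketch of that framing (made-up point coordinates standing in for parcels; real parcels would need edge-to-edge distances, but the selection logic is the same):

```python
from itertools import combinations
from math import hypot

# Hypothetical data: parcels as points, conflict = closer than MIN_DIST
MIN_DIST = 500.0
parcels = {1: (0, 0), 2: (300, 0), 3: (800, 0), 4: (800, 600)}

# Build the conflict graph once
conflicts = {pid: set() for pid in parcels}
for (a, pa), (b, pb) in combinations(parcels.items(), 2):
    if hypot(pa[0] - pb[0], pa[1] - pb[1]) < MIN_DIST:
        conflicts[a].add(b)
        conflicts[b].add(a)

# Greedy pass: keep a parcel only if nothing already kept conflicts with it
kept = set()
for pid in sorted(parcels):
    if conflicts[pid].isdisjoint(kept):
        kept.add(pid)
```

A single greedy pass like this gives one valid (overlap-free) selection, but not necessarily the largest one; that is why the answers below resort to repeated randomized runs.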

13 Replies
XanderBakker
Esri Esteemed Contributor

Nice job! Thanks for sharing.

My solution does not provide the "best" result, since not all possible combinations are evaluated. Also, since it determines the next parcel to include based on the minimum increment of the buffered zone, it is likely to favor smaller parcels. The real situation might be much more complex, as you mentioned.

JoshuaBixby
MVP Esteemed Contributor

After posting my comment yesterday, I realized there was a LOT of room for optimizing the simulations, since the polygons and buffer distances do not change between runs.  Spatial operations are more expensive, often considerably more, than other mathematical operations, so identifying all of the spatial interactions up front and caching them allows the analysis to run without any spatial operators in the simulation loop.  Efficient Python sets can then do the work instead of geoprocessing tools or ArcPy Geometry methods.
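Stripped of the geoprocessing, each run is pure set arithmetic. A toy sketch of the cached-conflicts idea (made-up conflict sets; each unit's set includes the unit itself, mirroring a buffer that always overlaps its own parcel):

```python
from random import shuffle, seed

# Hypothetical cached conflicts: unit -> set of units its buffer touches
conflicts = {1: {1, 2}, 2: {1, 2, 3}, 3: {2, 3}, 4: {4}}
ids = list(conflicts)

seed(0)  # fixed seed so the sketch is repeatable
results = set()
for _ in range(1000):
    shuffle(ids)
    kept = {ids[0]}
    for i in ids:
        if kept.isdisjoint(conflicts[i]):
            kept.add(i)
    results.add(frozenset(kept))

best = max(len(run) for run in results)
```

No geometry objects are touched inside the loop, which is why thousands of scenarios per second become feasible once the spatial join has been cached.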

I ran the following code against the Redlands, CA zoning data.  Instead of picking a region of Redlands, I used the entire city.

import arcpy
from collections import defaultdict, Counter
from random import shuffle

# Setup parameters
fc = "path/to/feature_class"  # path to feature class or shapefile (placeholder)
buffer_dist = 500  # buffer distance in units of the feature class projection
subset_sql = "CATEGORY = 'Single Family Residential'" # SQL WHERE clause defining subset of units to analyze
num_runs = 100000 # number of scenarios to model

# Buffer and spatial join to identify and cache spatial interactions between
# adjacent/neighboring units
subset = arcpy.MakeFeatureLayer_management(fc, "subset", subset_sql)
subset_buff = arcpy.Buffer_analysis(subset,
                                    "in_memory/buff",
                                    buffer_dist)
subset_rel = arcpy.SpatialJoin_analysis(subset,
                                        subset_buff,
                                        "in_memory/rel",
                                        "JOIN_ONE_TO_MANY",
                                        "KEEP_COMMON")

buff_rel = defaultdict(set)
with arcpy.da.SearchCursor(subset_rel, ["TARGET_FID", "ORIG_FID"]) as cur:
    for target, orig in cur:
        buff_rel[target].add(orig)
        
arcpy.Delete_management(subset_buff)
arcpy.Delete_management(subset_rel)

# Retrieve a list of unique/object identifiers
arcpy.SelectLayerByAttribute_management(subset, "SWITCH_SELECTION")
fids = [int(fid)
        for fid
        in arcpy.Describe(subset).FIDset.split(";")]
arcpy.SelectLayerByAttribute_management(subset, "CLEAR_SELECTION")

# Run simulations/scenarios against the data set
results = set()
for _ in range(num_runs):
    shuffle(fids)
    sel_set = {fids[0]}
    for fid in fids:
        if sel_set.isdisjoint(buff_rel[fid]):
            sel_set.add(fid)
    results.add(frozenset(sel_set))

# Create a size distribution of results
count_hundred = Counter(len(run) for run in results)

Pre-caching the spatial interactions worked better than I expected.  I was able to run 100,000 scenarios on the 857 Single Family Residential polygons in around 3 minutes.  This approach was several orders of magnitude faster than my original approach.

A breakdown of the results looks like:

Given the number of analysis units, all 100,000 scenarios were unique.  A single run identified 168 polygons, the maximum; that maximum appeared in just 1 of the 100,000 scenarios considered.
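Since each run is just a set of identifiers, the largest run can be turned into a definition query for mapping or export. A sketch, assuming the `results` set from the script above and an `OBJECTID` key field (the arcpy calls are commented out because they need a live workspace):

```python
# Stand-in for the real results set produced by the simulation loop
results = {frozenset({2, 5, 9}), frozenset({2, 7})}

# Pick the largest run and build a WHERE clause from its identifiers
best_run = max(results, key=len)
where = "OBJECTID IN ({0})".format(", ".join(str(f) for f in sorted(best_run)))

# With a real feature class, the clause can drive a selection and export:
# lyr = arcpy.MakeFeatureLayer_management(fc, "best_run", where)
# arcpy.CopyFeatures_management(lyr, "in_memory/best_run_fc")
```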

JeffThomasILM
Occasional Contributor II

Joshua,

I've returned to this issue lately; it's kind of the opposite approach to a project that led to another thread I've started (Show distinct tuples regardless of column order). Xander's script does exactly what I need, but it's quite slow, as you've noted. I was trying to use your script, as I understand it's much more efficient, but it doesn't return a spatial result. While the graph of the maximum possible features is interesting, I need to show a map of one possible outcome (any outcome is fine) of randomly selected parcels meeting the minimum spacing. Is it possible to output a mappable result from your script? Or to alter the logic in your script to create an output like Xander's? I've been looking at the data created by the "rel" and "buff" variables in your script to see if there's something I can map. Thank you!

JoshuaBixby
MVP Esteemed Contributor

Jeff, I responded to your other thread with an ArcPy solution for finding unique tuples.  I will revisit this thread sometime this week and get back to you.