
Match by "Largest Overlap" for Select Layer by Location

01-02-2024 06:52 AM
Status: Open
MErikReedAugusta
MVP Regular Contributor

The Spatial Join geoprocessing tool (Analysis toolbox) has a Match Option parameter value of "Largest Overlap" that doesn't appear in the otherwise-identical list for Select Layer By Location.

It'd be nice if I could use that new toy across all other contexts where I'm doing things between two datasets based on their locations.

Right now, the selection-only operation seems to require the use of intermediate datasets, which is a bit undesirable—especially for the option that I'm more likely to reach for when I don't want other datasets being created.

7 Comments
DrewFlater
Status changed to: Needs Clarification

The Largest Overlap option was added to Spatial Join only, because Spatial Join performs a one-by-one operation: each target feature is examined to determine whether any join features have a spatial relationship with it. If many join features have the spatial relationship with the one target feature, Largest Overlap joins the join feature that has the largest overlap with the target feature.
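To make that per-target behaviour concrete, here is a minimal pure-Python sketch. It does not use arcpy or reproduce how Spatial Join is implemented. It assumes axis-aligned rectangles as stand-ins for polygons, and the names `Rect`, `overlap_area`, and `largest_overlap_join` are hypothetical helpers made up for illustration.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    """Axis-aligned rectangle standing in for a polygon feature (an assumption)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def overlap_area(a: Rect, b: Rect) -> float:
    """Area shared by two rectangles; 0.0 when they do not overlap."""
    width = min(a.xmax, b.xmax) - max(a.xmin, b.xmin)
    height = min(a.ymax, b.ymax) - max(a.ymin, b.ymin)
    return max(width, 0.0) * max(height, 0.0)


def largest_overlap_join(targets, joins):
    """Map each target id to the join id with the largest overlap (None if none overlap)."""
    result = {}
    for t_id, t_geom in targets.items():
        best_id, best_area = None, 0.0
        for j_id, j_geom in joins.items():
            area = overlap_area(t_geom, j_geom)
            if area > best_area:
                best_id, best_area = j_id, area
        result[t_id] = best_id
    return result


# One green target, one big orange join feature (id=1), one small one (id=2).
green = {"green": Rect(0, 0, 4, 4)}
orange = {1: Rect(-5, -5, 3, 3), 2: Rect(3, 3, 6, 6)}

matches = largest_overlap_join(green, orange)
print(matches)
```

Each target ends up with at most one match, which is the one-by-one behaviour described above.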

Select Layer By Location works differently. Any input feature gets selected if it has the spatial relationship with any selecting feature. There is no handling for the case where an input feature overlaps multiple selecting features, or where multiple input features overlap the same selecting feature.

Imagine this case, where a green feature overlaps a big orange feature (id=1) and a small orange feature (id=2).

[Image: a green feature overlapping a large orange feature (id=1) and a small orange feature (id=2)]

  • If the green feature is the input and the orange features are the selecting features, a Largest Overlap relationship doesn't make sense: the green feature is going to get selected because it overlaps at least one orange feature.
  • If the orange features are the input features and the green feature is the selecting feature, first orange 1 is evaluated for a spatial relationship with green (it overlaps), then orange 2 is evaluated for a spatial relationship with green (it overlaps), so both orange features would get selected. There isn't a mode of Select Layer By Location that would unselect orange 2 because orange 1 has a larger overlap with a single green feature.
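The two cases above can be sketched in plain Python. This is only an illustration of the "selected if it relates to any selecting feature" rule, not the tool's actual implementation. It assumes axis-aligned rectangles in place of polygons, and `Rect`, `overlap_area`, and `select_by_location` are hypothetical names.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    """Axis-aligned rectangle standing in for a polygon feature (an assumption)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def overlap_area(a: Rect, b: Rect) -> float:
    """Area shared by two rectangles; 0.0 when they do not overlap."""
    width = min(a.xmax, b.xmax) - max(a.xmin, b.xmin)
    height = min(a.ymax, b.ymax) - max(a.ymin, b.ymin)
    return max(width, 0.0) * max(height, 0.0)


def select_by_location(inputs, selecting):
    """Return ids of input features that overlap ANY selecting feature."""
    return {
        i_id
        for i_id, i_geom in inputs.items()
        if any(overlap_area(i_geom, s_geom) > 0 for s_geom in selecting.values())
    }


green = {"green": Rect(0, 0, 4, 4)}
orange = {1: Rect(-5, -5, 3, 3), 2: Rect(3, 3, 6, 6)}

# Case 1: green is the input, orange features are the selecting features.
green_selected = select_by_location(green, orange)

# Case 2: orange features are the input, green is the selecting feature.
orange_selected = select_by_location(orange, green)

print(green_selected, orange_selected)
```

In both directions the size of the overlap never enters the decision, only whether one exists.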

If you can graphically describe what your expectation is between two feature layers and how a selection could be performed based on a Largest Overlap relationship, please share the details to get this issue reopened for votes/consideration.

jbcypreste

@DrewFlater I don't think this is a good example of why there is no Largest Overlap option for Select By Location. If the spatial relationship between two layers doesn't indicate the need for Largest Overlap in Select By Location, then by the same logic it shouldn't exist for Spatial Join either, which is obviously absurd. It also wouldn't make any sense to use the Largest Overlap operation in this example with either layer.

I think what @MErikReedAugusta is trying to point out is, select by location solves tasks with fewer steps. Doing select by location followed by a quick dropdown menu click on the attributes menu is much faster than doing a spatial join, then having to correct the input field with the target field value, besides the fact that spatial join itself is already a more time-consuming task than select by location.

If I have a layer of parks and another of districts, and parks can overlap multiple districts even though they are usually smaller than districts, a largest overlap operation would be totally fitting for this situation, especially when dealing with geometry errors like small offsets and overlaps. Plus, Select By Location already executes very complex calculations, so I don't think adding the largest overlap function would hurt.

DrewFlater

@jbcypreste Spatial Join is about summarizing/transferring attributes from a join layer to a target layer based on the spatial relationship of those two layers. Select By Location is about selecting features in an input layer that have a certain spatial relationship with features in the selecting layer. Spatial Join handles 1:m relationships where one target shares a spatial relationship with many join features, and you can either summarize the m attributes into one value, or get the value from the join feature that is the closest or has the most overlap.

So in your parks and districts example, which layer is the input (the one that gets the selection applied) and which is the selecting layer? For a spatial join, you likely want something like the ID of the district that overlaps the most with each park to be transferred to that park. What is the scenario for Select By Location? The parks layer likely has dozens of park features even for a small town, along with maybe a few districts, and some of the parks may overlap multiple districts. If the parks are the input and the districts are the selecting features, all parks are going to get selected because they all overlap districts. If you only have one park and you want to select the district that overlaps it the most, that could be useful: you would set the districts as the input and the one selected park as the selecting feature. But when there are multiple parks, it is likely that every district is going to get selected, since each of them will likely be the district that overlaps the most with at least one park. Select By Location only adds a selection; in this example it doesn't add the information about which district has the largest overlap with each park. I am not sure what the expectation is here, which is why this idea is still in Needs Clarification status.
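A small pure-Python sketch of the multiple-parks concern, under the assumption that a hypothetical "largest overlap" selection would be evaluated per selecting feature and the results unioned. The geometries are made-up rectangles, and `Rect`, `overlap_area`, and `largest_overlap_id` are illustrative names, not part of any ArcGIS API.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    """Axis-aligned rectangle standing in for a polygon feature (an assumption)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def overlap_area(a: Rect, b: Rect) -> float:
    """Area shared by two rectangles; 0.0 when they do not overlap."""
    width = min(a.xmax, b.xmax) - max(a.xmin, b.xmin)
    height = min(a.ymax, b.ymax) - max(a.ymin, b.ymin)
    return max(width, 0.0) * max(height, 0.0)


def largest_overlap_id(geom, candidates):
    """Id of the candidate with the largest overlap with geom (None if none overlap)."""
    best_id, best_area = None, 0.0
    for c_id, c_geom in candidates.items():
        area = overlap_area(geom, c_geom)
        if area > best_area:
            best_id, best_area = c_id, area
    return best_id


# Two adjacent districts and two parks that each straddle the shared boundary.
districts = {"D1": Rect(0, 0, 10, 10), "D2": Rect(10, 0, 20, 10)}
parks = {"P1": Rect(7, 2, 11, 4), "P2": Rect(9, 6, 14, 8)}

# What a join would record: one district per park.
per_park = {p_id: largest_overlap_id(p_geom, districts) for p_id, p_geom in parks.items()}

# What a selection would record: only the set of selected districts.
selected_districts = {d_id for d_id in per_park.values() if d_id is not None}

print(per_park)
print(selected_districts)
```

With these inputs every district ends up selected, and the park-to-district pairing held in `per_park` is not recoverable from `selected_districts` alone.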

MErikReedAugusta

Somehow I missed the request for clarification (or forgot to respond to it) a year ago.  So, to clarify intent:

Spatial Join creates a new feature class to hold the results of that join.  In joining whole tables and needing to account for 1:M, maybe that's the most efficient route, and maybe it's not.

But I'm not always working with whole-table joins, for one.  And even when I am, spatial join isn't always the most efficient route.

So let's look first at a basic hypothetical scenario:

  • Data Source
    • FC_Green is the base/receiving feature class, as shown in @DrewFlater 's diagram, above.
    • FC_Orange is the selecting feature class, also as shown in the diagram above.
  • Goals
    • Capture the value in "OrangeField" on the largest-overlapping FC_Orange polygon, and write it to the "GreenField" field on the original FC_Green feature class.
  • Option 1:
    1. Run Spatial Join on the two feature classes, with a Match Option of "Largest Overlap"
      • Produces a third feature class that we'll call FC_SpatialJoin
    2. Perform a table join on FC_Green pointing at FC_SpatialJoin, based on some unique ID field that existed in FC_Green; let's go with "GlobalID", for simplicity.
    3. Run Field Calculator, writing FC_SpatialJoin.OrangeField into FC_Green.GreenField
    4. Remove the table join
    5. Delete FC_SpatialJoin

Here we're doing a relatively simple operation on the entire feature class using only the tools available in the GUI and Geoprocessing pane, without anything special. In this case, "Largest Overlap" doesn't make sense for Select Layer by Location: sorting through the results would take significantly longer, be more complicated, and be less likely to be achievable through GUI-level interactions alone than simply working with the interim, temporary dataset of FC_SpatialJoin.

But there have been occasions where automating a much more complex operation entailed iterating through FC_Green one feature at a time, performing some sort of spatial selection followed by other calculations that don't work well when faced with the whole dataset, and then writing a final value.

In those cases, I'm creating a dummy feature class in the form of FC_SpatialJoin, just so I can do another join to get the data back out, and I'm doing it for every feature.  Even compared to using the "memory" workspace, Select Layer by Location often feels faster and less wasteful in those cases.
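As a rough illustration of the per-feature pattern I mean, here is a pure-Python sketch with no arcpy and no intermediate dataset. It assumes rectangles in place of real polygons and plain dicts in place of feature class rows. `Rect` and `overlap_area` are hypothetical helpers, and the field values are invented.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    """Axis-aligned rectangle standing in for a polygon feature (an assumption)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def overlap_area(a: Rect, b: Rect) -> float:
    """Area shared by two rectangles; 0.0 when they do not overlap."""
    width = min(a.xmax, b.xmax) - max(a.xmin, b.xmin)
    height = min(a.ymax, b.ymax) - max(a.ymin, b.ymin)
    return max(width, 0.0) * max(height, 0.0)


# Stand-ins for FC_Green and FC_Orange rows (values are invented).
fc_green = [
    {"id": "g1", "geom": Rect(0, 0, 4, 4), "GreenField": None},
    {"id": "g2", "geom": Rect(10, 10, 12, 12), "GreenField": None},
]
fc_orange = [
    {"geom": Rect(-5, -5, 3, 3), "OrangeField": "big"},
    {"geom": Rect(3, 3, 6, 6), "OrangeField": "small"},
]

# One green feature at a time: find the largest-overlapping orange feature
# and write its value straight back, with nothing created in between.
for green_row in fc_green:
    best = max(fc_orange, key=lambda o: overlap_area(green_row["geom"], o["geom"]))
    if overlap_area(green_row["geom"], best["geom"]) > 0:
        green_row["GreenField"] = best["OrangeField"]

print([(row["id"], row["GreenField"]) for row in fc_green])
```

The point is only the shape of the loop: a "largest overlap" selection on one record at a time would slot in where the `max(...)` call is.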

 

If there's genuine need, I could possibly go spelunking in the operations I've run over the last year or two to find something that I could reduce down to a minimal reproducible example. But at the moment, I feel like the "use case" requested largely revolves around running low-overhead sub-operations involving one or maybe two records at a time, where "Largest Overlap" could make sense. Because I agree with you that it makes no real sense on a whole-table basis.

I think it's also vitally important to remember that Spatial Join creates a new feature class.  So whatever you're doing, you now have to negotiate three feature classes, where originally there were two.  That's at least an extra step of overhead.

MErikReedAugusta

Also, an addendum that feels distinct enough to warrant its own comment (and possibly its own Idea):

"Largest overlap" as an option seems to imply "Smallest overlap" also existing, but it doesn't, that I can see.  Right now, there's no fast & simple way to get there if you have more than two overlapping features in FC_Orange, other than the sorts of longhand, from-scratch ways we would've arrived at the "Largest overlap" before this option came along.

I'm positing it here, first, because "Invert Relationship" is an option for Select Layer by Location, but not for Spatial Join.  If you have only two features in FC_Orange, you could just invert "Largest overlap".

With more than two overlapping features, though, it wouldn't matter which context you were in. Inverting would get you too many features, so you'd have to do it the longhand way anyway.
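A quick pure-Python sketch of why inverting falls short with three overlapping features. It assumes rectangles in place of polygons, and it treats "invert" as "every overlapping feature except the largest", which is a simplification of whatever the tool would really do. `Rect` and `overlap_area` are hypothetical helpers.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    """Axis-aligned rectangle standing in for a polygon feature (an assumption)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def overlap_area(a: Rect, b: Rect) -> float:
    """Area shared by two rectangles; 0.0 when they do not overlap."""
    width = min(a.xmax, b.xmax) - max(a.xmin, b.xmin)
    height = min(a.ymax, b.ymax) - max(a.ymin, b.ymin)
    return max(width, 0.0) * max(height, 0.0)


green = Rect(0, 0, 4, 4)
orange = {
    1: Rect(-5, -5, 3, 3),  # large overlap with green
    2: Rect(3, 3, 6, 6),    # smallest overlap with green
    3: Rect(2, -1, 6, 1),   # middling overlap with green
}

# Overlap area of each orange feature with the green one, keeping only real overlaps.
areas = {o_id: overlap_area(green, geom) for o_id, geom in orange.items()}
overlapping = {o_id: area for o_id, area in areas.items() if area > 0}

largest = max(overlapping, key=overlapping.get)
inverted = set(overlapping) - {largest}          # "not the largest"
smallest = min(overlapping, key=overlapping.get)  # what was actually wanted

print(largest, inverted, smallest)
```

With only two overlapping features, `inverted` would contain just the smallest. With three, it also picks up the middling one.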

DrewFlater
Status changed to: Open