
Spatial Analysis: Identifying Repeated Vegetation Enquiries

Fenland
Emerging Contributor

Identifying Repeated Vegetation Enquiries Over Time

Context

I’m working with a dataset of over 80,000 vegetation-related enquiries submitted by the public over a 10-year period. Each enquiry is represented as a point with the following attributes:

  • Easting
  • Northing
  • Logged Date (with time)
  • Enquiry Subject Name

Objective

My goal is to identify repeated enquiries, i.e. enquiries logged at least twice over the time period with at least a year between them. I want to find all repeat enquiries to get a sense of the overall picture, and then filter/reduce each repeat group down to a single enquiry to get a sense of how many I'm dealing with.

This will help reveal patterns of recurring issues (e.g., persistent vegetation problems) and allow me to filter down to unique cases for further analysis.

Challenges Encountered

  1. Duplicate Submissions

Some locations have multiple identical points due to users clicking repeatedly when submitting an enquiry. In some cases, this results in 6+ identical points.

Solution:
I used Excel to remove duplicates based on a combination of the following fields (a pandas equivalent is sketched at the end of this section):

  • Enquiry Subject Name
  • Logged Date/Time
  • Easting
  • Northing

  2. Inconsistent Terminology & Location Accuracy

Because the data is submitted by the public:

  • Terminology varies: e.g., “Tree Overgrown/Untidy” vs “Vegetation Overhanging” may refer to the same issue.
  • Location accuracy varies: points may not align precisely with assets, leading to spatial scatter. See screenshot below.
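
As mentioned under the duplicate-submissions challenge, a pandas equivalent of the Excel "remove duplicates" step might look like the sketch below. The file and column names (enquiries_raw.csv, EnquirySubjectName, LoggedDate, Easting, Northing) are assumptions and would need to match the exported table.

```python
# Hypothetical pandas equivalent of the Excel de-duplication step.
# Column names are placeholders and should match the exported attribute table.
import pandas as pd

df = pd.read_csv("enquiries_raw.csv", parse_dates=["LoggedDate"])

# Keep one row per identical combination of subject, timestamp and coordinates
df = df.drop_duplicates(
    subset=["EnquirySubjectName", "LoggedDate", "Easting", "Northing"]
)

df.to_csv("enquiries_deduped.csv", index=False)
```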
 
 

Methods Used – Image below.

Method 1: Buffer-Based Spatial Join (Blue Points)

  1. Removed records with null coordinates, subject name, or date.
  2. Clipped points to the area of interest.
  3. Created a 10m buffer around each point.
  4. Performed a spatial join between the buffer layer and the original point layer (1-to-many relationship).
  5. Dissolved results based on:
    • Enquiry Subject Name (from join)
    • Easting
    • Northing

This method helped group nearby points with similar subjects but missed some cases.
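
For reference, a rough arcpy sketch of this tool chain might look like the following. The workspace, layer and field names are placeholders, and the field names in the joined output should be checked before dissolving (joined fields normally pick up a "_1" suffix when they clash with the target layer's fields).

```python
# Hypothetical arcpy version of Method 1 (buffer -> 1:M spatial join -> dissolve).
# Workspace, layer and field names are placeholders.
import arcpy

arcpy.env.workspace = r"C:\data\enquiries.gdb"
arcpy.env.overwriteOutput = True

points = "enquiries_clipped"  # nulls removed and clipped to the area of interest

# 3. 10 m buffer around every enquiry point
arcpy.analysis.Buffer(points, "enquiry_buffers", "10 Meters")

# 4. One-to-many spatial join: one output row per point falling inside each buffer
arcpy.analysis.SpatialJoin(
    "enquiry_buffers", points, "buffer_point_join",
    join_operation="JOIN_ONE_TO_MANY",
    match_option="INTERSECT",
)

# 5. Dissolve on subject name plus the buffer's original coordinates;
#    COUNT gives the number of enquiries grouped into each dissolved feature.
arcpy.management.Dissolve(
    "buffer_point_join", "grouped_enquiries",
    dissolve_field=["EnquirySubjectName", "Easting", "Northing"],
    statistics_fields=[["LoggedDate", "COUNT"]],
)
```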

Method 2: Python-Based Temporal Filter (Red Points)

  1. Same initial cleaning and clipping steps.
  2. Performed a spatial join.
  3. Used a Python script to identify repeated enquiries:
    • Matching Easting, Northing, and Subject Name
    • Logged more than 1 year apart

This method captured some cases missed by Method 1, but also failed to group some almost-overlapping points.
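
A minimal pandas sketch of the kind of temporal filter described here is shown below; the file and column names are assumptions.

```python
# Hypothetical sketch of the Method 2 temporal filter in pandas.
# File and column names are placeholders.
import pandas as pd

df = pd.read_csv("enquiries_clean.csv", parse_dates=["LoggedDate"])

key = ["Easting", "Northing", "EnquirySubjectName"]

def is_repeat(group):
    """True if a location/subject has 2+ enquiries logged more than a year apart."""
    span = group["LoggedDate"].max() - group["LoggedDate"].min()
    return len(group) >= 2 and span > pd.Timedelta(days=365)

# Every enquiry that belongs to a repeat group (the "overall picture")
repeats = df.groupby(key).filter(is_repeat)

# One summary row per repeat location (the reduced count)
summary = repeats.groupby(key, as_index=False).agg(
    first_logged=("LoggedDate", "min"),
    last_logged=("LoggedDate", "max"),
    enquiry_count=("LoggedDate", "size"),
)
print(f"{len(summary)} locations with enquiries repeated at least a year apart")
```

Because the grouping key is the exact Easting/Northing pair, two enquiries a metre apart never fall into the same group, which is consistent with the near-overlap cases this method misses.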

Visual Comparison

As shown in the image below, red points (Method 2) and blue points (Method 1) don’t always overlap. Each method captures different aspects of the problem, and neither is fully comprehensive.

 

Next Steps

I'm looking for recommendations for improvements and next steps. I could append the two datasets and remove the intersecting points, but first I want to look at mistakes I may have made, or alternative approaches that could highlight repeated enquiries I am missing.

5 Replies
Fenland
Emerging Contributor

I don't have permission to upload the screenshots mentioned in the post, unfortunately.

AustinAverill
Frequent Contributor

A bit cumbersome at first, but using a combination of both methods would likely provide the best results.

 

In theory, you would use a library like ArcPy or geopandas to create a spatial query based on a buffer around each individual record, then apply the same logic as in your existing Python script to systematically identify duplicates within that spatial constraint.
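
A hedged geopandas sketch of that combined idea, with placeholder file, CRS and column names, might look like this:

```python
# Hypothetical combination of both methods: a 10 m spatial constraint plus the
# 1-year temporal test. File, CRS and column names are assumptions.
import geopandas as gpd
import pandas as pd

gdf = gpd.read_file("enquiries_clean.gpkg")  # cleaned, clipped points, EPSG:27700
gdf["LoggedDate"] = pd.to_datetime(gdf["LoggedDate"])

# Buffer a copy of the points and self-join: every pair of enquiries within 10 m
buffered = gdf[["EnquirySubjectName", "LoggedDate", "geometry"]].copy()
buffered["geometry"] = buffered.geometry.buffer(10)  # metres in a projected CRS
pairs = gpd.sjoin(buffered, gdf, how="inner", predicate="intersects")

# A point counts as a repeat if a nearby point shares its subject and was logged
# more than a year earlier or later (self-matches fail the date test).
same_subject = pairs["EnquirySubjectName_left"] == pairs["EnquirySubjectName_right"]
year_apart = (pairs["LoggedDate_left"] - pairs["LoggedDate_right"]).abs() > pd.Timedelta(days=365)

mask = (same_subject & year_apart).to_numpy()
repeat_index = pairs.index[mask].unique()
repeats = gdf.loc[repeat_index]
print(f"{len(repeats)} enquiries have a same-subject enquiry within 10 m, 1+ year apart")
```

The same-subject test would still miss the terminology mismatches mentioned in the original post, so a lookup table mapping similar subject names to a common category could be applied before the join.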

Once you have the dataset in a reasonable state, it would be wise to automate this process to run fairly often. In that case, you only need to query and compare records within your listed time constraint (the past year) instead of evaluating against the entire 80,000-record dataset. This would make the process less resource intensive moving forward.
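
As a sketch of that incremental run (file and column names hypothetical), only the newest records would be compared against the historic table:

```python
# Hypothetical incremental check: compare only the newest records against the
# full historic table instead of re-evaluating all 80,000 rows each run.
import pandas as pd

history = pd.read_csv("enquiries_all.csv", parse_dates=["LoggedDate"])
cutoff = pd.Timestamp.now() - pd.DateOffset(years=1)

recent = history[history["LoggedDate"] >= cutoff]   # records from the past year
older = history[history["LoggedDate"] < cutoff]     # candidates for the 1-year gap
# ...run the spatial/temporal repeat check for `recent` against `older` only
```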

 

Finally, if possible, you should look to implement some sort of constraints, options, etc. into the collection method that would allow you to more easily query and evaluate records. This could look like having a dedicated field for complaint type ('Foliage Overburden', 'Dead Tree', etc.) and then another field that allows users to type out comments or descriptions for that complaint type.

DanPatterson
MVP Esteemed Contributor

Find Identical (Data Management)—ArcGIS Pro | Documentation

can be used to identify duplicates in attributes and even geometry or some combination thereof.

An overview of the Generalization toolset—ArcGIS Pro | Documentation using

    Aggregate Points (Cartography)—ArcGIS Pro | Documentation

might be worth a look as well
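
As a rough sketch of how those two tools could be called from arcpy (layer and field names are placeholders, and Aggregate Points needs the appropriate licence level):

```python
# Hypothetical calls to the two suggested tools; names are placeholders.
import arcpy

arcpy.env.workspace = r"C:\data\enquiries.gdb"
arcpy.env.overwriteOutput = True

# Duplicates by geometry + subject, with a small XY tolerance so that
# near-identical points are also flagged
arcpy.management.FindIdentical(
    "enquiries_clipped", "identical_enquiries",
    fields=["Shape", "EnquirySubjectName"],
    xy_tolerance="1 Meters",
    output_record_option="ONLY_DUPLICATES",
)

# Aggregate Points groups points within a set distance into polygons
arcpy.cartography.AggregatePoints("enquiries_clipped", "enquiry_clusters", "10 Meters")
```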


... sort of retired...
Fenland
Emerging Contributor

I did initially use the Find Identical tool like you described, but it doesn't find all the records for some reason; that's why I ended up using Excel in the end.

Thanks for linking to the Aggregate Points tool. I will try the free Aggregate Points tool, as I don't have access to the Cartography one.
