Deep Learning Object Detection Advice

GarrettRSmith · ‎10-28-2024

Hello Everyone.

I am working on an object detection lab for a GIS class that I teach, and I am having trouble getting the process to actually detect objects.

I have followed a number of Esri videos on YouTube and read some of the blogs about the object detection workflow, but it seems that I might be missing something important in my own process, which I have outlined below.

DATA: 2023 USDA NAIP Aerial Imagery

One: Label Objects for Deep Learning (tool)
I originally started with 20 objects and have increased that number to 125.

Two: Export Training Data within the Label Objects for Deep Learning tool
I use the RCNN Masks option for the Metadata Format.

Three: Train Deep Learning Model (tool)
The MaskRCNN model type is filled in automatically, and I used 100 epochs.

Four: Detect Objects Using Deep Learning (tool)

I am including a screenshot of the most recent process.

As you can see from the above screenshot, out of 125 training samples the model correctly identified only five of the trees I trained it on, plus one tree that was not part of the training dataset.

Is there a strategy for drawing the polygons around the objects that I might be missing in the training phase? I used this video as a reference:

https://www.youtube.com/watch?v=g0FDARaciiI

In that video, the polygons around the boats seem to encompass both the boats and the surrounding water.

Anyhow, I want to show my class this cool technique, but I would like it to work more robustly than it currently does.

Thanks for any help, and thank you for reading this post.

clai · ‎11-05-2024

When labelling the training samples, it is better to zoom into a small area and comprehensively label all of your desired objects (e.g. individual trees in your case) without missing any. Then export only the comprehensively labelled area as training data by setting the extent in the Environments tab. Train your own model with just that small area, and then you can detect the trees across the entire image with the trained model. The key is to ensure you mark every desired object in your training area.

You may refer to this blog: https://www.esri.com/arcgis-blog/products/arcgis-pro/geoai/tips-for-labeling-images-for-object-detec...
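As a rough illustration of the extent-restricted export described above, here is a minimal sketch in plain Python that derives a bounding extent from a set of label polygons. It assumes the labels are exhaustive within the area they cover. The function names (`labeled_extent`, `extent_string`), the padding value, and the coordinates are all hypothetical and only for illustration; they are not part of any Esri tool.

```python
from typing import Sequence, Tuple

# A polygon is a sequence of (x, y) vertices in map coordinates.
Point = Tuple[float, float]


def labeled_extent(polygons: Sequence[Sequence[Point]], padding: float = 0.0):
    """Return (xmin, ymin, xmax, ymax) covering every label polygon, grown by `padding`."""
    xs = [x for poly in polygons for x, _ in poly]
    ys = [y for poly in polygons for _, y in poly]
    if not xs:
        raise ValueError("no label vertices supplied")
    return (min(xs) - padding, min(ys) - padding, max(xs) + padding, max(ys) + padding)


def extent_string(extent) -> str:
    """Format an extent as a space-delimited 'xmin ymin xmax ymax' string."""
    return " ".join(f"{value:.2f}" for value in extent)


# Hypothetical label polygons (made-up projected coordinates, in meters).
labels = [
    [(500010.0, 4100020.0), (500014.0, 4100020.0), (500014.0, 4100025.0), (500010.0, 4100025.0)],
    [(500030.0, 4100040.0), (500036.0, 4100040.0), (500036.0, 4100047.0), (500030.0, 4100047.0)],
]

extent = labeled_extent(labels, padding=5.0)
print(extent_string(extent))
# In ArcGIS Pro, a string like this could be assigned to arcpy.env.extent
# before exporting training data (assumption: not executed in this sketch).
```

The bounding box is only a sensible training extent if every tree inside it has actually been labelled; otherwise it is safer to draw the extent by hand around the area you labelled completely.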

ShivaniPathak · ‎11-13-2024

Hi @GarrettRSmith, I have a few points that may be helpful for you.

From the image you shared, I can see that the imagery contains both shrubs and trees, which can confuse the model because they are visually very similar.
Secondly, 125 samples is very few for training a tree model. The suggestion by @clai to label all trees in an area and export only that area is also very important. If trees are labelled sparsely, the model gets confused because some trees are labelled while others are not.
If your objective is only to identify trees in this area using a deep learning model, you can use our two pretrained models, "Tree Detection" and "Tree Segmentation", from ArcGIS Living Atlas of the World. Our pretrained models are trained on millions of samples.
If your objective is to show your students how to train a deep learning model, you can refer to our Learn lesson showing how palm trees can be detected using deep learning.

Please let me know if you need any other help.

ShivaniPathak · ‎11-13-2024

One key difference between traditional remote sensing classification approaches and deep learning models is that in traditional classification methods, we often rely on a small number of pure pixel samples (e.g., 20-30) to train the model. These methods typically assume that the spectral signature of each class is distinct and sufficient for classification. In contrast, deep learning models, particularly convolutional neural networks (CNNs), do not just learn from the spectral information but also capture higher-level features like shape, texture, and contextual relationships within the image. As a result, deep learning models require larger and more diverse datasets with labeled examples, as they need to learn more complex patterns and relationships that go beyond simple pixel-based information.
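To make the spectral-versus-spatial distinction concrete, here is a toy sketch in plain Python. It builds two small single-band patches with exactly the same pixel values (so the same mean and histogram) but different spatial arrangements, then applies a hand-written difference filter. The filter is only a stand-in for the kind of spatial feature a CNN might learn; the patch values and function names are made up for illustration.

```python
def mean(patch):
    """Average pixel value: a purely 'spectral' statistic that ignores layout."""
    values = [v for row in patch for v in row]
    return sum(values) / len(values)


def histogram(patch):
    """Sorted pixel values: identical for patches with the same spectral content."""
    return sorted(v for row in patch for v in row)


def horizontal_edge_energy(patch):
    """Sum of absolute responses of a [-1, +1] difference kernel slid along each row."""
    return sum(abs(row[i + 1] - row[i]) for row in patch for i in range(len(row) - 1))


# Patch A: two smooth halves (one single vertical edge).
halves = [[0, 0, 1, 1] for _ in range(4)]

# Patch B: the same eight 0s and eight 1s arranged as a fine checkerboard texture.
checker = [
    [0, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 0, 1, 0],
]

print(mean(halves), mean(checker))                 # 0.5 0.5
print(histogram(halves) == histogram(checker))     # True
print(horizontal_edge_energy(halves))              # 4
print(horizontal_edge_energy(checker))             # 12
```

A per-pixel classifier sees the two patches as the same, while even this single spatial filter separates them. A real network has to learn many such filters from the data, which is one reason it needs far more labelled examples than 20-30 pure pixels.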