Misunderstanding the Train Deep Learning Model workflow

Molality · 11-23-2023

I am trying to see if I can train a model and detect objects within an aerial image.

I have followed some of the tutorials and gone through some of the deep learning courses, and wanted to try it on my own using Ohio imagery. I figured ponds would be an easy, fun thing to detect.

I can't get it to work correctly. I believe I either don't fully understand the requirements or the parameters, or I'm missing a simple step, but I can't find any further information.

My PC specs:

CPU: Intel Core i9-13900KS, 3.2 GHz

RAM: 128 GB DDR5, 4.2 GHz

GPU: NVIDIA RTX 4090, 24 GB

Software: ArcGIS Pro 3.2 with the appropriate deep learning libraries installed; tech support helped me set it up correctly.

Objective: create polygons around all ponds in the imagery sample.

Imagery: Lorain County OSIP III 6IN MrSID RGB 20x - 2017 (13.2 GB) from OGRIP Data Downloads (ohio.gov)

What I've done:

Label Objects for Deep Learning:

I've digitized 291 features (maybe this is not enough?).

Export Training Data:

Input Raster: I first tried the .sid directly, and have since converted the .sid to a .tif.

Additional Input Raster: blank

Output folder: default

Input Feature Class: the shapefile of labeled objects

Class Value Field: blank

Buffer Radius: 0

Input Mask Polygons: blank

Image Format: TIFF

Tile Size X/Y: I've tried 256, 512, and 576. My understanding is that the tile should be large enough to contain the object, but it's okay if a few objects are bigger than the tile. I also couldn't find any literature on tile size limits: is 1024 okay if your PC can handle it?

Stride X/Y: I've tried 0, 8, and 288. My understanding is that a stride smaller than the tile size produces overlapping tiles, which helps when training data is limited, and that a stride of half the tile size gives 50% overlap.

Rotation Angle: I've tried 0 and 45.

Reference System: Map space

Metadata Format: I’ve tried PASCAL Visual Object Classes and RCNN Masks
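To see what the tile size, stride, and rotation settings above do to the amount of exported data, the arithmetic can be sketched in a few lines of Python. This is a rough sketch, not the Export Training Data tool's actual logic. It assumes that partial tiles at the raster edge are dropped and that a nonzero rotation angle adds one chip per multiple of that angle below 360 degrees. It also uses a hypothetical 20,000 × 20,000 pixel raster, and the function names are illustrative only. A stride of 0 is rejected here simply because this arithmetic is undefined for it.

```python
import math


def overlap_fraction(tile_size: int, stride: int) -> float:
    """Fraction of a tile shared with its neighbour along one axis."""
    if stride <= 0:
        raise ValueError("stride must be positive for this arithmetic")
    return max(0.0, 1.0 - stride / tile_size)


def tiles_along_axis(extent_px: int, tile_size: int, stride: int) -> int:
    """Full tiles along one axis; assumes edge remainders are dropped."""
    if stride <= 0:
        raise ValueError("stride must be positive for this arithmetic")
    if extent_px < tile_size:
        return 0
    return (extent_px - tile_size) // stride + 1


def rotation_copies(angle_deg: float) -> int:
    """Orientations 0, angle, 2*angle, ... below 360 degrees (assumption)."""
    if angle_deg <= 0:
        return 1  # no rotation: only the original chip
    return math.ceil(360 / angle_deg)


def estimate_chips(width_px: int, height_px: int, tile_size: int,
                   stride: int, angle_deg: float = 0) -> int:
    """Upper bound on chips for the whole raster, before any feature filtering."""
    positions = (tiles_along_axis(width_px, tile_size, stride)
                 * tiles_along_axis(height_px, tile_size, stride))
    return positions * rotation_copies(angle_deg)


# Hypothetical 20,000 x 20,000 pixel raster; compare some of the settings tried above.
for tile, stride, angle in [(576, 288, 0), (576, 288, 45), (256, 8, 0)]:
    print(tile, stride, angle,
          f"overlap={overlap_fraction(tile, stride):.0%}",
          f"chips={estimate_chips(20_000, 20_000, tile, stride, angle):,}")
```

Under these assumptions, a 576 tile with a 288 stride (50% overlap) gives 68 × 68 = 4,624 tile positions, or 36,992 chips with a 45 degree rotation angle. A 256 tile with a stride of 8 gives 2,469 × 2,469 = 6,095,961 positions before rotation. These are upper bounds for the whole raster; by default the tool may write only tiles that contain labeled features.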

Train Deep Learning Model:

Input Training Data: the exported training data

Output Folder: Default

Max Epochs: 50

Pre-trained Model: Blank

Model Type: MaskRCNN

Batch Size: 64

Validation %: 10

Backbone Model: Blank

Monitor Metric: Validation Loss

Stop When Model Stops Improving: tried both unchecked and checked.

Freeze Model: tried both unchecked and checked.
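The Max Epochs, Batch Size, and Validation % settings above interact with how many chips were exported. As a rough sketch, assuming a plain percentage split and that the final partial batch is kept (some frameworks drop it instead), the number of weight updates per epoch can be estimated as follows. The chip count of 300 is a hypothetical placeholder, not a number from this thread.

```python
import math


def split_and_steps(num_chips: int, validation_pct: float, batch_size: int):
    """Return (train chips, validation chips, steps per epoch) for a simple split."""
    num_val = int(num_chips * validation_pct / 100)  # floor of the percentage
    num_train = num_chips - num_val
    steps_per_epoch = math.ceil(num_train / batch_size)  # keeps the partial batch
    return num_train, num_val, steps_per_epoch


num_chips = 300  # hypothetical: replace with the number of exported chips
max_epochs = 50
for batch_size in (64, 8):
    train, val, steps = split_and_steps(num_chips, 10, batch_size)
    print(f"batch={batch_size}: train={train}, val={val}, "
          f"steps/epoch={steps}, total steps={steps * max_epochs}")
```

With 300 chips and 10% validation, a batch size of 64 gives only 5 steps per epoch (250 over 50 epochs), compared with 34 steps per epoch (1,700 total) at a batch size of 8.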

Detect Objects Using Deep Learning:

None of the models I trained gave proper results.

I've run through other tutorials and got them to work, but I'm really struggling to understand what I'm doing wrong when making my own model. Happy Friendsgiving, and thank you for your time!

PavanYadav · 11-27-2023

@Molality

Your workflow appears right to me.

How many samples do you have in your training data? Also, in the Train Deep Learning Model tool, what is Chip Size set to? It defaults to 224; please set it to match your Tile Size.

Common practice is to choose a tile size large enough to capture the entire object of interest while also providing enough surrounding context for accurate detection. When object size varies a lot across your samples, one approach is to use a tile size of about three times the average object size. The research papers I have read on tile size appear to be specific to particular use cases. I believe you can use a tile size of 1024; if it's too big for your GPU's memory, try a smaller Batch Size.
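The "three times the average object size" rule of thumb can be turned into a pixel count once the ground resolution is known. This sketch assumes that "6IN" in the imagery name means 6-inch (0.5 ft) pixels. The pond widths are hypothetical placeholders; in practice they could come from the extents of the labeled features. The function names are illustrative only.

```python
import math
from statistics import mean

GSD_FT = 0.5  # assumed: 6-inch imagery, so one pixel covers 0.5 ft on the ground


def suggested_tile_size(object_widths_ft, multiplier=3.0, gsd_ft=GSD_FT):
    """Tile size in pixels: multiplier x average object width in pixels."""
    avg_width_px = mean(object_widths_ft) / gsd_ft
    return math.ceil(multiplier * avg_width_px)


def largest_object_px(object_widths_ft, gsd_ft=GSD_FT):
    """Width of the largest object in pixels, to compare against the tile size."""
    return math.ceil(max(object_widths_ft) / gsd_ft)


pond_widths_ft = [60, 100, 140]  # hypothetical pond widths in feet
print(suggested_tile_size(pond_widths_ft))  # 3 x (100 ft / 0.5 ft) = 600 px
print(largest_object_px(pond_widths_ft))    # 140 ft / 0.5 ft = 280 px
```

With these placeholder widths, the rule suggests a tile of about 600 pixels, and the largest pond (280 pixels) fits comfortably inside it. If a tile that size is too large for GPU memory, the smaller Batch Size suggested above is the lever to adjust.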

Cheers!

Pavan

Pavan Yadav
Product Engineer at Esri
AI for Imagery