Hi all, I've been doing experiments with the deep learning tools available through the Image Analyst extension. Specifically, I'm testing out the tools for land cover classification, using 30m Landsat imagery (6 spectral bands). I'm testing a 10-class classification (various types of forest, cropland, urban areas, water, vegetated wetlands, etc.).
In the past ~2 weeks, I've been trying to systematically test parameterization, and I'm documenting the results of my model runs (tracking relevant input parameters, accuracies, run times, and whether the output land cover maps look decent based on expert judgement). So far, I've discovered that the DeepLabV3 model type seems to be a non-starter (i.e., bad results), compared to the U-Net. Among other things, I also discovered that using the 256 pixel x 256 pixel default tile width wasn't appropriate for my training data.
I've been working through different backbone models. When I tested the AutoDL function, I found that it only runs the various Resnet models, but doesn't test other ones, like the various Mobilenet models. If anyone in the community has suggestions for backbone models that have worked well with Landsat or other satellite imagery, I'm all ears. Thanks in advance!
Hi,
You are correct, 256 x 256 pixels is not appropriate for training a land cover classification model using Landsat. You should try 400 x 400 or 512 x 512 pixels, as this will cover more area and give the model more context to learn from.
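To see why tile size matters here, a quick back-of-the-envelope calculation (plain Python, nothing ArcGIS-specific; the only input is the 30 m Landsat pixel size) shows the ground footprint each tile size covers:

```python
# Ground footprint of a square training chip, assuming 30 m Landsat pixels.
PIXEL_SIZE_M = 30  # Landsat spatial resolution in metres

def tile_footprint_km(tile_px: int) -> float:
    """Side length of a tile_px x tile_px chip on the ground, in km."""
    return tile_px * PIXEL_SIZE_M / 1000

for tile in (256, 400, 512):
    print(f"{tile} x {tile} px -> {tile_footprint_km(tile):.2f} km per side")
# 256 px covers 7.68 km per side, 400 px covers 12.00 km, 512 px covers 15.36 km
```

So at 30 m resolution, even a 256 px chip already spans several kilometres on the ground, and the larger chips span enough area to include several land cover classes in one training image.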
By DeepLabV3 you mean MMSegmentation with DeepLabV3 model?
Our pretrained model for Land Cover Classification Landsat 8 is trained on UnetClassifier. If your LULC classes are the same as this model's, you can also fine-tune the pretrained model with your data.
You can also try training DeepLab with the pretrained "Prithvi" backbone, which is trained on multispectral imagery.
Hi Shivani,
Thank you for the prompt response. (For whatever reason, I didn't get an email notification that someone had responded to my post.)
1. By DeepLabV3, I mean the "DeepLabV3" model that's listed under "Model Type" in the "Train Deep Learning Model" tool (see attached screenshot).
2. I was also trying to follow your suggestion re: the Prithvi backbone model, but I don't see it when I select "Backbone Model" (and I have tried various pixel classification models). I am running ArcGIS Pro 3.3, with the deep learning packages that I recently grabbed from the GitHub page (https://github.com/Esri/deep-learning-frameworks).
3. I was considering trying your pretrained model, but alas, I am using a customized set of classes for my study area. I might still look into your pretrained model at some point, though.
4. Regarding the tile width, I have not yet tried your suggestion of 400 x 400 or 512 x 512, but when I do use 256 x 256 pixels, the output has mostly one class, and I think that has to do with my input training polygons, which are no wider than 1 km. When I reduce the tile width to 32 x 32, I get outputs that look feasible. Should I be adjusting my tile size to essentially capture "pure" training samples? My experiment with the 32 pixel x 32 pixel tiles seems to indicate so. Thoughts?
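For context, the arithmetic behind my observation (plain Python; the 1 km polygon width is the maximum from my training data described above):

```python
# How many 30 m pixels does a 1 km-wide training polygon span?
PIXEL_SIZE_M = 30          # Landsat resolution in metres
POLYGON_WIDTH_M = 1000     # my widest training polygons (~1 km)

polygon_px = POLYGON_WIDTH_M / PIXEL_SIZE_M
print(f"A 1 km polygon spans about {polygon_px:.1f} pixels")  # ~33.3 px

# Ground coverage of the two tile sizes I tried:
for tile_px in (256, 32):
    km = tile_px * PIXEL_SIZE_M / 1000
    print(f"{tile_px} px tile covers {km:.2f} km per side")
# 256 px tile covers 7.68 km (far wider than any of my polygons);
# 32 px tile covers 0.96 km (roughly one polygon)
```

So a 256 px chip is several times wider than my widest labeled polygon, while a 32 px chip is about the same width as one polygon, which would explain the behavior I'm seeing.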
Emil
Hi Emil,
I apologize for the oversight. The Prithvi backbone in DeepLab has been incorporated for ArcGIS Pro 3.4 which will be released in Oct - Nov. For Pro 3.3, you can train the HRNet model using MMSegmentation. Please refer to the screenshot below.
Regarding the 4th point:
"I haven’t yet tested your suggestion of using tile sizes of 400 x 400 or 512 x 512. When I use 256 x 256 pixels, the output mainly shows one class, which I suspect is related to my input training polygons, none of which exceed 1 km in width. However, when I reduce the tile size to 32 x 32, the outputs seem more viable. Should I adjust my tile size to ensure I capture 'pure' training samples? My experiments with 32 x 32 tiles suggest that this might be the case. What are your thoughts?"
While pure pixels are indeed crucial for traditional remote sensing classification approaches, deep learning models benefit from larger chips that encompass multiple classes, which helps them learn the distinctions between those classes. Given that we're working with Landsat data at a 30 m resolution, larger tile sizes, such as 400 x 400, 448 x 448, or 512 x 512 pixels, will provide the model with better context for recognizing the various land use/land cover classes. Using 32 x 32 pixel tiles may hinder the model's learning and lead to misclassification, as the model lacks exposure to multiple classes within a single image during training.