
Interpreting AutoDL results

HollyTorpey_LSA
Frequent Contributor

Hi all,

I recently ran AutoDL for all pixel classification models using training data I had created. It took almost four days. The results as they appeared in the leaderboard and the model metrics files were very promising, with accuracy in the 92-98% range for the top-performing models across my four land cover categories. However, when I use any of the output models to classify new imagery (same source and resolution, different nearby location, different date by a few months), my output is much less accurate and contains many "holes," or sporadic, irregular areas of no data in the output raster. Every model I try has these holes, but they occur in different places. One difference I should mention is that I ran the AutoDL tool using a single raster while I'm now trying to classify pixels in a mosaic dataset. Here's an area of no data: 

[Screenshot HollyTorpey_LSA_2-1756242454165.png: an area of no data in the classified output]

I have tried several of the top performing models from my AutoDL result, I've tried retraining the output models with additional training data, and I've tried training a new model from scratch based on the AutoDL recommendations. I've experimented with changing the padding parameter, using test time augmentation and predicting background values. No real improvement.
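
For context, this is roughly the shape of the classification call I've been running from Python. It's a minimal sketch rather than my exact script: the paths are placeholders, and the argument values are just the settings I've been experimenting with (padding, batch size, predicting background, test time augmentation), which may differ slightly by Pro release.

import arcpy
arcpy.CheckOutExtension("ImageAnalyst")

# Sketch of the inference step (placeholder paths; the arguments string
# holds the settings I've been varying).
classified = arcpy.ia.ClassifyPixelsUsingDeepLearning(
    in_raster=r"C:\data\imagery.gdb\my_mosaic",                      # mosaic dataset to classify
    in_model_definition=r"C:\autodl_out\models\SemFPN\SemFPN.emd",   # one of the AutoDL models
    arguments="padding 128;batch_size 4;predict_background True;test_time_augmentation True"
)
classified.save(r"C:\data\results.gdb\landcover_classified")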

If anyone has any suggestions specifically about the no data areas, I'd love to hear them. I had trained other models on my own before running AutoDL and I never had this outcome. I did re-export my training data before running AutoDL, so maybe I did something wrong during that step... if I had unclassified areas on my training data chips, could that result in this outcome? I'm thinking maybe my extent parameter was wrong when I exported the training data, resulting in some chips containing unclassified areas.
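
For what it's worth, my export step looked approximately like this (a rough reconstruction, not my exact call; the paths, tile size, and stride are placeholders):

import arcpy
arcpy.CheckOutExtension("ImageAnalyst")

# Pin the processing extent to the labeled polygons so chips don't fall
# partly outside the classified area (placeholder paths throughout).
arcpy.env.extent = arcpy.Describe(r"C:\data\training.gdb\landcover_polygons").extent

arcpy.ia.ExportTrainingDataForDeepLearning(
    in_raster=r"C:\data\imagery\source.tif",
    out_folder=r"C:\data\chips_landcover",
    in_class_data=r"C:\data\training.gdb\landcover_polygons",
    image_chip_format="TIFF",
    tile_size_x=256, tile_size_y=256,
    stride_x=128, stride_y=128,
    output_nofeature_tiles="ONLY_TILES_WITH_FEATURES",
    metadata_format="Classified_Tiles"
)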

My other question is whether there is information somewhere in the AutoDL output about parameters. I only see models, loss, accuracy, dice, and learning rate. I was hoping to get some tips for optimizing my model parameters, but I'm not seeing that. 

Finally, I thought the Advanced Mode would evaluate different backbones for the top two models, but that does not seem to have happened. My top two models were SemFPN and HRNet, but the optuna study was done on SamLoRA and UnetClassifier). Further, it seems like it tested the same architecture/backbone combination more than once, with different parameters maybe? The three numbers at the end of the folder name change each time, but I don't know what they indicate. If the numbers are related to parameters used, I'd love to be able to decode the folder names!

[Screenshot HollyTorpey_LSA_1-1756241321254.png: AutoDL output folders from the optuna study]

Thanks for any advice you can offer!

- Holly
SurajBaloni
Esri Contributor

Thanks for sharing the detailed use case! Based on your description, the “holes” in the classified raster don’t seem to be a typical accuracy issue. To help us investigate further, could you provide a sample subset of your dataset along with one of the trained models you used? That would allow us to reproduce the behavior and better understand why those no-data areas are appearing. You also mentioned retraining the top recommended model from scratch — can you confirm if you used the Train Deep Learning Model tool for this, and if so, which model did you try to train? Did the same holes appear in the output after retraining?


For the second part of your question:

When you run AutoDL, a readme.html file is created in the output folder, and after the tool finishes successfully, a link to it also appears just below the accuracy dashboard in the tool's messages tab in ArcGIS Pro. That page provides details of the test runs. In the HTML dashboard, you'll see columns for model, loss, accuracy, dice, learning rate, and an optuna_study flag. If optuna_study = True for a model, you can click on the model name to view the specific parameters and results for that run with its subset of images.
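
If you'd rather jump straight to the report from Python instead of the messages tab, something along these lines will locate and open it (the output folder path below is only an example):

import webbrowser
from pathlib import Path

# Find and open the AutoDL readme.html report (example output folder path).
autodl_output = Path(r"C:\autodl_output")
report = next(autodl_output.rglob("readme.html"))
webbrowser.open(report.as_uri())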

Regarding Advanced Mode: it does not evaluate MMSegmentation/MMDetection models (like SemFPN or HRNet) because these models don't expose hyperparameters to tune. Instead, AutoDL selects the next top-performing non-MM models, which in your case were SamLoRA and UnetClassifier. That's why the optuna study was performed on those models rather than SemFPN or HRNet.

 

As for the folder naming: the numbers at the end are simply a timestamp and not related to parameters. The format is AutoDL_modelname_backbone_name_yy_mm_dd_hh_mm_ss. Optuna may train the same architecture multiple times with different hyperparameters (e.g., learning rate, class balancing, backbone, etc.) in order to maximize performance. These details are captured in the readme.html, so that's the best place to look for parameter variations in your runs.
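
If you'd like to decode the folder names programmatically, a small helper along these lines works under that naming scheme (the folder name in the example is made up):

from datetime import datetime

def decode_autodl_folder(name: str):
    # Splits AutoDL_<model>_<backbone>_<yy>_<mm>_<dd>_<hh>_<mm>_<ss>
    # into the model/backbone part and the run timestamp.
    parts = name.split("_")
    yy, mo, dd, hh, mi, ss = (int(p) for p in parts[-6:])
    # Model and backbone names can themselves contain underscores,
    # so keep everything between "AutoDL" and the timestamp together.
    model_backbone = "_".join(parts[1:-6])
    return model_backbone, datetime(2000 + yy, mo, dd, hh, mi, ss)

print(decode_autodl_folder("AutoDL_SamLoRA_vit_b_25_08_22_14_03_57"))
# -> ('SamLoRA_vit_b', datetime.datetime(2025, 8, 22, 14, 3, 57))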

HollyTorpey_LSA
Frequent Contributor

Thanks for the response, @SurajBaloni! Your explanations of the report, the Advanced mode, and the file names are very helpful. I'd be happy to share a sample of my dataset and a model. How should I send it to you? And when you say dataset, do you mean my training data or the raster I want to classify?

Yes, I used the Train Deep Learning Model tool to try to train the best performing model (SemFPN), but now that I look at my GP history, I didn't actually train it from scratch. I retrained the SemFPN model that was output by AutoDL. The resulting model also produced an output with lots of holes:

[Screenshot HollyTorpey_LSA_0-1756492635742.png: output of the retrained SemFPN model, also with holes]

I now recall that I wanted to train SemFPN from scratch, but I'm not sure how to access that model architecture without retraining an existing SemFPN model. If I choose MMSegmentation from the Model Type dropdown, the "model" parameter has mask2former populated. If I change it to SemFPN or Sem_FPN, it doesn't accept it. I'm trying it as shown below (defaults) with new training data, but I have no idea what I'll get:

[Screenshot HollyTorpey_LSA_1-1756496559410.png: Train Deep Learning Model tool with MMSegmentation defaults]

UPDATE: Just finally realized that if you hover over the red X, there's an error message that lists the possible model names: 
"Error Invalid parameter. For MMSegmentation, model could be one of the supported model names:{'apcnet', 'mask2former', 'dmnet', 'prithvi100m', 'mobilenet_v2', 'fastscnn', 'ccnet', 'deeplabv3plus', 'hrnet', 'upernet', 'unet', 'nonlocal_net', 'ocrnet', 'deeplabv3', 'ann', 'pspnet', 'cgnet', 'fcn', 'psanet', 'emanet', 'gcnet', 'sem_fpn', 'dnlnet', 'resnest'} or could be path to the configuration file from MMSegmentation repositoryhttps://github.com/open-mmlab/mmsegmentation/tree/master/configs .For example, model:mask2former, model: C:\deeplabv3plus_r101-d16-mg124_512x1024_40k_cityscapes.py .Remember to use double backslashes \ instead of single \ in the path."

So all I needed to do was make it lowercase. Trying that now!

Thanks again for your help!

 

- Holly
SurajBaloni
Esri Contributor

Thanks for clarifying, Holly!

If you can share both your training data and the raster for inferencing, that would make it easier for us to replicate the entire workflow and compare results on our side. If possible, please include the trained model as well. You can upload the data to OneDrive (or a similar service) and share the link, or share your email address so I can reach out directly.

Regarding the second part of your question, the model names in MMSegmentation are indeed case-sensitive in the Train Deep Learning Model tool and must match exactly as listed. Using sem_fpn should resolve the red X issue you were seeing.
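
If you'd like to kick off that from-scratch training from Python instead of the tool dialog, it would look roughly like this (a minimal sketch: the paths, epoch count, and batch size are placeholders, and the exact model-argument syntax can vary slightly by release):

import arcpy
arcpy.CheckOutExtension("ImageAnalyst")

# Train sem_fpn from scratch on the exported chips (placeholder paths and values).
arcpy.ia.TrainDeepLearningModel(
    in_folder=r"C:\data\chips_landcover",
    out_folder=r"C:\models\semfpn_from_scratch",
    max_epochs=25,
    model_type="MMSEGMENTATION",
    batch_size=4,
    arguments="model sem_fpn",
    validation_percentage=10
)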

It would also be very helpful if you could share your observations when training SemFPN from scratch versus when using AutoDL. Any differences in performance or output quality will help us understand whether the AutoDL tool is underperforming, or whether the issue might be related to the training data.
