Train Deep Learning Model Tool and Spatial Autocorrelation?

GrishaPost · 06-04-2026

Hello,

I am currently working on a research project to train a model to detect peaches in high-resolution imagery. The Train Deep Learning Model tool in ArcGIS Pro only lets you specify a percentage of the training data to hold out for validation. My training data is spatially autocorrelated, and the existing literature indicates that I should draw the validation data from areas separate from the training data to avoid spatial leakage, where the validation set contains chips that are adjacent to very similar training chips.

When I run the Train Deep Learning Model tool in ArcGIS Pro, the average precision is overestimated on the validation data.

Is there any way in ArcGIS Pro to choose a separate held-out area for validation instead of choosing a percentage of the training dataset?

We initially created a random tessellation of grids around the orchard and chose a random subset of those grids for digitizing the peaches. We then split the grids into 80% training and 20% testing. Ideally, I would like to designate which grids are used for training, which for validation, and which for testing.
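
A grid-level assignment of this kind could be sketched as follows. This is only an illustration in plain Python, assuming each digitized grid has a unique ID; the function name `split_grids`, the 60/20/20 fractions, and the example IDs are hypothetical and not part of ArcGIS Pro. Every chip inherits the subset of its parent grid, so no grid contributes chips to more than one subset.

```python
import random

def split_grids(grid_ids, fractions=(0.6, 0.2, 0.2), seed=42):
    """Assign whole grid cells to train/validation/test subsets.

    Splitting by grid (not by chip) keeps all chips from one grid
    in the same subset, which limits spatial leakage.
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    ids = sorted(set(grid_ids))
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(ids)
    n_train = round(len(ids) * fractions[0])
    n_val = round(len(ids) * fractions[1])
    return {
        "train": ids[:n_train],
        "validation": ids[n_train:n_train + n_val],
        "test": ids[n_train + n_val:],
    }

# Hypothetical example: 20 digitized grids with IDs 1..20.
splits = split_grids(range(1, 21))
for name, ids in splits.items():
    print(name, sorted(ids))
```

Note that this sketch does not check adjacency: two neighbouring grids can still land in different subsets, so a buffer between subsets may be needed if the grids touch.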

Thank you for your time.

Sincerely,

Grisha Post

PavanYadav · 06-10-2026

@GrishaPost

Your concern about spatial autocorrelation is valid, and in some workflows a spatially explicit split can be appropriate. However, the Train Deep Learning Model tool uses a single exported dataset and performs a random train/validation split to ensure consistency in schema, class definitions, metadata, and overall data distribution. Global image and label statistics in the EMD are also used to configure model-specific parameters (e.g., SSD settings such as zoom levels, aspect ratios, and grid sizes).

If the training and validation subsets are too spatially different, validation metrics may become less stable for guiding training, as they can reflect distribution shift rather than overfitting. For this reason, the validation split is intended to represent the same underlying distribution.

In many practical workflows, a large and diverse dataset combined with random splitting and data augmentation (which ArcGIS applies by default and can be controlled) is a solid and effective approach.

For evaluating true geographic generalization, a separate held-out spatial area evaluated after inference using Accuracy Assessment tools is typically more appropriate.

Here is how you can incorporate a test dataset into your workflow:

1. Reserve 20% of the labels strictly for testing and keep them out of the training workflow entirely.
2. Use the remaining 80% of the labels in the "Export Training Data for Deep Learning" tool to create a training dataset, then train the model.
3. Once the model is trained, run "Detect Objects Using Deep Learning" over the unseen 20% test area.
4. Finally, pass the model predictions and the reserved 20% ground-truth labels into the "Compute Accuracy For Object Detection" tool to get an accuracy metric that is free of leakage from the training data.
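
To make the final comparison concrete, here is a minimal sketch in plain Python of how detections on the held-out area can be scored against ground truth using intersection over union (IoU). It is an illustration only and not the implementation of the "Compute Accuracy For Object Detection" tool; the function names, the box format (xmin, ymin, xmax, ymax), the 0.5 IoU threshold, and the example boxes are all assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes (xmin, ymin, xmax, ymax)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def precision_recall(detections, truths, iou_threshold=0.5):
    """Greedy one-to-one matching of detections to ground-truth boxes.

    detections: list of (box, confidence); truths: list of boxes.
    """
    unmatched = list(truths)
    true_positives = 0
    # Visit the most confident detections first.
    for box, _score in sorted(detections, key=lambda d: d[1], reverse=True):
        best = max(unmatched, key=lambda t: iou(box, t), default=None)
        if best is not None and iou(box, best) >= iou_threshold:
            unmatched.remove(best)  # each truth box can match only once
            true_positives += 1
    precision = true_positives / len(detections) if detections else 0.0
    recall = true_positives / len(truths) if truths else 0.0
    return precision, recall

# Hypothetical test-area data: two labelled peaches, two detections.
truths = [(0, 0, 10, 10), (20, 20, 30, 30)]
detections = [((1, 1, 10, 10), 0.9), ((50, 50, 60, 60), 0.8)]
precision, recall = precision_recall(detections, truths)
print(precision, recall)
```

Because the truth boxes come only from the reserved test area, the resulting precision and recall are not inflated by chips that sit next to training chips.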

I hope this helps!

Cheers!

Pavan Yadav
Product Engineer at Esri
AI for Imagery

GrishaPost · 06-10-2026

Dear @PavanYadav ,

Thank you so much for your helpful response and your time!

Sincerely,

Grisha