Hello,
I am currently working on a research project to train a model that detects peaches in high-resolution imagery. The Train Deep Learning Model tool in ArcGIS Pro only lets you set aside a percentage of the training data as validation data. My training data is spatially autocorrelated, so, based on the existing literature, the validation data should come from areas that are separate from the training data. Otherwise there is spatial leakage: the validation set can contain chips that are adjacent to very similar training chips.
When I run the Train Deep Learning Model tool in ArcGIS Pro, the average precision is overestimated on the validation data.
Is there any way in ArcGIS Pro to choose a separate held-out area for validation instead of choosing a percentage of the training dataset?
We initially created a random tessellation of grids around the orchard and chose a random subset of those grids in which to digitize the peaches. We then split the grids into 80% training and 20% testing. In an ideal experiment, I would want to label or select which grids are used for training, which for validation, and which for testing.
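To make the intended split concrete, here is a minimal sketch in plain Python of what I mean by assigning whole grids (rather than individual chips) to each set. It does not use any ArcGIS API. The grid IDs, the chip-to-grid pairs, the 60/20/20 fractions, and the function names `split_grids` and `assign_chips` are all hypothetical and only for illustration.

```python
import random


def split_grids(grid_ids, fractions=(0.6, 0.2, 0.2), seed=42):
    """Assign whole grids to train/validation/test so no grid is shared.

    grid_ids  : iterable of hypothetical grid identifiers
    fractions : (train, validation, test) shares, assumed to sum to 1
    seed      : fixed seed so the split is reproducible
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")

    ids = sorted(set(grid_ids))          # de-duplicate, stable starting order
    random.Random(seed).shuffle(ids)     # shuffle grids, not chips

    n_train = round(len(ids) * fractions[0])
    n_val = round(len(ids) * fractions[1])

    return {
        "train": ids[:n_train],
        "validation": ids[n_train:n_train + n_val],
        "test": ids[n_train + n_val:],   # whatever remains
    }


def assign_chips(chips, split):
    """Give every chip the split of the grid it falls in.

    chips : iterable of (chip_id, grid_id) pairs (hypothetical layout)
    """
    grid_to_split = {g: name for name, grids in split.items() for g in grids}
    return {chip_id: grid_to_split[grid_id] for chip_id, grid_id in chips}


# Example with 10 hypothetical grids and 3 chips per grid.
split = split_grids(range(1, 11))
chips = [(f"chip_{g}_{i}", g) for g in range(1, 11) for i in range(3)]
chip_split = assign_chips(chips, split)
```

Because the shuffle happens at the grid level, all chips from one grid end up in the same set. Neighbouring grids can still land in different sets, so a stricter version would probably also need a buffer or contiguous blocks between training and validation areas.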
Thank you for your time.
Sincerely,
Grisha Post