
Train deep learning model won't quit

Sunday
MarkSchweder
Emerging Contributor

[Attached image: Screenshot 2025-10-12 190702.png]

My Train Deep Learning Model run still hasn't finished after two and a half days, which is a day longer than it said it would take to complete the task. Is it possible to stop the tool and recover what it has done so far? Should I wait for it to finish, or should I cancel and start again? Is there something else I should try?

Thanks, Mark

7 Replies
DanPatterson
MVP Esteemed Contributor

Did you try it with a smaller dataset to confirm the process?

Details on the input data type, location, extents and size would be useful as would anything about the destination parameters.
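To make those details easy to report, and to set up a smaller trial run, something like the following could be used. It is only a minimal sketch using the Python standard library: it assumes the training data sits as individual files in a folder tree, and the function names `summarize_folder` and `pick_trial_subset` are made up for illustration. It does not know anything about how the exported training data is organized, so any label or metadata files that belong with the picked files would still have to be kept together by hand.

```python
import os
import random
from collections import Counter


def summarize_folder(folder):
    """Return (file_count, total_bytes, count_per_extension) for a folder tree."""
    file_count = 0
    total_bytes = 0
    per_extension = Counter()
    for root, _dirs, files in os.walk(folder):
        for name in files:
            path = os.path.join(root, name)
            file_count += 1
            total_bytes += os.path.getsize(path)
            # Group by lower-cased extension, e.g. ".tif"
            per_extension[os.path.splitext(name)[1].lower()] += 1
    return file_count, total_bytes, dict(per_extension)


def pick_trial_subset(file_names, fraction=0.1, seed=0):
    """Pick a reproducible random subset of names for a quick trial run."""
    names = sorted(file_names)
    if not names:
        return []
    # Always keep at least one file so the trial run has something to read
    k = max(1, int(len(names) * fraction))
    return sorted(random.Random(seed).sample(names, k))
```

Running the tool on a subset of roughly 10% first gives a rough timing to compare against the estimate shown for the full dataset.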

 


... sort of retired...
MarkSchweder
Emerging Contributor

Yes. It's big data, and that's the point: I'm trying to find out how big the data can be. It's been cut in half once, and it looks like it will need to be cut in half again.

In any event, is it possible to stop the program and recover what it has done so far?

Thanks

RTPL_AU
Honored Contributor

@MarkSchweder 
Is it using the correct GPU?

MarkSchweder
Emerging Contributor

yes

Robert_LeClair
Esri Esteemed Contributor

On the Environments tab of the GP tool, set the Processor Type dropdown to GPU and set the GPU ID to zero.

I'm running ArcGIS Pro 3.5.4. On the Parameters tab, under the Data Preparation dropdown for the Data Augmentation parameter, change the batch size to 16. Are you using an earlier version of ArcGIS Pro, or is there any parameter for batch size?

Do these changes improve the performance?
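If the tool is run from a script instead of the GP pane, the same choices could be sketched roughly as below. This is an illustration under assumptions, not a tested workflow: `arcpy.env.processorType` and `arcpy.env.gpuId` are the scripting counterparts of the Processor Type and GPU ID environments, while the helper names `gpu_environment`, `apply_environment` and `batches_per_epoch` are made up here. The last helper only shows the arithmetic of how batch size changes the number of training steps per epoch; the chip count in the usage note is a placeholder, not a figure from this thread.

```python
import math


def gpu_environment(gpu_id=0):
    """Environment settings matching the Environments tab choices above."""
    return {"processorType": "GPU", "gpuId": gpu_id}


def apply_environment(settings):
    """Apply the settings to arcpy.env; return False when arcpy is unavailable."""
    try:
        import arcpy  # only importable inside an ArcGIS Pro Python environment
    except ImportError:
        return False
    for name, value in settings.items():
        setattr(arcpy.env, name, value)
    return True


def batches_per_epoch(chip_count, batch_size):
    """Number of training steps needed to pass over every chip once."""
    return math.ceil(chip_count / batch_size)
```

For example, `batches_per_epoch(1000, 16)` gives 63 steps per epoch for a hypothetical 1,000 chips. A larger batch means fewer steps per epoch, but each step needs more GPU memory, so the value usually has to be tuned to the card.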

MarkSchweder
Emerging Contributor

I've had to reduce not only the complexity but also the quantity of data used to train the model to keep things moving. I'm going to try combining smaller models to regain the complexity and quantity.

RichardDaniels
Honored Contributor

If you get to the point where it starts writing data, it can take a long time (days) if you are reading and writing on the same disk. Make sure the inputs are read from one disk and the results are written to a different disk, so the reads and writes are not competing with each other ('race' conditions).
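A quick way to check this before starting a long run is to compare the device IDs of the input and output folders. This is a minimal sketch using only the Python standard library, and `same_volume` is a made-up helper name. It can only tell volumes apart: two partitions on one physical disk will look like different volumes, so a False result does not prove the folders are on separate hardware.

```python
import os


def same_volume(path_a, path_b):
    """True when both existing paths sit on the same volume (by device ID)."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev
```

Calling it with the training data folder and the output model folder (both must already exist) and getting True means the run is reading and writing on the same volume.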
