My Train Deep Learning Model run still hasn't finished after two and a half days, which is a day longer than it said it would take to complete the task. Is it possible to stop the program and recover what it has done so far? Should I wait for it to finish? Should I cancel and start again? Something else?
Thanks, Mark
Did you try it with a smaller dataset to confirm the process?
Details on the input data type, location, extents and size would be useful, as would anything about the destination parameters.
Yes. It's big data, and that's the point. I'm trying to find out how big the data can be. It's been cut in half once, and it looks like it needs to be cut again.
In any event, is it possible to stop the program and recover what it has done so far?
Thanks
@MarkSchweder
Is it using the correct GPU?
On the Environments tab of the GP tool, choose GPU in the Processor Type dropdown and set the GPU ID to 0.
I'm running ArcGIS Pro 3.5.4. On the Parameters tab, under the Data Preparation dropdown for the Data Augmentation parameter, change the batch size to 16. Are you using an earlier version of ArcGIS Pro, or is there any parameter for batch size?
Do these changes improve the performance?
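For a rough sense of what the batch size changes, here is a small Python sketch that counts how many training steps one epoch takes for a given number of chips. The chip count, the 10 percent validation split and the function name `steps_per_epoch` are assumptions for illustration, not values read from the tool. It only counts steps; it says nothing about how long each step takes on a particular GPU.

```python
import math


def steps_per_epoch(total_chips, batch_size, validation_percent=10.0):
    """Count the training batches in one epoch.

    Assumes the validation chips are set aside first and that a final,
    partially filled batch still counts as one step.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    if not 0 <= validation_percent < 100:
        raise ValueError("validation_percent must be in [0, 100)")
    validation_chips = int(total_chips * validation_percent / 100)
    training_chips = total_chips - validation_chips
    return math.ceil(training_chips / batch_size)


# Hypothetical chip count, for illustration only.
chips = 100_000
for size in (4, 16, 64):
    print(f"batch size {size}: {steps_per_epoch(chips, size)} steps per epoch")
```

Halving the dataset halves the step count at a fixed batch size, which is one way to estimate how far the data has to be cut before a run fits in the time available.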
I've had to reduce not only the complexity but also the quantity of data used to train the model to keep things moving. I'm going to try combining smaller models to regain the complexity and quantity.
If you get to the point where it starts writing data, it can take a long time (days) if you are reading and writing on the same disk. Make sure you read the inputs from one disk and write the results to another so the reads and writes aren't competing for the same disk.
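A quick way to check the disk question before starting a long run is to compare the volumes behind the input and output folders. This is a minimal sketch using only the Python standard library; the helper names `nearest_existing` and `on_same_volume` are made up for this example, and the paths in the comment are hypothetical. It relies on `os.stat(...).st_dev` identifying the volume, so two partitions on one physical disk will still report as different volumes. A result of False is therefore a hint, not proof, that the folders sit on separate hardware.

```python
import os


def nearest_existing(path):
    """Walk up from path until something that exists is found."""
    current = os.path.abspath(path)
    while not os.path.exists(current):
        parent = os.path.dirname(current)
        if parent == current:  # reached the root, nothing more to try
            break
        current = parent
    return current


def on_same_volume(path_a, path_b):
    """Return True when both paths resolve to the same device/volume.

    An output folder that has not been created yet is checked through
    its nearest existing parent folder.
    """
    dev_a = os.stat(nearest_existing(path_a)).st_dev
    dev_b = os.stat(nearest_existing(path_b)).st_dev
    return dev_a == dev_b


# Example call with hypothetical folders:
# on_same_volume(r"D:\training_chips", r"E:\models\output")
```

If this returns True for the training data folder and the output folder, moving one of them to a different drive is worth trying before the next run.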