Train Deep Learning Model tool now runs forever

4 weeks ago
EdoardoForzano
New Contributor

Hi everyone, I have been producing some custom deep learning models for object detection. Everything worked: when I ran the "Train Deep Learning Model" tool, it showed me the percentage of completion and each epoch with its average accuracy while it was running. Now it has stopped doing that. No matter which labels I prepare or which parameters I use, it keeps running forever without giving me any details. I tried using a very small amount of data, which, based on past performance, should not take long to process, but I cannot get it to work. Any suggestions? Thanks

DanPatterson
MVP Esteemed Contributor

Tech Support would be your best option.


PavanYadav
Esri Contributor

I understand you're using the Train Deep Learning Model tool. To check whether the tool is working properly, first confirm whether training is running on the CPU or the GPU. In some cases, training requires a fairly large amount of GPU memory.
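As a rough first check, here is a minimal Python sketch that asks the NVIDIA driver which GPUs it can see, using nvidia-smi -L. It assumes an NVIDIA card and that nvidia-smi is on the PATH, and the function names are hypothetical. It only confirms that a GPU is visible to the driver, not that the Train Deep Learning Model tool is actually using it.

```python
from __future__ import annotations

import shutil
import subprocess


def parse_gpu_list(output: str) -> list[str]:
    """Extract GPU names from `nvidia-smi -L` output.

    Each GPU line looks like: "GPU 0: <name> (UUID: GPU-...)".
    """
    names = []
    for line in output.splitlines():
        line = line.strip()
        if not line.startswith("GPU "):
            continue  # skip blank lines and error messages
        # The device name sits between the first ": " and " (UUID".
        _, _, rest = line.partition(": ")
        names.append(rest.split(" (UUID")[0])
    return names


def detect_gpus() -> list[str]:
    """Return the NVIDIA GPUs reported by nvidia-smi, or [] if none are visible."""
    if shutil.which("nvidia-smi") is None:
        return []  # nvidia-smi is not installed or not on PATH
    result = subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True)
    if result.returncode != 0:
        return []  # driver present but not responding
    return parse_gpu_list(result.stdout)


if __name__ == "__main__":
    gpus = detect_gpus()
    if gpus:
        print("GPUs visible to the driver:", ", ".join(gpus))
    else:
        print("No NVIDIA GPU detected by nvidia-smi.")
```

If nothing is listed, it is worth sorting out the driver before looking at the tool's parameters.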

To see whether your GPU is being used, run the command nvidia-smi -l 5, which refreshes the GPU status every 5 seconds, and monitor GPU usage while the tool runs. Some models, especially those trained with a large amount of data per epoch, can take a long time (an hour or more) to train. CPUs are much slower and may not be able to handle memory-intensive training.
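The same monitoring can be scripted. This sketch assumes nvidia-smi is on the PATH. It uses the --query-gpu and --format=csv,noheader,nounits options to sample GPU utilization and memory (in MiB) a few times, similar to nvidia-smi -l 5. The helper names and the sample count are illustrative assumptions. If utilization stays near zero while the tool is running, that would suggest training is not happening on the GPU.

```python
from __future__ import annotations

import shutil
import subprocess
import time
from typing import NamedTuple

# Fields requested from nvidia-smi; memory values are reported in MiB.
QUERY = "index,utilization.gpu,memory.used,memory.total"


class GpuSample(NamedTuple):
    index: int
    util_pct: int
    mem_used_mib: int
    mem_total_mib: int


def parse_query_output(output: str) -> list[GpuSample]:
    """Parse CSV lines such as '0, 87, 6120, 8192' (one line per GPU)."""
    samples = []
    for line in output.strip().splitlines():
        try:
            idx, util, used, total = (int(v) for v in line.split(","))
        except ValueError:
            # Skip '[N/A]' fields or error text instead of crashing.
            continue
        samples.append(GpuSample(idx, util, used, total))
    return samples


def poll(interval_s: float = 5.0, count: int = 3) -> None:
    """Print a few utilization/memory samples, like `nvidia-smi -l 5`."""
    if shutil.which("nvidia-smi") is None:
        print("nvidia-smi not found on PATH")
        return
    cmd = ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"]
    for i in range(count):
        result = subprocess.run(cmd, capture_output=True, text=True)
        for s in parse_query_output(result.stdout):
            print(f"GPU {s.index}: {s.util_pct}% busy, "
                  f"{s.mem_used_mib}/{s.mem_total_mib} MiB used")
        if i < count - 1:
            time.sleep(interval_s)  # wait between samples, not after the last


if __name__ == "__main__":
    poll()
```

Run it in a separate window while Train Deep Learning Model is running.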

nvidia-smi can also show whether your GPU is being used efficiently for the batch size you've set in the tool. For example, if your batch size is set to 4 and only a small portion of your GPU memory is in use, you can increase the batch size.
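To turn that observation into a number, here is a hypothetical rule of thumb (not an Esri recommendation). It assumes GPU memory grows roughly linearly with batch size on top of a fixed baseline, where the baseline is the memory already in use before training starts. From one nvidia-smi reading, it estimates a larger batch size that leaves some headroom. Real memory use is not perfectly linear, so treat the result as a starting point and check nvidia-smi again after changing the batch size.

```python
def suggest_batch_size(batch_size: int, mem_used_mib: float, mem_total_mib: float,
                       baseline_mib: float, headroom: float = 0.9,
                       round_to_pow2: bool = True) -> int:
    """Hypothetical estimate of a larger batch size from one nvidia-smi reading.

    Assumes memory ~= baseline + per_sample * batch_size, and keeps
    total usage under `headroom` (a fraction of total GPU memory).
    Never returns a value smaller than the current batch size.
    """
    if batch_size <= 0 or mem_used_mib <= baseline_mib:
        return batch_size  # not enough information to extrapolate
    per_sample = (mem_used_mib - baseline_mib) / batch_size
    budget = mem_total_mib * headroom - baseline_mib
    suggested = int(budget // per_sample)
    if suggested <= batch_size:
        return batch_size
    if round_to_pow2:
        # Round down to a power of two, e.g. 13 -> 8.
        suggested = 1 << (suggested.bit_length() - 1)
    return max(batch_size, suggested)


if __name__ == "__main__":
    # Example: batch size 4 uses 2600 MiB of an 8192 MiB card, 600 MiB baseline.
    print(suggest_batch_size(4, 2600, 8192, 600))  # prints 8
```

Rounding down to a power of two is a common convention rather than a requirement. Pass round_to_pow2=False to get the raw estimate (13 in the example above).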
