Train Deep Learning Model does not terminate

12-03-2021 05:56 AM
MarcusErz
New Contributor

Hey,

I used the Export Training Data For Deep Learning tool to export training data for extracting building footprints. That worked so far. Now I am trying to use the Train Deep Learning Model tool. I have 14 chips to train a model. When I choose a batch size > 14, I get this error:

epochs = int(np.ceil(num_it/len(learn.data.train_dl)))
ZeroDivisionError: division by zero
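
A minimal sketch of how this division by zero might arise. It assumes the training data loader drops the last incomplete batch, so that a batch size larger than the number of training chips leaves zero batches per epoch; that behaviour is not confirmed from the tool's source. The function names and the value of `num_it` are hypothetical and only illustrate the arithmetic in the traceback line.

```python
import math


def batches_per_epoch(n_chips, batch_size, drop_last=True):
    """Number of batches a loader yields per epoch (hypothetical helper)."""
    if drop_last:
        # Assumption: incomplete batches are discarded.
        return n_chips // batch_size
    return math.ceil(n_chips / batch_size)


def epochs_for(num_it, n_batches):
    """Same arithmetic as the traceback line, with n_batches standing in
    for len(learn.data.train_dl)."""
    return int(math.ceil(num_it / n_batches))


if __name__ == "__main__":
    n_chips = 14
    for batch_size in (4, 14, 16):
        n_batches = batches_per_epoch(n_chips, batch_size)
        try:
            # num_it = 100 is an arbitrary illustrative value.
            print(batch_size, n_batches, epochs_for(100, n_batches))
        except ZeroDivisionError as err:
            print(batch_size, n_batches, "ZeroDivisionError:", err)
```

If part of the 14 chips is held back for validation, the number of training chips is smaller still, so the largest batch size that works under this assumption would be below 14.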

I guess this is because the batch size is greater than the number of chips. When I choose batch size = 1, I get this error:

raise ValueError('Expected more than 1 value per channel when training, got input size {}'.format(size))
ValueError: Expected more than 1 value per channel when training, got input size torch.Size([1, 256, 1, 1])
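
A rough sketch of why a batch size of 1 could trigger this message. Batch normalization in training mode needs more than one value per channel to compute a variance, and an input of shape `[1, 256, 1, 1]` has one sample and a 1x1 feature map, so each of the 256 channels holds a single value. The code below is a simplified stand-in written for illustration, not PyTorch's actual implementation, and both function names are hypothetical.

```python
def values_per_channel(size):
    """Count the values each channel holds for a shape (N, C, *spatial)."""
    count = size[0]          # batch dimension N
    for dim in size[2:]:     # spatial dimensions; the channel dimension is skipped
        count *= dim
    return count


def check_batch_norm_input(size):
    """Raise if per-channel statistics cannot be computed (illustrative check)."""
    if values_per_channel(size) == 1:
        raise ValueError(
            "Expected more than 1 value per channel when training, "
            "got input size {}".format(size)
        )


if __name__ == "__main__":
    check_batch_norm_input((2, 256, 1, 1))      # passes: 2 values per channel
    try:
        check_batch_norm_input((1, 256, 1, 1))  # batch size 1, 1x1 feature map
    except ValueError as err:
        print("ValueError:", err)
```

Under this reading, any batch size of at least 2 would get past the check.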

When I choose a batch size between 1 and 14, the tool does not terminate. I waited for 20 minutes and it still says "Running...". It never starts with epoch 1. When I try to stop the tool, that doesn't work either: it just says "Canceling". I waited for 5 minutes, but nothing changed, so I ended ArcGIS Pro with the Task Manager. Does anyone know this problem and can help me? I can't find any related posts. Do I maybe have to change some of the parameters? Here are the parameters that I am using:

[Screenshots: MarcusErz_0-1638538810310.png, MarcusErz_1-1638538836413.png, MarcusErz_2-1638538862747.png, MarcusErz_3-1638538888943.png]

This is what one of my label files looks like.

[Screenshot: MarcusErz_4-1638538979079.png]

I am using ArcGIS Pro 2.8.3. Now I will test whether anything changes if I wait longer, but I think the tool should be faster when I use only 14 chips.

I hope someone knows a solution. Thanks in advance.

 

2 Replies
DanPatterson
MVP Esteemed Contributor

Moved to Imagery and Remote Sensing Questions so you have a better chance of getting an answer.


... sort of retired...
RosalindaGuzmánCastillo
New Contributor

Did you solve this? I have the same error.
