Error: Could not find 'NumPixelsPerClass' in step Train Deep Learning Model

05-27-2021 01:35 PM
NayaraV
New Contributor II

Hi, I received this error when I tried to run Train Deep Learning Model (ArcGIS Pro 2.7).
I created the labels with "Label Objects for Deep Learning" using one class, and exported the training data with the "Classified Tiles" metadata format.
My error message:
Could not find 'NumPixelsPerClass' in 'esri_accumulated_stats.json'. Ignoring `class_balancing` parameter.
Traceback (most recent call last):
  File "c:\program files\arcgis\pro\Resources\ArcToolbox\toolboxes\Image Analyst Tools.tbx\TrainDeepLearningModel.tool\tool.script.execute.py", line 232, in <module>
    execute()
  File "c:\program files\arcgis\pro\Resources\ArcToolbox\toolboxes\Image Analyst Tools.tbx\TrainDeepLearningModel.tool\tool.script.execute.py", line 207, in execute
    show_accuracy=show_accuracy)]
  File "C:\Program Files\ArcGIS\Pro\bin\Python\envs\deeplearning\lib\site-packages\arcgis\learn\models\_arcgis_model.py", line 708, in fit
    self.learn.fit_one_cycle(epochs, lr, callbacks=callbacks, **kwargs)
  File "C:\Program Files\ArcGIS\Pro\bin\Python\envs\deeplearning\lib\site-packages\fastai\train.py", line 23, in fit_one_cycle
    learn.fit(cyc_len, max_lr, wd=wd, callbacks=callbacks)
  File "C:\Program Files\ArcGIS\Pro\bin\Python\envs\deeplearning\lib\site-packages\fastai\basic_train.py", line 200, in fit
    fit(epochs, self, metrics=self.metrics, callbacks=self.callbacks+callbacks)
  File "C:\Program Files\ArcGIS\Pro\bin\Python\envs\deeplearning\lib\site-packages\fastai\basic_train.py", line 101, in fit
    loss = loss_batch(learn.model, xb, yb, learn.loss_func, learn.opt, cb_handler)
  File "C:\Program Files\ArcGIS\Pro\bin\Python\envs\deeplearning\lib\site-packages\fastai\basic_train.py", line 34, in loss_batch
    if not skip_bwd: loss.backward()
  File "C:\Program Files\ArcGIS\Pro\bin\Python\envs\deeplearning\lib\site-packages\torch\tensor.py", line 195, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "C:\Program Files\ArcGIS\Pro\bin\Python\envs\deeplearning\lib\site-packages\torch\autograd\__init__.py", line 99, in backward
    allow_unreachable=True) # allow_unreachable flag
RuntimeError: CUDA out of memory. Tried to allocate 64.00 MiB (GPU 0; 4.00 GiB total capacity; 2.72 GiB already allocated; 60.95 MiB free; 2.89 GiB reserved in total by PyTorch)
An error occurred while executing (TrainDeepLearningModel).
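Note that the first line is only a warning: training carried on without class balancing, and the run actually stopped at the final RuntimeError (CUDA out of memory). If you do want class balancing, one quick way to see whether the exported esri_accumulated_stats.json contains NumPixelsPerClass is to load it and search for the key. This is a minimal sketch: the file's internal layout is not described in this thread, so it searches the whole JSON instead of assuming where the key sits, and the path in the usage section is hypothetical.

```python
import json
from pathlib import Path
from typing import Any


def contains_key(obj: Any, key: str) -> bool:
    """Return True if `key` appears as a dictionary key anywhere in a parsed JSON value."""
    if isinstance(obj, dict):
        return key in obj or any(contains_key(v, key) for v in obj.values())
    if isinstance(obj, list):
        return any(contains_key(item, key) for item in obj)
    return False


def has_pixel_counts(stats_path: str) -> bool:
    """Check an exported stats file for the key the warning says is missing."""
    with Path(stats_path).open(encoding="utf-8") as f:
        return contains_key(json.load(f), "NumPixelsPerClass")


if __name__ == "__main__":
    # Hypothetical location: point this at the output folder of your export.
    stats = Path(r"C:\data\training_tiles\esri_accumulated_stats.json")
    if stats.exists():
        print("NumPixelsPerClass present:", has_pixel_counts(str(stats)))
```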

Could anyone please help me? I've been trying to fix this for days.
Kind regards,
Nayara


Accepted Solutions
DrVSSKiran
Occasional Contributor II

Hi,

Try reducing the tile size when exporting the training data, and the batch size when training.

For example:

If you are using a tile size of 448, as shown in the figure below, try reducing it by 50% (to 224).

[Screenshot: tile size setting used when exporting the training data]
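The arithmetic behind the 50% suggestion: tiles are square, so halving the side length (448 to 224) cuts the pixels per tile to a quarter. For a pixel-classification model trained on Classified Tiles, the activations held on the GPU grow roughly in proportion to the number of pixels, so this is a large saving, while the model's weights take the same space whatever the tile size. A small sketch of that ratio:

```python
def relative_pixels(new_tile: int, old_tile: int) -> float:
    """Pixels per square tile scale with the square of the tile side."""
    return (new_tile / old_tile) ** 2


if __name__ == "__main__":
    old_tile = 448
    new_tile = old_tile // 2  # "reduce it 50%"
    print(new_tile, relative_pixels(new_tile, old_tile))  # 224 0.25
```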

Second, when training, reduce the batch size to 1 if you were using 2 or 4, as shown in the figure below.

[Screenshot: batch size setting in Train Deep Learning Model]
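Batch size multiplies that cost: each training step keeps activations for every tile in the batch, so activation memory scales roughly with batch size × pixels per tile. The sketch below compares configurations against a baseline of batch 4 at 448 and also sizes the raw float32 input batch, assuming 3-band imagery. The input itself is small; it is usually the intermediate activations saved for the backward pass that fill a 4 GB card.

```python
BYTES_PER_FLOAT32 = 4
MIB = 2 ** 20


def input_batch_mib(batch_size: int, tile: int, bands: int = 3) -> float:
    """Size in MiB of one float32 input batch of shape (batch, bands, tile, tile)."""
    return batch_size * bands * tile * tile * BYTES_PER_FLOAT32 / MIB


def relative_load(batch_size: int, tile: int, base_batch: int, base_tile: int) -> float:
    """Rough activation-memory ratio versus a baseline: batch size times pixels per tile."""
    return (batch_size * tile * tile) / (base_batch * base_tile * base_tile)


if __name__ == "__main__":
    for batch, tile in [(4, 448), (2, 448), (1, 448), (1, 224)]:
        print(f"batch={batch} tile={tile}: input {input_batch_mib(batch, tile):.2f} MiB, "
              f"relative load {relative_load(batch, tile, 4, 448):.4f}")
```

Going from batch 4 at tile 448 to batch 1 at tile 224 gives a relative load of 1/16 under this rough model.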

If you still get the same error, check the GPU usage and make sure the entire GPU is available to ArcGIS Pro. Also check the Python libraries: uninstall fast.ai, reinstall it, and try again.
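To check GPU usage, nvidia-smi (installed with the NVIDIA driver) can report used and total memory for each card; anything else already using the card leaves less for training. A sketch that runs the query and parses its CSV output, assuming nvidia-smi is on the PATH (it does nothing otherwise):

```python
import csv
import io
import shutil
import subprocess
from typing import Dict, List, Union

# One CSV line per GPU; with "nounits" the memory values are plain numbers in MiB.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,name,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_csv(text: str) -> List[Dict[str, Union[int, str]]]:
    """Turn nvidia-smi CSV output into a list of dicts, one per GPU."""
    gpus = []
    for row in csv.reader(io.StringIO(text), skipinitialspace=True):
        if not row:
            continue  # skip blank lines
        index, name, used, total = row
        gpus.append({"index": int(index), "name": name,
                     "used_mib": int(used), "total_mib": int(total)})
    return gpus


if __name__ == "__main__" and shutil.which("nvidia-smi"):
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    for gpu in parse_gpu_csv(result.stdout):
        free = gpu["total_mib"] - gpu["used_mib"]
        print(f"GPU {gpu['index']} ({gpu['name']}): {gpu['used_mib']} MiB used, {free} MiB free")
```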

Good luck.

Thanks

DanPatterson
MVP Esteemed Contributor

RuntimeError: CUDA out of memory. Tried to allocate 64.00 MiB (GPU 0; 4.00 GiB total capacity; 2.72 GiB already allocated; 60.95 MiB free; 2.89 GiB reserved in total by PyTorch)

Did you try a smaller area? It seems you are running out of memory.
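The numbers in that message support this: PyTorch asked for 64.00 MiB but only 60.95 MiB was free, and of the 4.00 GiB total only 2.89 GiB was reserved by PyTorch, so roughly 1 GiB was held outside PyTorch (likely the CUDA context and any other programs using the card). A small parser for this message, assuming exactly the wording shown above; other PyTorch versions may word it differently.

```python
import re

OOM_MESSAGE = ("CUDA out of memory. Tried to allocate 64.00 MiB (GPU 0; 4.00 GiB total capacity; "
               "2.72 GiB already allocated; 60.95 MiB free; 2.89 GiB reserved in total by PyTorch)")

UNIT_TO_MIB = {"KiB": 1 / 1024, "MiB": 1.0, "GiB": 1024.0}

PATTERN = re.compile(
    r"Tried to allocate ([\d.]+) (\w+) \(GPU (\d+); "
    r"([\d.]+) (\w+) total capacity; "
    r"([\d.]+) (\w+) already allocated; "
    r"([\d.]+) (\w+) free; "
    r"([\d.]+) (\w+) reserved"
)


def to_mib(value: str, unit: str) -> float:
    return float(value) * UNIT_TO_MIB[unit]


def parse_oom(message: str) -> dict:
    """Extract the memory figures (converted to MiB) from a CUDA out-of-memory message."""
    m = PATTERN.search(message)
    if m is None:
        raise ValueError("not a recognised CUDA out-of-memory message")
    g = m.groups()
    return {
        "gpu": int(g[2]),
        "requested_mib": to_mib(g[0], g[1]),
        "total_mib": to_mib(g[3], g[4]),
        "allocated_mib": to_mib(g[5], g[6]),
        "free_mib": to_mib(g[7], g[8]),
        "reserved_mib": to_mib(g[9], g[10]),
    }


if __name__ == "__main__":
    info = parse_oom(OOM_MESSAGE)
    print(info["requested_mib"] > info["free_mib"])  # True
    outside = info["total_mib"] - info["reserved_mib"] - info["free_mib"]
    print(f"held outside PyTorch: about {outside:.0f} MiB")
```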


... sort of retired...
NayaraV
New Contributor II

Do you mean using a shallower model in this case, for example ResNet34 instead of ResNet50?

I checked, and I have 4 GB of dedicated memory on the GeForce and 7.9 GB of shared memory with the Intel(R) 630. If you could help me improve my setup or find the best way to run my data, I would really appreciate it.

NayaraV
New Contributor II

When I changed the GPU, I got this message. I think it's the same problem as before:

Traceback (most recent call last):
  File "c:\program files\arcgis\pro\Resources\ArcToolbox\toolboxes\Image Analyst Tools.tbx\TrainDeepLearningModel.tool\tool.script.execute.py", line 23, in <module>
    torch.cuda.set_device(arcpy.env.gpuId)
  File "C:\Program Files\ArcGIS\Pro\bin\Python\envs\deeplearning\lib\site-packages\fastai\torch_core.py", line 72, in _new_torch_cuda_set_device
    _old_torch_cuda_set_device(device)
  File "C:\Program Files\ArcGIS\Pro\bin\Python\envs\deeplearning\lib\site-packages\torch\cuda\__init__.py", line 292, in set_device
    torch._C._cuda_setDevice(device)
RuntimeError: cuda runtime error (101) : invalid device ordinal at ..\torch\csrc\cuda\Module.cpp:59
An error occurred while executing (TrainDeepLearningModel).
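CUDA error 101 (invalid device ordinal) means the GPU ID passed to torch.cuda.set_device does not match any CUDA device. CUDA only enumerates NVIDIA GPUs, so on this laptop the GeForce is most likely device 0 and the Intel 630 is not a CUDA device at all; any other GPU ID would fail this way. A minimal check, written as a hypothetical helper (check_gpu_id is not part of ArcGIS or PyTorch) that could be called as check_gpu_id(arcpy.env.gpuId, torch.cuda.device_count()) before training:

```python
def check_gpu_id(gpu_id: int, device_count: int) -> None:
    """Raise a readable error if gpu_id is not a valid CUDA device ordinal (hypothetical helper)."""
    if device_count == 0:
        raise RuntimeError("No CUDA devices are visible; training would have to run on the CPU.")
    if not 0 <= gpu_id < device_count:
        raise ValueError(f"GPU ID {gpu_id} is invalid: valid IDs are 0 to {device_count - 1}.")


if __name__ == "__main__":
    try:
        import torch  # present in the ArcGIS Pro deep learning environment
        count = torch.cuda.device_count()
    except ImportError:
        count = 0
    print(f"CUDA devices visible to PyTorch: {count}; valid GPU IDs: {list(range(count))}")
```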

NayaraV
New Contributor II

Thank you, DrVSSKiran,

I was having problems logging in to my account to reply.
When you posted your advice and recommendations, I was actually formatting my laptop and reinstalling ArcGIS and the frameworks, but the problem persisted. So I did what you suggested (reducing the tile size), and now it runs normally and I have been able to run several tests.


Thank you so much 😃