Train Deep Learning Model fails with KeyError: '0'

11-05-2021 12:22 PM
GLASCAMP
New Contributor II

 

Hi all,

I am trying to train and apply a Deep Learning model in ArcGIS Pro 8.0. I installed all the required Python packages (289 packages are now installed and up to date).

I am working on WorldView-3 raster images (8 bands). I created a shapefile to delineate the objects I want the model to train on. There are about 50 objects with 3 classes (indicated as 0, 1, and 2 in a "code" field in the attribute table).

The model I am trying to use is Mask R-CNN.

I used the Export Training Data for Deep Learning tool in ArcGIS to build my training dataset, selecting RCNN Masks as the metadata format. The tool runs and exports image tiles in .tif format with the corresponding true label masks.

However, using this dataset in the Train Deep Learning Model tool leads to the following error (copied from the ArcGIS report):

 

Train Deep Learning Model
=====================
Parameters

Input Training Data     d:\Users\gcamp\Documents\OBJ_DETECTION\TRAIN_FILES_MASK_RCNN_ARCGIS
Output Model     d:\Users\gcamp\Documents\OBJ_DETECTION\MODEL
Max Epochs     20
Model Type     MASKRCNN
Batch Size     4
Model Arguments     chip_size 224
Learning Rate     
Backbone Model     RESNET50
Pre-trained Model     
Validation %     10
Stop when model stops improving     STOP_TRAINING
Output Model     
Freeze Model     FREEZE_MODEL
=====================
Messages

Start Time: vendredi 5 novembre 2021 16:06:32
Failed script (null)...
Traceback (most recent call last):
  File "d:\users\gcamp\appdata\local\programs\arcgis\pro\Resources\ArcToolbox\toolboxes\Image Analyst Tools.tbx\TrainDeepLearningModel.tool\tool.script.execute.py", line 297, in <module>
    execute()
  File "d:\users\gcamp\appdata\local\programs\arcgis\pro\Resources\ArcToolbox\toolboxes\Image Analyst Tools.tbx\TrainDeepLearningModel.tool\tool.script.execute.py", line 271, in execute
    callbacks=[ProgressCallback(training_model_object, model_type, max_epochs, out_folder)]
  File "d:\Users\gcamp\AppData\Local\ESRI\conda\envs\arcgispro-py3-clone\Lib\site-packages\arcgis\learn\models\_arcgis_model.py", line 739, in fit
    lr = self.lr_find(allow_plot=False)
  File "d:\Users\gcamp\AppData\Local\ESRI\conda\envs\arcgispro-py3-clone\Lib\site-packages\arcgis\learn\models\_arcgis_model.py", line 574, in lr_find
    raise e
  File "d:\Users\gcamp\AppData\Local\ESRI\conda\envs\arcgispro-py3-clone\Lib\site-packages\arcgis\learn\models\_arcgis_model.py", line 571, in lr_find
    self.learn.lr_find()
  File "d:\Users\gcamp\AppData\Local\ESRI\conda\envs\arcgispro-py3-clone\Lib\site-packages\fastai\train.py", line 41, in lr_find
    learn.fit(epochs, start_lr, callbacks=[cb], wd=wd)
  File "d:\Users\gcamp\AppData\Local\ESRI\conda\envs\arcgispro-py3-clone\Lib\site-packages\fastai\basic_train.py", line 200, in fit
    fit(epochs, self, metrics=self.metrics, callbacks=self.callbacks+callbacks)
  File "d:\Users\gcamp\AppData\Local\ESRI\conda\envs\arcgispro-py3-clone\Lib\site-packages\fastai\basic_train.py", line 99, in fit
    for xb,yb in progress_bar(learn.data.train_dl, parent=pbar):
  File "d:\Users\gcamp\AppData\Local\ESRI\conda\envs\arcgispro-py3-clone\Lib\site-packages\fastprogress\fastprogress.py", line 47, in __iter__
    raise e
  File "d:\Users\gcamp\AppData\Local\ESRI\conda\envs\arcgispro-py3-clone\Lib\site-packages\fastprogress\fastprogress.py", line 41, in __iter__
    for i,o in enumerate(self.gen):
  File "d:\Users\gcamp\AppData\Local\ESRI\conda\envs\arcgispro-py3-clone\Lib\site-packages\fastai\basic_data.py", line 75, in __iter__
    for b in self.dl: yield self.proc_batch(b)
  File "d:\Users\gcamp\AppData\Local\ESRI\conda\envs\arcgispro-py3-clone\Lib\site-packages\torch\utils\data\dataloader.py", line 345, in __next__
    data = self._next_data()
  File "d:\Users\gcamp\AppData\Local\ESRI\conda\envs\arcgispro-py3-clone\Lib\site-packages\torch\utils\data\dataloader.py", line 385, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "d:\Users\gcamp\AppData\Local\ESRI\conda\envs\arcgispro-py3-clone\Lib\site-packages\torch\utils\data\_utils\fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "d:\Users\gcamp\AppData\Local\ESRI\conda\envs\arcgispro-py3-clone\Lib\site-packages\torch\utils\data\_utils\fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "d:\Users\gcamp\AppData\Local\ESRI\conda\envs\arcgispro-py3-clone\Lib\site-packages\fastai\data_block.py", line 651, in __getitem__
    if self.item is None: x,y = self.x[idxs],self.y[idxs]
  File "d:\Users\gcamp\AppData\Local\ESRI\conda\envs\arcgispro-py3-clone\Lib\site-packages\fastai\data_block.py", line 120, in __getitem__
    if isinstance(idxs, Integral): return self.get(idxs)
  File "d:\Users\gcamp\AppData\Local\ESRI\conda\envs\arcgispro-py3-clone\Lib\site-packages\fastai\vision\data.py", line 271, in get
    res = self.open(fn)
  File "d:\Users\gcamp\AppData\Local\ESRI\conda\envs\arcgispro-py3-clone\Lib\site-packages\arcgis\learn\models\_maskrcnn_utils.py", line 105, in open
    lbl_name = int(self.index_dir[self.inverse_class_mapping[fn[k].parent.name]])
KeyError: '0'

Failed to execute (TrainDeepLearningModel).
Failed at vendredi 5 novembre 2021 16:06:36 (Elapsed Time: 4,10 seconds)

 

I tried many combinations of input parameters (backbone model, etc.), but the error remains unchanged.

Since the error message is unclear, I cannot figure out what is wrong with the function.

Any help appreciated! 🙂

1 Solution

Accepted Solutions
GLASCAMP
New Contributor II

EDIT: I found the solution!

It appears that a class ID must not be 0 (zero).
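
The last traceback frame looks up the tile's parent folder name (`fn[k].parent.name`) in a class mapping, so the error is consistent with a folder named `0` having no entry there. Below is a toy reproduction of that failure mode only. The dictionary contents are assumptions made up for illustration, not the real arcgis.learn data, and I have not checked the library source beyond the traceback line.

```python
# Hypothetical mapping: assumes only non-zero class values are registered.
# The real contents of inverse_class_mapping in arcgis.learn may differ.
inverse_class_mapping = {"1": 1, "2": 2}


def lookup(folder_name):
    """Mimic the dictionary lookup seen in the last traceback frame."""
    return inverse_class_mapping[folder_name]


try:
    lookup("0")
    error_message = None
except KeyError as exc:
    # str() of a KeyError is the repr of the missing key, quotes included
    error_message = f"KeyError: {exc}"

print(error_message)  # KeyError: '0'
```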

I simply changed the class values from 0, 1, and 2 to 1, 2, and 3, and that solved the issue. The model now trains without any error.
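
For anyone who wants to script the renumbering, here is a minimal sketch in plain Python. The function name `shift_zero_based_codes` is made up for this example and is not part of ArcGIS. The resulting mapping still has to be applied to the "code" field (for instance with Calculate Field) before re-exporting the training data.

```python
def shift_zero_based_codes(codes):
    """Return an {old: new} mapping that avoids class value 0.

    If 0 is among the codes, every code is shifted up by 1;
    otherwise the codes are returned unchanged.
    """
    unique_codes = sorted(set(codes))
    offset = 1 if 0 in unique_codes else 0
    return {code: code + offset for code in unique_codes}


# Class values as they might appear in the "code" field of the shapefile
mapping = shift_zero_based_codes([0, 1, 2, 1, 0])
print(mapping)  # {0: 1, 1: 2, 2: 3}
```

Shifting every code by the same offset keeps the classes distinct and in their original order, so only the labels change.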

Hope it will help!
