Train Deep Learning Model Error for Feature Classifier

Shingo-Ikeda · ‎05-08-2021

Hi,

I am getting an error message while running Train Deep Learning Model.

ExecuteError: Traceback (most recent call last):
  File "c:\program files\arcgis\pro\Resources\ArcToolbox\toolboxes\Image Analyst Tools.tbx\TrainDeepLearningModel.tool\tool.script.execute.py", line 232, in <module>
    execute()
  File "c:\program files\arcgis\pro\Resources\ArcToolbox\toolboxes\Image Analyst Tools.tbx\TrainDeepLearningModel.tool\tool.script.execute.py", line 196, in execute
    training_model_object = training_model.from_model(pretrained_model_path, data_bunch)
  File "C:\Users\S0003051\AppData\Local\ESRI\conda\envs\dl-python\lib\site-packages\arcgis\learn\models\_classifier.py", line 360, in from_model
    return cls(data, **model_params, pretrained_path=str(model_file))
  File "C:\Users\S0003051\AppData\Local\ESRI\conda\envs\dl-python\lib\site-packages\arcgis\learn\models\_classifier.py", line 164, in __init__
    self.load(pretrained_path)
  File "C:\Users\S0003051\AppData\Local\ESRI\conda\envs\dl-python\lib\site-packages\arcgis\learn\models\_arcgis_model.py", line 1300, in load
    raise e
  File "C:\Users\S0003051\AppData\Local\ESRI\conda\envs\dl-python\lib\site-packages\arcgis\learn\models\_arcgis_model.py", line 1298, in load
    self.learn.load(name, purge=False)
  File "C:\Users\S0003051\AppData\Local\ESRI\conda\envs\dl-python\lib\site-packages\fastai\basic_train.py", line 281, in load
    get_model(self.model).load_state_dict(state, strict=strict)
  File "C:\Users\S0003051\AppData\Local\ESRI\conda\envs\dl-python\lib\site-packages\torch\nn\modules\module.py", line 830, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for Sequential:
	size mismatch for 1.8.weight: copying a param with shape torch.Size([3, 512]) from checkpoint, the shape in current model is torch.Size([2, 512]).
	size mismatch for 1.8.bias: copying a param with shape torch.Size([3]) from checkpoint, the shape in current model is torch.Size([2]).

Failed to execute (TrainDeepLearningModel).

Process:

I have 13 TIF images and have collected image labels into feature layers. Each layer is exported as image chips (TIF) into its own directory, named after the original imagery. The images are then trained one at a time, with each run using the *.dlpk file produced by the previous run as its pre-trained model. The following code shows the steps that cause the error:

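import arcpy

pretrained = "20200530"  # date of the image whose model is reused
current = "20200627"     # date of the image being trained now

# Image chips exported from the current image
chips = r"D:\Data\DeepLearning\Training\Proj_{}_TIF".format(current)
# Output folder for the new model (same "Models" root that pre_model reads from)
model = r"D:\Data\DeepLearning\Models\Proj_{}".format(current)
# Deep learning package produced by the previous run
pre_model = r"D:\Data\DeepLearning\Models\Proj_{0}\Proj_{0}.dlpk".format(pretrained)

# 20 epochs, batch size 2, ResNet-34 backbone, 10% validation split
arcpy.ia.TrainDeepLearningModel(chips, model, 20, "FEATURE_CLASSIFIER", 2, "chip_size 256", None, "RESNET34", pre_model, 10, "STOP_TRAINING", "UNFREEZE_MODEL")

# The model trained in this run becomes the pre-trained model for the next image
pretrained = current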

This trains a feature classifier on the image chips so that it can classify the labeled features. The initial training always works, but the second training run (the one that uses the pre-trained model) fails. All training image chips are derived from feature layers that share the same label schema from the same *.ecs file.

I would appreciate any workaround for this issue.

Thanks.

Shingo Ikeda
Geospatial Data Scientist/Developer - Geographical Information Platform
Global Power Generation - Digital Satellite USA and Canada

Tim_McGinnes · ‎05-09-2021

The error points to an issue with the input data. Is it 13 tif files for each date, or 1 tif file for each of 13 dates? Do all the tif files have the same number of bands and bit depths?

How many label classes do you have? Does each image have the same label classes? Does each image contain examples of all the classes? For example, if your classes are blue and red, does each image actually contain at least 1 blue label and 1 red label? If you look at the .emd files in each training folder, are they all similar?

It's a bit hard to diagnose without seeing the actual input data.
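
The .emd comparison suggested above can be scripted. The following is a minimal sketch, not an Esri utility. It assumes each training folder holds one .emd file that is valid JSON with a top-level "Classes" list whose entries have a "Name" key. The function names are hypothetical.

```python
import json
from pathlib import Path


def read_class_names(training_folder):
    """Return the set of class names listed in the first .emd file in a folder."""
    emd_files = sorted(Path(training_folder).glob("*.emd"))
    if not emd_files:
        raise FileNotFoundError(f"No .emd file found in {training_folder}")
    with open(emd_files[0], encoding="utf-8") as f:
        definition = json.load(f)
    # Assumption: classes are stored under "Classes" as objects with a "Name" key.
    return {str(c["Name"]) for c in definition.get("Classes", [])}


def compare_class_sets(training_folders):
    """Map each folder to the classes it lacks relative to the union of all folders."""
    classes_by_folder = {str(f): read_class_names(f) for f in training_folders}
    all_classes = set().union(*classes_by_folder.values())
    return {
        folder: sorted(all_classes - names)
        for folder, names in classes_by_folder.items()
    }
```

Any folder that maps to a non-empty list is missing classes that appear elsewhere, so a model trained on it alone would have a different number of output classes than a model trained on the other folders.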

Shingo-Ikeda · ‎05-10-2021

Thanks for replying. Each TIF file has 4 bands and 16-bit depth.

There are 18 classes in the schema, but not every label appears in every image. This is construction monitoring for a specific site, so the earlier images do not contain objects that only appear in later images, such as completed construction. The .emd file structures are therefore similar, but the label content varies from image to image.

I think this is a transfer learning issue in PyTorch, where weights are transferred from one model to another that has a different output size (number of classes), as discussed on GitHub here. I believe this is a critical capability in GeoAI: with time series imagery, not all classes are captured in any one image, since each consecutive image shows more features as time progresses.

I export image chips per image, so each training output folder holds chips from a single image. If I consolidate the chip output into a single directory, it will hold a large pool of chips from all images and include all classes. However, if we move toward more real-time MLOps, with a schema that includes future objects not yet seen in the satellite imagery, it is hard to keep training from a pre-trained model that has fewer classes.

Is there any option in PyTorch to bypass the error and accept mismatched classes among training sets? Could we switch to TensorFlow and see whether it accepts mismatched classes?
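
On the question of bypassing the mismatch: one common PyTorch pattern is to load only the checkpoint parameters whose shapes match the current model and let the final layer start fresh. The following is a minimal sketch of that filtering step, not an arcgis.learn feature. The helper name `filter_matching_params` and the `FakeTensor` stand-in are hypothetical. The `1.8.*` shapes come from the traceback above, and the `1.4.weight` entry is invented for illustration.

```python
from collections import namedtuple


def filter_matching_params(checkpoint_state, model_state):
    """Split checkpoint entries into those that fit the current model and those that do not.

    Both arguments map a parameter name to an object with a ``shape``
    attribute, for example the dict returned by a PyTorch ``state_dict()``.
    """
    compatible, skipped = {}, []
    for name, param in checkpoint_state.items():
        target = model_state.get(name)
        if target is not None and tuple(param.shape) == tuple(target.shape):
            compatible[name] = param
        else:
            skipped.append(name)
    return compatible, skipped


# Stand-in for tensors so the sketch runs without PyTorch installed.
FakeTensor = namedtuple("FakeTensor", "shape")

checkpoint = {
    "1.4.weight": FakeTensor((512, 1024)),  # illustrative layer, not from the traceback
    "1.8.weight": FakeTensor((3, 512)),     # 3-class head saved in the checkpoint
    "1.8.bias": FakeTensor((3,)),
}
current = {
    "1.4.weight": FakeTensor((512, 1024)),
    "1.8.weight": FakeTensor((2, 512)),     # 2-class head built from the current chips
    "1.8.bias": FakeTensor((2,)),
}

compatible, skipped = filter_matching_params(checkpoint, current)
print(sorted(compatible))  # ['1.4.weight']
print(skipped)             # ['1.8.weight', '1.8.bias']
```

With real PyTorch objects, the compatible subset could then be passed to `model.load_state_dict(compatible, strict=False)`. The `strict=False` flag is needed because the skipped keys are now missing from the dict; on its own it does not suppress size mismatches. The skipped final layer keeps its fresh initialization and has to be retrained. Nothing in this thread establishes whether the Train Deep Learning Model tool exposes a hook for this.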

Tim_McGinnes · ‎05-10-2021

The problem is that you’re not really using PyTorch or TensorFlow, but Esri’s arcgis.learn API on top of them. And while Esri have made the entire end-to-end workflow quite easy, it comes at the cost of flexibility. Anything out of the ordinary is not easy to do or just not possible.

I think you’ve already come up with the simple answers for staying with arcgis.learn: combine all the training data together for a single multi-class model, or create multiple models, each of which classifies one class (or a few classes).

I think doing it the way you describe would require writing straight PyTorch, TensorFlow or fast.ai code.

Shingo-Ikeda · ‎05-10-2021

I think a recursive training method for image classification is a good idea, because it would enrich the deep learning package by letting one operation keep adding new training features. However, it also makes sense to centralize the training in a single datastore, such as AWS S3, and append image chips there. My only concern with that approach is that it is hard to retrain a specific image after a misclassification or to improve the classification: we may need to go back to the collection for a previous image and label it differently. If we separate the training set by input image, it is far easier to replace the chips for a specific image and generate a *.dlpk that can be used to retrain the model. If there is a best practice that I have missed, please let me know.

Tim_McGinnes · ‎05-10-2021

For adding a new class into an image classifier, the current methodology seems to be to remove the final classifier layer in the model and replace it with a new layer with the new classes included. Then retrain the model with all the previous + new images. As you say, this can introduce misclassification, because the trained weights on that final layer are lost - I don't think there is really any way to keep them. There is also no guarantee the previously trained classes will retain the same accuracy they had before. There is a fair amount of research being done into the idea of continual learning, so someone may solve this in the future.

Shingo-Ikeda · ‎05-11-2021

Thanks for the explanation and for confirming the classifier's shortcoming. Still, it seems this is just an issue with the iterator, which cannot move on to another image to continue training. For example, when "Export Training Data For Deep Learning" exports chips and labels, a single input image is required even though the input feature class has an attribute field holding the ImageURI. If the tool took the path from the ImageURI and continued exporting, a merged training feature class could be used. Another option would be to use a time series mosaic dataset for training and exporting. Currently, a time series mosaic dataset does not respect attribute filters (such as the time slider): the export picks a layer at random, and the ImageURL only gives the path to the mosaic dataset root, not which image in the mosaic dataset was used. I believe this feature is critical for continuous site monitoring and inferencing using deep learning.

Shingo-Ikeda · ‎05-18-2021

The solution that I implemented was to move the time series TIF images and their corresponding training feature classes to gridded locations. I recreated a mosaic dataset from the relocated TIFs and merged all the training feature classes into one feature class. This way, all classes in the schema are visible in one image, and I was able to export image chips to one output directory. In this case, spatial accuracy isn't an issue: as long as the images are captured and exported as chips and labels, Train Deep Learning Model works. This is not a preferred solution, since I had to break the coordinate locations to make it work. With this approach, though, I could rotate the images in 45-degree increments and add more samples to the model by using the previously generated *.dlpk as a pre-trained model. This improved the model performance, and I was able to run inference.
