Help with deep learning and pixel classification of Sentinel 2

07-21-2021 05:34 AM
MVP Frequent Contributor

Hi, I've been trying to teach myself the workflow for using the deep learning tools in ArcPro 2.8. I have installed the deep learning libraries for 2.8 as instructed on the GitHub site here. My setup is ArcPro 2.8.1 and my PC is a modern i7 with 32 GB of RAM and an NVIDIA RTX 3070 graphics card.

I decided the best way to learn was to replicate a task I had done recently but using the deep learning tools. I have a Sentinel 2 image and I want deep learning to identify bare/ploughed fields in a small area of the UK.  Originally I thought this was object detection and followed the workflows described in the ArcPro help file. ArcPro was crashing and reporting errors so I almost gave up. But having read this page I realised object detection was about putting those rectangles around what it thinks it has found. I want the actual boundary of the field identified and have realised it is pixel classification I need to be doing.

Either the process fails intermittently with an error (something about a bad token) or it simply creates a blank raster when I run the Classify Pixels Using Deep Learning tool. I typically accept all the default settings as I don't know any better! I find that setting the tools to use the GPU makes them run longer, error or crash more often than when I set them to CPU. This thread hints that these deep learning libraries are not compatible with an RTX 3070 GPU?

So let me talk you through my steps; maybe you will spot the many "schoolboy errors" I am making? I'm a complete novice in this branch of image analysis and fully accept I'm doing something daft!

  1. I have read the thread here and, as a first step, have converted the 16-bit Sentinel 2 image to a 3-band 8-bit unsigned raster.
  2. I select the raster in the TOC and use the Label Objects for Deep Learning tool to draw around 70 polygons; as you can see, they have a class value of 1. I only draw polygons around what I think are bare fields. Is this too few, or inappropriate for pixel classification? [Image: DuncanHornby_0-1626865564396.png]
  3. I export the training data as TIFF and set the metadata format to Classified Tiles as shown below. This tool runs fine. Choosing Classified Tiles allows me to select the U-Net classifier in the next processing step. [Image: DuncanHornby_1-1626865821031.png]
  4. I run the train deep learning tool with the environment settings set to CPU and 100% parallel processing. The main dialog interface is left as is:

  5. With regard to the Model Arguments, five arguments are supplied by default for U-Net, but when I look at the help for the arcgis.learn module, chip_size is not an argument; in fact it's not mentioned anywhere on that page for the UnetClassifier. I have come to the conclusion that the arguments it offers up are not always appropriate. It seems like a tool interface bug, am I correct?
  6. The message dialog reports this (I will be honest, this is all meaningless to me as I don't yet fully understand all the nuances of the tool). But I'm OK with that as I am just trying to learn and get anything out of it!

    Start Time: 21 July 2021 12:12:34
    Learning Rate - slice(5.754399373371565e-05, 0.0005754399373371565, None)
    epoch training loss validation loss accuracy Dice
    0 0.08495312929153442 0.12163998186588287 0.9776098132133484 0.0
    1 0.051144104450941086 0.10854105651378632 0.9776098132133484 0.0
    2 0.03989902138710022 0.0694444552063942 0.9776037931442261 0.0
    3 0.03854808583855629 0.06977503001689911 0.9776098132133484 0.0
    4 0.04184050112962723 0.0712885931134224 0.977604866027832 0.0
    5 0.03452085331082344 0.06181521341204643 0.9776098132133484 0.0
    6 0.035225775092840195 0.06532987207174301 0.9776098132133484 0.0
    7 0.03405182063579559 0.061748944222927094 0.9776098132133484 0.0
    8 0.0513390377163887 0.07498864829540253 0.9776098132133484 0.0
    9 0.03635616600513458 0.06778693199157715 0.9776098132133484 0.0
    10 0.035765890032052994 0.05881249159574509 0.9776098132133484 0.0
    11 0.03397132828831673 0.059320658445358276 0.9776098132133484 0.0
    12 0.04003984481096268 0.06356815993785858 0.9776098132133484 0.0
    13 0.03734290599822998 0.06060848757624626 0.9776098132133484 0.0
    14 0.033427681773900986 0.05860653519630432 0.9776098132133484 0.0
    15 0.03398134186863899 0.0582389235496521 0.9776098132133484 0.0
    16 0.03260520100593567 0.05759035423398018 0.9776098132133484 0.0
    17 0.033332113176584244 0.05730770155787468 0.9776098132133484 0.0
    18 0.03224635496735573 0.05699590593576431 0.9776098132133484 0.0
    19 0.032304972410202026 0.05687981843948364 0.9776098132133484 0.0
    {'accuracy': '9.7761e-01'}
                 NoData    Bare Field
    precision    0.977610  0.0
    recall       1.000000  0.0
    f1           0.988678  0.0
    Succeeded at 21 July 2021 12:59:56 (Elapsed Time: 47 minutes 21 seconds)
  7. Finally I run the Classify Pixels Using Deep Learning tool. I set it to use the CPU and limit the processing extent. The tool is set up as: [Image: DuncanHornby_3-1626869154406.png]


  8. The output is a blank raster: [Image: DuncanHornby_5-1626870592196.png]
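
A note on reading the training log in step 6: an accuracy of about 0.9776 together with a Dice of 0.0, and zero precision and recall for Bare Field, is what you would see if the model labels every pixel as NoData and bare-field pixels make up roughly 2.24% of the validation chips. The sketch below shows the effect with made-up pixel counts (not the actual data) and the standard Dice definition 2TP / (2TP + FP + FN); how arcgis.learn computes its Dice column internally is an assumption here.

```python
def accuracy(truth, pred):
    """Fraction of pixels whose predicted class matches the label."""
    correct = sum(1 for t, p in zip(truth, pred) if t == p)
    return correct / len(truth)


def dice(truth, pred, positive=1):
    """Standard Dice score for one class: 2*TP / (2*TP + FP + FN)."""
    tp = sum(1 for t, p in zip(truth, pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(truth, pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(truth, pred) if t == positive and p != positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Illustrative labels only: 10,000 pixels, 224 of them bare field (class 1).
truth = [1] * 224 + [0] * 9776
# A model that has learned nothing useful and predicts background everywhere.
all_background = [0] * 10000

print(accuracy(truth, all_background))  # 0.9776
print(dice(truth, all_background))      # 0.0
```

So a high accuracy on its own says little when one class dominates; the per-class precision, recall and Dice are the numbers to watch.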


So I think I'm following the right sequence (prepare raster > create training samples > export > train > classify) and I think I'm using the correct type of classification (pixel classification, not object detection), but as you can see nothing works!

If anyone has any advice I am desperate to hear from you, even if I have done a dumb thing that any hardened deep learner would intuitively know.


11 Replies
Occasional Contributor III

Yes, there are known issues with the RTX 3xxx series of cards which are most likely the cause of your problems. 

MVP Frequent Contributor


On your advice I have added myself to the ever-increasing list of RTX users on the GitHub issue tracker page.

So... dumb question: if I explicitly set the CPU in the environment settings, should these problems go away as I'm not using the GPU? Or are the deep learning libraries using CUDA 10.1 anyway, just without making use of the GPU?

Also, having reviewed my question, was I doing the right things in the right order? If the RTX issue did not exist, would you have expected to see bare fields being picked out in the classification?

Finally what's your opinion on #5  about the arguments?

Occasional Contributor III

Normally when you train a model via the Python API or a notebook, you first use the prepare_data function. One of its parameters is chip_size. Because the geoprocessing tool covers all the training steps, maybe it is pulling in the chip_size argument as well? There's no real way to tell.

As for whether the CPU works even if the GPU doesn't for your card, I really don't know. But we should be able to test one way or the other.

In regards to your sequence and inputs, everything generally looks ok. I will do another reply with an example.

Occasional Contributor III

I haven't really done much pixel classification so decided to do a model similar to yours to try it out. I started with Sentinel2 imagery, and extracted just the RGB bands into a new tif file. Created some bare earth polygons to train with and exported the training data. I used classvalue 1 and classname earth.


I prefer to train using a notebook, because you get a lot more control and can see what is happening throughout the process. In Pro, just go to Insert in the ribbon and choose New Notebook. The first step is to import the modules and set up the training data. Then use the show_batch function to check your training data. What you want to see here is that your polygons are showing up on the images, like the one on the right.


Next, set up the model and find the learning rate.


I just chose 1e-04 (0.0001) as the learning rate. Next, start training the model; I chose 10 epochs to start with (ignore the 50 here, I will explain why in a minute).


Now use the show_results function to see how the model is training. On the left is the ground truth from your training data, on the right is the current results from the model. You can see the model is starting to recognise the bare earth. After 10 epochs I couldn't see any results, so I trained it for 50 more epochs with the results below. In a notebook it's easy because you can just go back up to the previous cell, change the number of epochs and run it again (it will continue training from where it left off). Obviously the results are still not great, but you can see it is actually working.


Next step is to save the model to disk.
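
A minimal sketch of the notebook steps described above, in case it helps to see them in one place. The function name `train_unet`, the `batch_size=8` value and the `resnet34` backbone are assumptions for illustration (the backbone is the one used later in this thread); `chip_size=224` is the default mentioned further down. The calls themselves (`prepare_data`, `show_batch`, `UnetClassifier`, `lr_find`, `fit`, `show_results`, `save`) are from the arcgis.learn module.

```python
def train_unet(training_folder, model_name, epochs=10, lr=1e-4,
               chip_size=224, batch_size=8):
    """Train a U-Net on exported Classified Tiles and save it."""
    # Imported inside the function so the file can be loaded without ArcGIS.
    from arcgis.learn import prepare_data, UnetClassifier

    # Read the chips written by the export training data step.
    data = prepare_data(training_folder, chip_size=chip_size,
                        batch_size=batch_size)
    data.show_batch()          # check that the label polygons overlay the chips

    model = UnetClassifier(data, backbone="resnet34")
    model.lr_find()            # plots loss against learning rate
    model.fit(epochs, lr=lr)   # call fit again later to continue training
    model.show_results()       # ground truth next to current predictions
    model.save(model_name)     # writes the model files, including the .emd
    return model
```

In a notebook you would normally run these as separate cells, for example `unet = train_unet(r"<path_to_exported_training_data>", "bare_earth_unet", epochs=10)`, where both the folder and the model name are placeholders.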


Now I went ahead and used the Classify Pixels tool on the same image. You can see that it is actually detecting the bare earth in purple with a pixel value of 1. So everything is working, the model just needs to get better.


See if you can follow the above just using your CPU and if you get any results. You may need to force the notebook to use the CPU with the instructions from here: How force Pytorch to use CPU instead of GPU? 
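
One common way to do this, sketched here as an assumption rather than as the method from the linked answer, is to hide the GPU from the process with the `CUDA_VISIBLE_DEVICES` environment variable before PyTorch is first imported. Whether arcgis.learn then falls back to the CPU cleanly on your machine is something you would need to check.

```python
import os

# Hide every CUDA device from this process. This must run before torch
# (or arcgis.learn, which imports torch) is imported for the first time,
# so put it in the first cell of the notebook.
os.environ["CUDA_VISIBLE_DEVICES"] = ""

try:
    import torch
    # With no visible devices this is expected to report False.
    print("CUDA available:", torch.cuda.is_available())
except ImportError:
    print("torch is not installed in this environment")
```

If the kernel has already imported torch, restart it first; changing the variable afterwards has no effect on a session that has already initialised CUDA.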

MVP Frequent Contributor


Really appreciate your time in helping me.

I have been shying away from notebooks, not because I can't use Python but because I am very much a fish out of water with deep learning. So I took the approach that if I could just get it to work in the tools (and ESRI do good tool interfaces) I might have a chance of understanding it. I've spent so long bashing my head against the tools that, unbelievably, some of it has sunk in, so I understood your notebook approach when I read it! I'm going to go away and have a tinker and will report back.

I recently came across Google Colab, a Python notebook environment. If I can get your notebook working with just a CPU to avoid all the issues with my RTX card, I was wondering if the logic could be migrated into the Colab environment, as I understand that they offer a GPU. Anyway, one step at a time, and I hope ESRI resolve the incompatibility with the RTX cards soon.

MVP Frequent Contributor


So I was able to follow your notebook instructions and turn off CUDA to force only CPU and then complete the training. Good news it did not crash or return an error but it did take many hours to process and eventually failed to classify anything. I then clipped back the original Sentinel 2 raster so it was not so big, created a load of polygons in this smaller raster and when I exported the data I reduced the tile size to 128 and stride to 32. Went through the training as before and when I finally did the classification...nothing was identified! 
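
To put the tile size and stride change in perspective, here is a rough count of how many tile positions a sliding window produces over a raster. The raster size is hypothetical and the 256/128 pair is only there for comparison. The formula is the usual sliding-window count; the export tool may treat raster edges differently and may only write tiles that contain labelled pixels, so treat the numbers as an upper bound.

```python
def chips_along_axis(length_px, tile_size, stride):
    """Number of full tiles that fit along one axis of the raster."""
    if length_px < tile_size:
        return 0
    return (length_px - tile_size) // stride + 1


def chip_count(width_px, height_px, tile_size, stride):
    """Total number of tile positions over a raster of the given size."""
    return (chips_along_axis(width_px, tile_size, stride)
            * chips_along_axis(height_px, tile_size, stride))


# Hypothetical 1024 x 1024 pixel clip, not the raster from this post.
print(chip_count(1024, 1024, tile_size=256, stride=128))  # 49
print(chip_count(1024, 1024, tile_size=128, stride=32))   # 841
```

Smaller tiles with a smaller stride give many more, heavily overlapping chips from the same area, which means more training samples but also longer epochs.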

So I studied your notebook again, and the only obvious difference between your inputs and mine is that you had drawn rectangles within bare fields whilst I had drawn around the edges. So I rebuilt my training sample as rectangles in bits of bare field, went through the whole training section, bumped the number of epochs from 20 to 30 in the function call, and finally, when I ran the Classify Pixels tool, I saw something (yellow pixels) as shown below!



Why would drawing crude rectangles in only parts of fields seemingly work better than defining the actual edge?

Also, what would you now do to improve this workflow so that it better identifies bare fields? The results are currently quite poor. Do I need to draw lots more training rectangles or increase the number of epochs? Are there other tweaks you know of?

Occasional Contributor III

That's good news Duncan, you got the easy part out of the way! So it looks like CPU does work for the RTX3xxx cards, at least for Unet and Pixel Classification.

There should not be any difference for the rectangles vs full polygons - maybe they just needed more training to work?

You have already identified the 2 most practical ways to improve results:

  • Train the model for longer. There will come a point where the model stops improving and it could actually get worse. Generally the loss numbers for each epoch should get closer to zero the more you train. But what can actually happen is that the model is just getting better specifically on your training data and cannot generalise to other images (known as overfitting). My recommendation is to train the model in increments (you can choose the number of epochs, say 50 or 100), then save the model and test it on both your original image and a completely different Sentinel2 image to see the actual accuracy.
  • Increase the amount of training data. More training data is definitely better, but can take a long time to collect (in your case it would be easy to draw more rectangles though). The downside to having more training data is that you end up with hundreds or thousands more training images and it will take much longer to train each epoch. You should also consider using multiple Sentinel2 images to capture training data. The easiest way is to add them to a mosaic dataset, but still have a single feature class. Then just use the mosaic to export the training data.
  • You could also try a different model type. ArcGIS supports both DeepLab and PSPNet for pixel classification. I think they both use the same Classified Tiles training data format, so it should be easy to train them and compare the results to Unet.
  • You could also try training the model using the full set of Sentinel2 multispectral bands - it does work and there are some tips on this page: Working with Multispectral Data 
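
The "train in increments" suggestion from the first bullet could be sketched like this in a notebook. `train_in_increments` and the checkpoint naming are hypothetical; the only thing assumed about `model` is that it has the `fit` and `save` methods that arcgis.learn models such as UnetClassifier provide.

```python
def train_in_increments(model, total_epochs, step, lr, name_prefix):
    """Train `step` epochs at a time, saving a checkpoint after each block."""
    saved = []
    done = 0
    while done < total_epochs:
        n = min(step, total_epochs - done)
        model.fit(n, lr=lr)      # continues from the current weights
        done += n
        name = f"{name_prefix}_{done:03d}epochs"
        model.save(name)         # one saved model per increment
        saved.append(name)
    return saved
```

For example, `train_in_increments(unet, total_epochs=100, step=50, lr=1e-4, name_prefix="bare_earth")` would leave you with two saved models. You can then run Classify Pixels with each one on the original image and on a different Sentinel2 image, and keep the checkpoint that generalises best rather than simply the last one.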

One final tip. If you have saved a model then come back later and want to continue training it, then you should choose the saved model in the Advanced\Pre-Trained Model parameter in the Training geoprocessing tool.

If using a notebook rather than:


unet = UnetClassifier(data)


You would use:


unet = UnetClassifier.from_model(r'<path_to_model_emd_file>',data)


And the training data doesn't need to be the same either - so you can take your current model and just keep training it with other data (so your training time was not wasted).

New Contributor III


For pixel classification, when training with sparse data (that is, training data that does not cover the entire image), you can get better results by setting the ignore_class parameter in the train tool to 0. This ignores all the pixels that have not been collected as training samples.


The chip size is the size used for clipping the image for training. The default is 224 and the Python API uses the same.




MVP Frequent Contributor


I was quite excited by your additional bit of information but when I include it in the parameter list during the train model part of my notebook code as shown below:


unet = UnetClassifier(data, backbone='resnet34', ignore_classes=[0])



I get this response...

Exception                                 Traceback (most recent call last)
In  [4]:
Line 1:     unet = UnetClassifier(data, backbone='resnet34', ignore_classes=[0])
File C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\lib\site-packages\arcgis\learn\models\, in __init__:
Line 124:   raise Exception(f"`ignore_classes` parameter can only be used when the dataset has more than 2 classes.")
Exception: `ignore_classes` parameter can only be used when the dataset has more than 2 classes.


So I'm guessing your sparse data actually had more than one class?
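
Going only by the exception text, a guard like the one below would avoid the error by passing `ignore_classes` only when the dataset has more than two classes. The helper `unet_kwargs` is hypothetical, and counting NoData as one of the classes is an assumption based on the training report, which lists NoData and Bare Field.

```python
def unet_kwargs(num_classes, backbone="resnet34"):
    """Keyword arguments for UnetClassifier; ignore_classes only when allowed."""
    kwargs = {"backbone": backbone}
    # The exception says ignore_classes needs more than 2 classes.
    if num_classes > 2:
        kwargs["ignore_classes"] = [0]
    return kwargs


# NoData + Bare Field = 2 classes, so ignore_classes is left out.
print(unet_kwargs(2))  # {'backbone': 'resnet34'}
# With a second labelled class there would be 3, so it is included.
print(unet_kwargs(3))  # {'backbone': 'resnet34', 'ignore_classes': [0]}
```

It would be used as `unet = UnetClassifier(data, **unet_kwargs(2))`, which for my single-class data is the same as leaving the parameter out.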
