I downloaded a set of sample Jupyter notebooks from Esri at https://developers.arcgis.com/python/sample-notebooks/. One of the notebooks is called land_cover_classification_using_unet, which is supposed to showcase an end-to-end land cover classification workflow using the ArcGIS API for Python. The workflow consists of three major steps: (1) extract training data, (2) train a deep learning image segmentation model, (3) deploy the model for inference and create maps.
I am having trouble running the notebook, and so far have only gotten the first two steps to work, which just create a connection to ArcGIS Online. The third and fourth lines of code are supposed to access a labeled image to train the model, but I get an error that the index value is out of range no matter what index value I use, which basically means the image was not found.
label_layer = gis.content.search("Kent_county_full_label_land_cover")[1] # the index might change
label_layer
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-29-a4ac34d0306c> in <module>
----> 1 label_layer = gis.content.search("Kent_county_full_label_land_cover")[1] # the index might change
      2 label_layer
IndexError: list index out of range
I downloaded the original classified image for Kent County in Delaware from the Chesapeake Conservancy land cover project. It looks the same, although I am not completely sure it matches the extent or classifications of the training image the notebook was supposed to use.
How do I change the code to use the image I downloaded and saved on my computer rather than the image from ArcGIS Online?
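Before switching to a local file, it may help to see what the search actually returns, since an IndexError on every index suggests the result list is empty. The sketch below is only an illustration: `pick_item` is a hypothetical helper, and the stand-in results replace a live connection so the example runs anywhere. Whether `outside_org=True` makes the Esri item visible to your account is an assumption that depends on how the item is shared.

```python
from types import SimpleNamespace


def pick_item(results, title):
    """Return the first search result whose title matches exactly.

    Raises a descriptive LookupError instead of an IndexError when
    nothing matches, listing whatever titles the search did return.
    """
    for item in results:
        if item.title == title:
            return item
    found = [item.title for item in results]
    raise LookupError(f"No item titled {title!r}; search returned: {found}")


# Stand-in results so the sketch runs without a GIS connection.
demo_results = [SimpleNamespace(title="Kent_county_full_label_land_cover")]
label_layer = pick_item(demo_results, "Kent_county_full_label_land_cover")

# With a live connection the call might look like this (not run here):
# results = gis.content.search("Kent_county_full_label_land_cover",
#                              outside_org=True)
# label_layer = pick_item(results, "Kent_county_full_label_land_cover")
```

Selecting by title rather than by position also avoids the "the index might change" caveat in the notebook comment.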
I will probably be asking more questions as I progress through the code, since it seems likely I will hit other problems. I am hoping to first be able to complete the notebook example covering the Delaware region and afterward adapt it to process the NAIP imagery for my jurisdiction.
I have been working with Esri support staff, including members of the Deep Learning team in Redlands. So far Esri staff has not been able to reproduce my problem. However, from my perspective that doesn't matter. All that matters to me is that I have not been able to reproduce Esri's successful use of the tool, and I am not going to wait for Esri to solve my problem.
So I have created my own version of the tool that does work for me. It outputs all of the image and label files I want, as well as the json, emd, map.txt and stats.txt files. Because I have full access to all of the internal behaviors of my tool I can adapt it to any workflow scenario I can imagine. My tool also is more forgiving than your tool, since it only has to work for me and I know my inputs meet your requirements without having to do any manual steps like setting the raster to thematic, especially since all of the tiles get output as Generic rasters by your tool anyway.
I hope I can get the Esri tool to work, in case it indicates some other problems with my installation, but at least now I can move beyond the data preparation stage and finally start doing some actual deep learning modeling.
I understand your problem and if you follow the steps I have suggested, most probably you will get the desired results. Let me know if you need any further help in exporting data or even in the next steps that follow, like training your model and inferencing results from your model.
The steps you suggested output nothing, because I cannot get it to output any tile with buildings ever. The Export tool does not work for me, even if it works for you and everyone else. I won't be using your tool any more now that I have built my own tool. I am not interested in your suggestions for your Export tool anymore, since that tool is just a waste of my time and I no longer have any use for it (at least when outputting Classified Tiles).
I am now exclusively focused on doing the model training. That is all I ever really cared about in the first place, not doing data preparation over and over. Deep Learning needs to start working for me and stop making me work for it.
I finally got the Export Training Data for Deep Learning tool to work using the Classified Tiles metadata output after Sanjeet Mathew at Esri support suggested changing the NODATA value of my aerial input raster from 255,255,255 to -1,-1,-1. With that setting the tool finally created the full set of 900 classified tiles I was expecting based on the tile size and stride size I specified.
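The expected tile count can be worked out from the raster size, tile size, and stride. The sketch below assumes a 15,000 x 15,000 pixel raster with a 500 pixel tile and a 500 pixel stride, figures mentioned elsewhere in this thread. The post does not state which settings produced the 900 tiles, so treat the inputs as an assumption. The function counts whole tiles only, and the tool itself may treat edge tiles differently.

```python
def tiles_per_axis(extent_px, tile_px, stride_px):
    """Number of whole tiles that fit along one axis."""
    if extent_px < tile_px:
        return 0
    return (extent_px - tile_px) // stride_px + 1


def expected_tiles(width_px, height_px, tile_px, stride_px):
    """Whole tiles across the raster (columns times rows)."""
    return (tiles_per_axis(width_px, tile_px, stride_px)
            * tiles_per_axis(height_px, tile_px, stride_px))


# Assumed inputs: 15,000 x 15,000 raster, 500 px tiles, 500 px stride.
print(expected_tiles(15000, 15000, 500, 500))  # 30 columns * 30 rows = 900
```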
A note should definitely be added to the Export Training Data for Deep Learning tool help about the various requirements of the tool for the Classified Tiles metadata output. It requires setting an impossible NODATA value of -1,-1,-1 for the input image and only using a label raster that has properties showing it is an 8-bit unsigned, thematic raster with a NODATA value that does not match any pixel of the label raster. Also if the label raster contains any NODATA pixels they must be reclassified as 0 using the Reclassify tool.
The output image tiles still look a little off on the barren land, but they look basically fine where the buildings exist, so I am not too concerned about that for the extraction of building footprints. I might be a little concerned if I was classifying all portions of the aerial with a land use classification.
The tool does perform faster than my custom tool, so I will use it. I will have to create a fishnet that matches the size and locations of the output classified tiles I intend to create to be able to avoid generating tiles that do not contain any buildings.
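The fishnet idea can be sketched without any GIS library: given building bounding boxes in pixel coordinates, find which grid cells they overlap and skip the rest. This is a plain-Python stand-in for a fishnet plus a spatial selection, not the ArcGIS workflow itself. The function name, the square extent, and the sample boxes are all hypothetical.

```python
import math


def cells_with_buildings(extent_px, cell_px, building_boxes):
    """Return sorted (row, col) cells overlapped by at least one box.

    Boxes are (xmin, ymin, xmax, ymax) in pixels, origin at the top left.
    A box that only touches a cell edge does not count as overlapping it.
    """
    n = extent_px // cell_px  # cells per side of a square extent
    hits = set()
    for xmin, ymin, xmax, ymax in building_boxes:
        col_lo = max(math.floor(xmin / cell_px), 0)
        col_hi = min(math.ceil(xmax / cell_px) - 1, n - 1)
        row_lo = max(math.floor(ymin / cell_px), 0)
        row_hi = min(math.ceil(ymax / cell_px) - 1, n - 1)
        for row in range(row_lo, row_hi + 1):
            for col in range(col_lo, col_hi + 1):
                hits.add((row, col))
    return sorted(hits)


# Hypothetical boxes on a 1500 px extent split into 500 px cells (3 x 3).
boxes = [(10, 10, 120, 90), (480, 480, 520, 520), (1400, 200, 1450, 260)]
print(cells_with_buildings(1500, 500, boxes))
# [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1)]
```

Only five of the nine cells contain any building here, so the other four could be left out of the export.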
Sandeep:
I was able to train a model with the Classified Tiles my custom tool generated. I only used the 582 tiles created within the aerial image I have been showing in this post. However, I actually potentially have 1,498 more similar aerials that cover the 800K+ building polygons that could be used to generate training chips. Are there any guidelines I should follow for creating a training set that is large enough to use for ultimately extracting all of those buildings and any new future buildings that will show up in the 2020 aerial image I expect to have next year? Is there a rule of thumb for coming up with a percentage or sampling of my buildings that my training data should cover? Is there a maximum limit to the number of training chips I should use if I attempt to train the model using a single GPU with 8 GB VRAM that can only handle a batch size of 3 based on a chip size of 500x500 pixels? For now that is a limit I have to work within, which I expect can only allow me to work with a dataset that is only suitable for demonstrating the proof of concept of these techniques on my own data. Your comments may help me justify having my organization ultimately grant me access to Image Server and many more GPUs through a virtual machine.
It is difficult to estimate these things, as it depends on a lot of factors, like the cell size you are training your model at and the learning rate. I would rather say make a fishnet, use only the grid cells with the best coverage of buildings, and try to use a good learning rate. You should try training the model with a chip size between 256 px and 400 px; it will allow you to pass more information per step, which is sometimes helpful. 8 GB of VRAM is okay, and a batch size of 4-6 will be good if you lower the chip size a bit. Also, start from a small training set of, let's say, around 400 chips, and if you are not satisfied you can keep adding more data and check whether the accuracy is improving. It will be a good learning exercise, as these models do not behave identically for every geography or data type; you need to experiment a bit to make it work for you.
The cell size is 0.5 feet in height and width per pixel. That is the resolution of my aerial. The maximum area of my aerial I can work with from my Image Service is 15,000x15,000 pixels, because it has a restriction on the number of columns and rows I can download in order to change the NODATA values of my aerials. I was using 500x500 pixels for my chips, because it exactly divides my downloaded aerials into whole chips and at that size a chip will normally contain one or more whole buildings and bigger portions of large buildings.
Smaller size chips like 375x375 pixels (56.25% of a 500x500 pixel chip) or 300x300 pixels (36% of a 500x500 pixel chip) would also exactly divide my downloaded aerials into whole chips, but of course the number of whole buildings in each chip will be reduced and more chips will only contain partial buildings. However, if I create a fishnet I can also do an Identity with my building polygons. Then I should be able to statistically analyze those results so that the majority of chips I use for training contain whole buildings or all portions of large buildings that won't fit within a single chip.
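The chip-size arithmetic above can be checked with a few lines. This is only a sketch: `chip_option` is a hypothetical helper, and it assumes a square 15,000 pixel extent with non-overlapping chips.

```python
def chip_option(extent_px, chip_px, reference_px=500):
    """Summarize how a square chip size tiles a square extent."""
    per_side = extent_px // chip_px
    return {
        "divides_evenly": extent_px % chip_px == 0,
        "chips_per_side": per_side,
        "total_chips": per_side ** 2,
        # Chip area as a percentage of the reference chip area.
        "area_vs_reference_pct": 100 * chip_px ** 2 / reference_px ** 2,
    }


for size in (500, 375, 300):
    print(size, chip_option(15000, size))
# 500 -> 30 per side,  900 chips, 100.0 % of the reference area
# 375 -> 40 per side, 1600 chips, 56.25 % of the reference area
# 300 -> 50 per side, 2500 chips, 36.0 % of the reference area
```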
I don't really understand what the graph of learning rates means or how different learning rate numbers will affect the model training. Can you provide a little more explanation of what adjustments to the learning rate do and what aspects of the learning rate graph I really need to focus on?
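As a general illustration, not specific to any Esri tool: the learning rate scales how far the model's weights move on each training step. The toy sketch below minimizes f(w) = w^2 with plain gradient descent at three arbitrary rates. It is not the training loop the notebook uses, only a way to see the three typical regimes (too small, reasonable, too large).

```python
def descend(learning_rate, steps=20, start=1.0):
    """Minimize f(w) = w**2 by gradient descent and return the final w."""
    w = start
    for _ in range(steps):
        gradient = 2 * w  # derivative of w**2
        w -= learning_rate * gradient
    return w


for lr in (0.01, 0.4, 1.1):
    print(lr, abs(descend(lr)))
# 0.01 barely moves toward the minimum at 0 in 20 steps,
# 0.4 gets very close to 0,
# 1.1 overshoots on every step and moves further away.
```

On a learning-rate graph that plots loss against learning rate, the same pattern usually shows up as a flat stretch at very small rates, a downward slope, and then a sharp rise; a common rule of thumb is to pick a rate on the downward slope, before the rise.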
Yes, the Export Training Data for Deep Learning tool in ArcGIS Pro can be used to do that.
I will not be using Esri tools for deep learning as long as Esri only publishes examples that rely on Image Server, since I cannot and will not work in that environment. Telling me working outside of Image Server can be done does me no good without any examples or clear explanation showing how to actually do it. I called Esri help and they could not tell me how to adapt the code the Esri deep learning team provided in their notebook to work outside of Image Server, so I really need an example where Image Server is not used by the Export Training Data for Deep Learning tool.
I only had a trial license for Image Analyst that expired while I was trying to get help from Esri to show me how to actually use the Export Training Data for Deep Learning tool without Image Server. I believe my organization is still trying to get an Image Analyst license added to our Enterprise license, but I am frustrated that the deep learning team notebooks were unusable without Image Server access.
I do have an Advanced license, a Spatial Analyst license and a 3D Analyst license, so I can use the Eliminate tool, the Classify Pixels Using Deep Learning tool, the Majority Filter tool and the Regularize Building Footprint tool shown in the ModelBuilder diagram for the Building Footprint Extraction portion of the blog. However, I will not be getting access to Image Server, so I really need help making your notebook work without using Image Server.
The deep learning team really needs to lay out all of these license requirements in the notebooks up front more clearly, so that people don't waste their time trying them when they don't have the necessary licenses. And please provide an alternative option that doesn't involve Image Server if it is only highly recommended but not an absolute requirement.
The notebook really confused me since it caused me to read the online help for Image Server and not for ArcGIS Pro or Desktop, so I thought the tool only worked with Image Server. Since I have Spatial Analyst, I decided to just try the Export Training Data for Deep Learning tool on my own data in Desktop and to output it to a local directory. I found that my Image Service was too large and had download restrictions that caused a 999999 error, so I used the Clip tool to extract a GDB raster of a smaller portion. My Building Footprint feature class was also too large so I selected footprints that overlapped the image I had clipped and exported them. I made sure that there were 5 fields that matched the Image Classification Manager fields added to my building footprint polygons (Classname - text 256 char, Classvalue - Long, RED - Long, GREEN - Long and BLUE - Long) and populated them. That finally worked to output PNG files to a local directory and KITTI_rectangles metadata.
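For anyone scripting the field setup described above, here is a minimal sketch. The field names, types, and text length come from this post; `add_class_fields` is a hypothetical helper, and it assumes an ArcGIS Python environment where `arcpy` is available. The import is deferred so that the field list itself can be inspected without ArcGIS installed.

```python
# Field layout described in the post: (name, type, text length or None).
CLASS_FIELDS = [
    ("Classname", "TEXT", 256),
    ("Classvalue", "LONG", None),
    ("RED", "LONG", None),
    ("GREEN", "LONG", None),
    ("BLUE", "LONG", None),
]


def add_class_fields(feature_class):
    """Add any of the five class fields that are missing (needs ArcGIS)."""
    import arcpy  # deferred: only required when the function is called

    existing = {field.name.lower() for field in arcpy.ListFields(feature_class)}
    for name, field_type, length in CLASS_FIELDS:
        if name.lower() in existing:
            continue  # leave fields that already exist untouched
        kwargs = {"field_length": length} if length else {}
        arcpy.management.AddField(feature_class, name, field_type, **kwargs)


# Example call (not run here; the path is hypothetical):
# add_class_fields(r"C:\data\training.gdb\building_footprints")
```

The fields still need to be populated afterwards, for example with the Calculate Field tool.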
For the benefit of anyone like me that wants a real life example of what the tool produces rather than just the description given in the tool help, the output created an images directory, a labels directory and a stats.txt file. There were 644 PNG files in the images directory, based on the number of pixels and stride I specified and the number of images that contained a polygon, and there were 644 text files in the labels directory, all with numeric file names padded with leading zeros (i.e., 000000000.png and 000000000.txt respectively). A sample image and label text file are shown below:

[Image: tile output to the images directory]
[Image: tile shown with building polygons (the polygons are not part of the output)]
labels text file:
1 0.00 0 0 0.00 433.91 24.63 507.57 0 0 0 0 0 0 0
1 0.00 0 0 33.03 497.67 82.45 512.00 0 0 0 0 0 0 0
1 0.00 0 0 85.77 384.83 198.12 512.00 0 0 0 0 0 0 0
1 0.00 0 0 408.83 386.81 512.00 506.69 0 0 0 0 0 0 0
1 0.00 0 0 388.53 195.90 502.51 290.04 0 0 0 0 0 0 0
1 0.00 0 0 409.18 0.00 512.00 18.65 0 0 0 0 0 0 0
The tool says the first position in each line of the text file is the classification code, the next three are skipped, the next four are image coordinates that define the minimum bounding rectangle of the polygon and the rest of the positions are skipped. The minimum bounding rectangle defines separate training chips within the image that will be used by the deep learning classifier for each building.
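The layout described above can be read with a short parser. This is a sketch under one assumption: the four coordinates are taken to be in the usual KITTI order (left, top, right, bottom), which matches the sample values but is not spelled out in the post. `Label` and `parse_label_line` are hypothetical names.

```python
from typing import NamedTuple


class Label(NamedTuple):
    class_value: int
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def parse_label_line(line):
    """Keep position 1 (class code) and positions 5-8 (bounding box)."""
    parts = line.split()
    return Label(int(parts[0]), *(float(value) for value in parts[4:8]))


# One of the sample lines from the labels file shown above.
sample = "1 0.00 0 0 85.77 384.83 198.12 512.00 0 0 0 0 0 0 0"
label = parse_label_line(sample)
width = label.xmax - label.xmin   # about 112.35 pixels
height = label.ymax - label.ymin  # about 127.17 pixels
print(label, round(width, 2), round(height, 2))
```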
The stats.txt file summarized the output of the tool as follows:
images = 644 *3*512*512
features = 4539
features per image = [min = 1, mean = 7.05, max = 14]
classes = 1
cls name    cls value    images    features    min size    mean size    max size
Buildings   1            644       4539        0.02        1978.09      6068.74
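The summary figures can be cross-checked against the labels directory. The sketch below assumes one non-empty line per feature in each label file; the helper names and the throwaway demo files are hypothetical. The last line checks the reported mean directly from the totals in stats.txt.

```python
import tempfile
from pathlib import Path


def features_per_image(labels_dir):
    """Count non-empty lines (one per feature) in each label file."""
    counts = []
    for path in sorted(Path(labels_dir).glob("*.txt")):
        lines = [ln for ln in path.read_text().splitlines() if ln.strip()]
        counts.append(len(lines))
    return counts


def summarize(counts):
    """Mirror the images/features/min/mean/max figures from stats.txt."""
    return {
        "images": len(counts),
        "features": sum(counts),
        "min": min(counts),
        "mean": round(sum(counts) / len(counts), 2),
        "max": max(counts),
    }


# Demo on a temporary directory holding two tiny label files.
row = "1 0.00 0 0 0.00 433.91 24.63 507.57 0 0 0 0 0 0 0\n"
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "000000000.txt").write_text(row)
    Path(tmp, "000000001.txt").write_text(row * 3)
    demo = summarize(features_per_image(tmp))

print(demo)
print(round(4539 / 644, 2))  # 7.05, matching the mean in stats.txt
```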
Hi Richard Fairhurst,
I am currently exploring deep learning within ArcGIS Pro also. I have managed to get a few models to work, but I am, like you, looking to fine-tune them to make them actually useful at detecting objects. I am doing this as part of R&D within my company, so it's purely experimental at this point.
If you would like to chat or share what's worked, etc., you can reach me on ty-hayward@environment-agency.gov.uk.
