Hey, so we are students trying to figure out how to do deep learning through ArcGIS Pro, and we have some questions before we commit to our workflow. I've gone down the rabbit hole for a while and there are a few things I just cannot find explanations for.
1. We are using raster TIFF satellite imagery with pyramids that clearly change the pixel resolution as you zoom in and out. We are planning to do object localization, drawing polygons around airplanes. Does it matter what scale we are at when we export the training data (creating image chips), given that the rasterization changes at different scales?
2. How do we combine training data sets? Let's say we have like a dozen images we are making training samples from, and we have to export our training samples for each image, creating a separate image chip directory each time. What is the best method for combining all of these image chips into a single directory that can be used once for the training of the deep learning model?
I am aware that we can add a pre-existing .dlpk file in the Train Deep Learning Model tool, but this seems very inefficient to do multiple times. Also we can't just manually copy and paste the subfiles into the same directories because of their numbering schemes and whatnot. There must be a better way!
3. Also, I just want to confirm whether the tools in the ArcGIS Pro Image Analyst toolset automatically augment the training data (rotate, flip, translate, etc.). I assume this is why it forces us to choose a neural network (like ResNet) during training of the model, but I have not seen 100% confirmation of this.
1. As far as I know, it exports the imagery at the actual source resolution and does not use pyramids. It is easy enough to check: just load one of the image chips and check the cell size.
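As a quick sketch of that check (assuming arcpy from the ArcGIS Pro Python environment; the chip path below is hypothetical):

```python
def chip_cell_size(chip_path):
    """Return (cell width, cell height) of an exported image chip.

    Exported chips are written with georeferencing (world/aux files),
    so arcpy.Describe can report their cell size. Requires the
    ArcGIS Pro Python environment.
    """
    import arcpy  # lazy import so the sketch parses outside ArcGIS Pro
    desc = arcpy.Describe(chip_path)
    return desc.meanCellWidth, desc.meanCellHeight

# Example (hypothetical path):
# print(chip_cell_size(r"C:\chips\image01\images\000000000.jpg"))
```

If the reported cell size matches the source raster's resolution regardless of the map scale you exported at, the tool is reading the base pixels, not the pyramids.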
2. I haven't used it myself, but the documentation for the path parameter of arcgis.learn.prepare_data says it can take multiple folders:
Path to data directory. Provide a list of paths for multi-folder training. Note: a list of paths is currently supported for dataset types: Classified_Tiles, Labeled_Tiles, MultiLabel_Tiles, PASCAL_VOC_rectangles, RCNN_Masks
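So a hedged sketch of combining your dozen chip directories would look something like this (folder paths are hypothetical, and dataset_type must match the format you chose in Export Training Data For Deep Learning):

```python
def combine_chip_folders(chip_folders, chip_size=448, batch_size=8):
    """Build one training data object from several chip directories.

    prepare_data accepts a list of paths for multi-folder training
    (for the dataset types listed above, including PASCAL_VOC_rectangles).
    Requires the ArcGIS Pro Python environment.
    """
    from arcgis.learn import prepare_data  # lazy import: arcgis package only
    return prepare_data(
        path=chip_folders,                     # a list triggers multi-folder training
        dataset_type="PASCAL_VOC_rectangles",  # must match your export format
        chip_size=chip_size,
        batch_size=batch_size,
    )

folders = [r"C:\chips\image01", r"C:\chips\image02"]  # hypothetical paths
# data = combine_chip_folders(folders)  # run inside the arcgispro-py3 environment
```

That way you never need to merge the chip directories on disk at all; prepare_data reads them all into a single dataset.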
3. There is a set of default transforms used by Esri, but I have not been able to find them documented anywhere. This page states:
By default, prepare_data() uses a default set of transforms for data augmentation that work well for satellite imagery. These transforms randomly rotate, scale and flip the images so the model sees a different image each time. Alternatively, users can compose their own transforms using fast.ai transforms for the specific data augmentations they wish to perform.
You will have to dig into the code to find out. The prepare_data function is in C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\Lib\site-packages\arcgis\learn\_data.py
As an example, for MaskRCNN, this is in the code:
```python
if dataset_type == 'RCNN_Masks':
    ...
    if transforms is None:
        ranges = (0, 1)
        if _image_space_used == _map_space:
            train_tfms = [
                crop(size=chip_size, p=1., row_pct=ranges, col_pct=ranges),
                dihedral_affine(),
                brightness(change=(0.4, 0.6)),
                contrast(scale=(1.0, 1.5)),
                rand_zoom(scale=(1.0, 1.2))
            ]
        else:
            train_tfms = [
                crop(size=chip_size, p=1., row_pct=ranges, col_pct=ranges),
                brightness(change=(0.4, 0.6)),
                contrast(scale=(1.0, 1.5)),
                rand_zoom(scale=(1.0, 1.2))
            ]
        val_tfms = [crop(size=chip_size, p=1., row_pct=0.5, col_pct=0.5)]
        transforms = (train_tfms, val_tfms)
        kwargs_transforms['size'] = chip_size
```
You will have to use a Python/Jupyter notebook with arcgis.learn if you want to use the more advanced options.
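For example, to replace the defaults with your own augmentations, something like this should work (a sketch assuming fastai v1, which ships with arcgis.learn; the transforms mirror the ones in _data.py above, and the folder path is hypothetical):

```python
def make_data_with_transforms(chip_folders, chip_size=448):
    """Build training data with custom fastai augmentations instead of
    Esri's defaults, via prepare_data(transforms=(train_tfms, val_tfms)).

    Requires the ArcGIS Pro Python environment (arcgis + fastai v1).
    """
    from arcgis.learn import prepare_data            # lazy import: ArcGIS env only
    from fastai.vision.transform import (            # fastai v1 transform factory functions
        crop, dihedral_affine, brightness, contrast, rand_zoom,
    )
    train_tfms = [
        crop(size=chip_size, p=1.0, row_pct=(0, 1), col_pct=(0, 1)),
        dihedral_affine(),                 # random 90-degree rotations and flips
        brightness(change=(0.4, 0.6)),
        contrast(scale=(1.0, 1.5)),
        rand_zoom(scale=(1.0, 1.2)),
    ]
    val_tfms = [crop(size=chip_size, p=1.0, row_pct=0.5, col_pct=0.5)]
    return prepare_data(
        path=chip_folders,
        dataset_type="PASCAL_VOC_rectangles",
        chip_size=chip_size,
        transforms=(train_tfms, val_tfms),  # same tuple shape as in _data.py
    )

# data = make_data_with_transforms([r"C:\chips\image01"])  # hypothetical path
```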
Actually, I may attempt using the Jupyter notebook integration inside ArcGIS Pro as an alternative to the default Train Deep Learning Model geoprocessing tool, after reading some articles on it.
The project is airplane detection using standard optical satellite imagery. I have been able to complete every step, including Detect Objects, but no objects appear in the final shapefile. I suspect something is not working out in the training step, so I will have to get my hands dirty and go under the hood if I want results.
- Image chips are 448x448, JPEG, in PASCAL VOC format.
- Have been trying to train using Single Shot Detector, ResNet-101.
- I have an RTX 3080, so GPU processing seems to work ok at 8-16 batch size. Problem is it takes like half an hour any time I try to change and test configurations.
- We are trying our best to work from tutorials and articles found on the Internet.
The Detect Objects tool has a high default threshold - I think it's 0.9, meaning it will only export objects with 90%+ confidence. You could try lowering it to see if you get any results.
The ssd.show_results command can be used in your notebook to visually inspect results from the validation set, and it has a threshold parameter you can play around with to get a feel for where to set it.
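A minimal sketch of that experiment (assuming `ssd` is your already-trained arcgis.learn SSD model; the threshold values are just starting points to try):

```python
def preview_thresholds(model, thresholds=(0.9, 0.5, 0.2)):
    """Plot validation-set detections at several confidence thresholds
    so you can see where real airplanes start appearing.

    `model` is assumed to be a trained arcgis.learn detection model
    whose show_results method accepts a thresh keyword.
    """
    for t in thresholds:
        print(f"--- thresh={t} ---")
        model.show_results(thresh=t)  # draws boxes with confidence >= t

# preview_thresholds(ssd)  # run in the notebook after training
```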
How long is each epoch taking? We generally train in sets of 10 epochs and make sure the model is getting some results back. We also keep a separate test set of imagery with identified known objects; during training we run a Detect Objects step and count how many of the known objects were detected, how many were missed, and how many false objects were detected. We were quite surprised by how little training some models need to get very good results.
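That detected/missed/false bookkeeping can be sketched in plain Python. Boxes here are hypothetical (xmin, ymin, xmax, ymax) tuples, and 0.5 is a commonly used IoU cutoff, not anything mandated by the tools:

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (xmin, ymin, xmax, ymax)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0

def score_detections(detected, known, iou_thresh=0.5):
    """Greedily match detections to known objects.

    Returns (detected_count, missed_count, false_detections).
    """
    unmatched = list(known)
    hits = false_alarms = 0
    for box in detected:
        best = max(unmatched, key=lambda k: iou(box, k), default=None)
        if best is not None and iou(box, best) >= iou_thresh:
            hits += 1
            unmatched.remove(best)  # each known object matches at most once
        else:
            false_alarms += 1
    return hits, len(unmatched), false_alarms

# Two known airplanes, two detections: one good match, one false alarm.
known = [(0, 0, 10, 10), (20, 20, 30, 30)]
detected = [(1, 1, 10, 10), (50, 50, 60, 60)]
print(score_detections(detected, known))  # -> (1, 1, 1)
```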
The FasterRCNN and YOLOv3 models are also available for object detection and may give better results. You only need to change a few lines in your notebook to use these instead.
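Since the arcgis.learn detection models share a similar constructor, the swap is roughly this (a sketch; backbone name as in the thread, and YOLOv3 uses its own built-in DarkNet backbone):

```python
def build_model(data, kind="SSD", backbone="resnet101"):
    """Construct an object-detection model from prepared data.

    Changing architectures in arcgis.learn is essentially a one-line
    change: SSD, FasterRCNN, and YOLOv3 all take the data object first.
    Requires the ArcGIS Pro Python environment.
    """
    from arcgis.learn import SSD, FasterRCNN, YOLOv3  # lazy import: arcgis package
    if kind == "SSD":
        return SSD(data, backbone=backbone)
    if kind == "FasterRCNN":
        return FasterRCNN(data, backbone=backbone)
    return YOLOv3(data)  # no backbone argument; uses DarkNet

# model = build_model(data, kind="FasterRCNN")
# model.fit(10)  # then train as before
```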
I have an idea of how to do the notebook stuff, but now ArcGIS Pro has decided to crash every time I try to add a notebook. I'm running ArcGIS Pro 2.7 and Anaconda 3, both 64-bit. I have no idea why it is doing this. I can create the whole notebook in Chrome to train the model, but I cannot import it into ArcGIS Pro.