Dan:
I found this video on image segmentation using U-Net to detect cell nuclei in images. It shows the basic principles of a modeling approach that I think could be adapted to extract building footprints from aerial imagery. I was able to get the code to work after pip installing a few packages (opencv-python, tensorflow and tensorflow-gpu) and some NVIDIA developer software for GPU acceleration (CUDA 10.0 and cuDNN 7.6.2.24).
The video example code does not divide the training/test data into tiles or chips, so I still have to develop my own routines for preparing training data from much larger rasters, since the arcgis.learn.export_training_data() method Esri has created requires access rights I don't have. My starting data is similar to the Esri sample: I have a raster covering a much bigger extent than my area of interest, and a polygon layer of building footprints within my area of interest that optionally could be converted to a classified raster. The Esri sample call looks like this:
export = learn.export_training_data(input_raster=naip_input_layer,
                                    output_location=samplefolder,
                                    input_class_data=label_layer.url,
                                    chip_format="PNG",
                                    tile_size={"x": 400, "y": 400},
                                    stride_size={"x": 0, "y": 0},
                                    metadata_format="Classified_Tiles",
                                    context={"startIndex": 0, "exportAllTiles": False, "cellSize": 2},
                                    gis=gis)
The export_training_data parameters suggest that this tool is very similar to the Split Raster tool. I don't have experience with the Split Raster tool either, but the main difference in the parameters appears to be metadata_format, which outputs Classified Tiles. I wish I could see a sample of the export_training_data output to compare against the Split Raster output, so that I could determine what, if any, additional processing is done beyond what Split Raster does.
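To make the tiling step concrete, here is a minimal sketch of how I might cut my own chips. It assumes the aerial raster and a matching classified mask have already been read into NumPy arrays on the same pixel grid. The function name make_chips and its parameters are my own, not part of any Esri API, and I am not claiming this reproduces what export_training_data or Split Raster actually do.

```python
import numpy as np

def make_chips(image, mask, tile=400, stride=400):
    """Cut an image (H, W, C) and a matching mask (H, W) into aligned chips.

    Hypothetical helper: only full-size chips are kept, so partial chips
    along the right and bottom edges are skipped in this sketch.
    """
    if image.shape[:2] != mask.shape[:2]:
        raise ValueError("image and mask must cover the same pixel grid")
    height, width = mask.shape[:2]
    image_chips, mask_chips, origins = [], [], []
    # Step the window across the raster; stride == tile means no overlap.
    for row in range(0, height - tile + 1, stride):
        for col in range(0, width - tile + 1, stride):
            image_chips.append(image[row:row + tile, col:col + tile])
            mask_chips.append(mask[row:row + tile, col:col + tile])
            origins.append((row, col))  # upper-left pixel of each chip
    return np.array(image_chips), np.array(mask_chips), origins

# Example with placeholder arrays standing in for the real raster and mask.
image = np.zeros((1000, 1200, 3), dtype=np.uint8)
mask = np.zeros((1000, 1200), dtype=np.uint8)
x_train, y_train, origins = make_chips(image, mask)
print(x_train.shape, y_train.shape)  # (6, 400, 400, 3) (6, 400, 400)
```

Keeping the chip origins alongside the arrays would let me trace each chip back to its position in the source raster. A stride smaller than the tile size would give overlapping chips if more training samples are needed.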
The video example seems to handle training and testing of the model fairly well. However, it does not deal with creating a final model output from a new raster, and it seems best suited to processing separate photos that do not have to be reassembled into a single image at the end, so I would also have to figure out how to accomplish that. The Esri example seems to have enclosed the final classification process in a black box method. I assume that method tiles the new raster and combines the tiles at the end to create a final classified raster covering the original raster extent.
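A rough sketch of the tile-and-reassemble idea I am assuming, again with the raster already loaded as a NumPy array. Here predict_tiled and predict_fn are hypothetical names: predict_fn stands in for whatever wraps the trained model and returns one value per pixel for a chip. This is only my guess at the general approach, not a description of what the Esri method does internally.

```python
import numpy as np

def predict_tiled(image, predict_fn, tile=400):
    """Run predict_fn on non-overlapping tiles and stitch the results.

    image is an (H, W, C) array. predict_fn is a hypothetical callable that
    takes a (tile, tile, C) chip and returns a (tile, tile) array.
    """
    height, width = image.shape[:2]
    # Zero-pad the right and bottom edges so both sides divide evenly by tile.
    pad_h = (-height) % tile
    pad_w = (-width) % tile
    padded = np.pad(image, ((0, pad_h), (0, pad_w), (0, 0)), mode="constant")
    output = np.zeros(padded.shape[:2], dtype=np.float32)
    for row in range(0, padded.shape[0], tile):
        for col in range(0, padded.shape[1], tile):
            chip = padded[row:row + tile, col:col + tile]
            # Write each prediction back at the chip's original position.
            output[row:row + tile, col:col + tile] = predict_fn(chip)
    # Crop the padding away so the result matches the input extent.
    return output[:height, :width]

# Example with a stand-in "model" that thresholds the first band.
def dummy_model(chip):
    return (chip[..., 0] > 127).astype(np.float32)

raster = np.random.randint(0, 256, size=(950, 1130, 3), dtype=np.uint8)
classified = predict_tiled(raster, dummy_model)
print(classified.shape)  # (950, 1130)
```

Non-overlapping tiles are the simplest case. Predictions near tile edges may be less reliable, so overlapping the tiles and averaging or cropping the overlaps is something I would probably have to experiment with.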
Anyway, I would appreciate your thoughts on the assumptions I am making, and any suggestions that might help me write code or apply other techniques to design a process of my own that works for my needs.