Hi all,
I'm trying to colorize some historical aerial imagery (1920s- and '30s-era, ~1 m GSD orthos from various parts of the U.S.) using the Deep Learning tools included with the Image Analyst extension. I'm following the workflow outlined in this article; however, the guide is fairly introductory and doesn't go into detail about how to fine-tune the model for better results:
For the record, I'm an amateur when it comes to GIS, so if I get some terms wrong please go easy on me 😉 The end use will be adding the imagery to a flight simulator game (MSFS) so you can fly over cities and see what they looked like in the 1930s, which will be pretty cool!
My GPU is an RTX 2070 Mobile with 8 GB of VRAM, so processing times haven't been a big issue.
My results so far have been somewhat promising, but not yet usable. Generally, trees and fields are fairly reliably colored green, and some water areas are colored blue, but buildings rarely get any color definition and are sometimes erroneously given green/blue hues. The algorithm struggles with under- or over-exposed areas of the image; darker areas are typically treated like water and lighter areas aren't colored at all. Sometimes large swaths of urban areas are colored a mild green, sometimes forest areas aren't colored at all or are rendered dark.
Here's my general workflow so far:
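In rough Python terms, I believe the equivalent arcgis.learn / arcpy calls look something like this (just a sketch; every path and number below is a placeholder rather than my exact settings):

```python
import arcpy
from arcgis.learn import prepare_data, Pix2Pix

arcpy.CheckOutExtension("ImageAnalyst")

# 1. Chips were exported beforehand with Export Training Data For Deep
#    Learning (paired BW/color tiles, "Export Tiles" metadata format).
data = prepare_data(r"C:\colorize\chips", dataset_type="Pix2Pix", batch_size=8)

# 2. Train the image-to-image translation model and save it.
model = Pix2Pix(data)
model.fit(epochs=25)
model.save("colorize_v1")

# 3. Apply the trained model to a target BW ortho.
colorized = arcpy.ia.ClassifyPixelsUsingDeepLearning(
    r"C:\colorize\seattle_1936_bw.tif",
    r"C:\colorize\chips\models\colorize_v1\colorize_v1.emd")
colorized.save(r"C:\colorize\seattle_1936_color.tif")
```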
So what should I do to improve my model? Larger datasets? More epochs? Start from a promising model via the "pre-trained model" option and then do additional training on different source data?
If anyone has prior experience with this, any tips, tricks, or best practices would be greatly appreciated!
Hi @Snapshot36,

I wrote that blog as a proof of concept and trained the model over a weekend. The training data and the AOI where I applied the model are both in Southern California, and I'll admit the final output image is good enough for a proof of concept, but it's not great. My training data was fairly small (maybe 3,000-4,000 image chip pairs), and the model would need much more serious training with more data before it could handle more diverse landscapes. Because we don't have any other pre-trained models, you can't really use mine as a base model to fine-tune. Your best bet is to increase the quantity and diversity of your training data. Try to match the pixel size of your training data to the pixel size you'll be inferencing at, and perhaps try retraining without early stopping.
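For the early-stopping part, if you train through the Python API instead of the geoprocessing tool, it would look something like this (a sketch; the path, batch size, and 50-epoch budget are arbitrary placeholders):

```python
from arcgis.learn import prepare_data, Pix2Pix

data = prepare_data(r"C:\colorize\chips", dataset_type="Pix2Pix", batch_size=8)
model = Pix2Pix(data)

# Run the full epoch budget instead of stopping when validation loss
# plateaus; checkpoint=True still keeps the best intermediate weights.
model.fit(epochs=50, early_stopping=False, checkpoint=True)
model.save("colorize_no_early_stop")
```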
Hi Pavan,
Many thanks for your reply! Your recommendations are well-received. I've had some success ensuring the BW training data and the target image have similar brightness/contrast (which requires manually editing both). I'm now getting very good results colorizing vegetated areas after training with 2-3 different datasets of 5-10k images at 50 epochs each. However, accurate coloring of neighborhoods and urban areas remains elusive. I think my next step will be to compile some larger datasets of solely urban areas and run several training sessions to see whether that produces better results.
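If I understand the Python API correctly, I could reload my saved model and continue training it on urban-only chips, something like this (a sketch; paths and names are placeholders, and I'm assuming Pix2Pix.from_model works like it does for the other arcgis.learn models):

```python
from arcgis.learn import prepare_data, Pix2Pix

# Urban-only chip pairs, exported the same way as the earlier datasets.
urban_data = prepare_data(r"C:\colorize\chips_urban",
                          dataset_type="Pix2Pix", batch_size=8)

# Reload the previously saved model and keep training on the new data.
model = Pix2Pix.from_model(
    r"C:\colorize\chips\models\colorize_v1\colorize_v1.emd", urban_data)
model.fit(epochs=50, early_stopping=False)
model.save("colorize_v2_urban")
```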
Here are some examples of my results colorizing 1936 imagery of Seattle. As you can see, the uneven exposure across the orthomosaic isn't helping the results, but when zoomed in to mid-tone areas the results are quite nice:

Good results in this area with well-balanced exposure.
A golf course in 1936: Forest, fields, and sand traps all nicely colored.
Underexposed area: Some darker areas are mistaken for water and given bluish tones. Houses receive no color.
Orthomosaic. Uneven exposure across photo frames doesn't help things, but in general, forests and fields are green.
Overexposed area: Obvious issues. It's important to make sure target images are well-balanced before processing.
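On that last point: rather than manually editing every target image, one option would be to histogram-match the target BW ortho to a reference chip that already colorizes well. A sketch using rasterio and scikit-image (both filenames are placeholders):

```python
import rasterio
from skimage.exposure import match_histograms

# Reference: a single-band BW chip with the well-balanced exposure the
# model was trained on. Target: the ortho to colorize.
with rasterio.open("reference_bw_chip.tif") as ref:
    reference = ref.read(1)

with rasterio.open("seattle_1936_bw.tif") as src:
    target = src.read(1)
    profile = src.profile

# Remap the target's tonal distribution onto the reference's.
matched = match_histograms(target, reference)

with rasterio.open("seattle_1936_bw_matched.tif", "w", **profile) as dst:
    dst.write(matched.astype(profile["dtype"]), 1)
```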