Mosaic Datasets, Imagery Storage, and Analysis...

JM32 · 01-06-2023

Hi Everyone,

Looking for advice on imagery management and working with Mosaic Datasets. To start, here is a bit of background: I work for a local government and manage imagery from various vendors (primarily tiled orthoimagery). Formats are dominated by full-resolution GeoTIFFs and compressed MrSIDs, and the imagery spans multiple years.

My management question is:

Does using the Windows Zip tool to compress a folder into a .zip file degrade image data in any way when storing it as a backup? From what I understand, zipping file folders uses lossless compression, but I'm no expert at this. If someone wants to unzip the data for use, say, 50 years from now, how can I tell that they will get the true original GeoTIFF data that the vendor delivered? [Any insight on how others are storing backup data is welcome... currently I keep working data for mosaic datasets on one server and backup data on another, with a third external drive storing another backup copy.]

Mosaic Dataset (MD) questions:

When you zoom in close on a tile in a MD, are you actually viewing the full resolution image tile stored in the image data source directory or is it just the pyramid built for that image?
Does the "Download Rasters" functionality of a MD allow you to download, or make an exact copy of (no resampling or shifting of pixels), the original image tile you selected in the MD? Is the image data altered in any way when using this?

Analysis question:

My end goal with the imagery I manage is to use it to build data layers for our region, mainly land cover classifications in ArcGIS Pro. I have a good amount of experience doing so in other software packages, and some in ArcGIS Pro, but what I'm not sure about is using a MD to do the analysis. In the past, I've always just used an actual image tile from a vendor, or exported a custom image for an area of interest. However, this seems excessive or not possible for very large extents. So, my question is:

When using a MD referencing GeoTIFFs, say I want to do an image segmentation for a small area within a city. Can I perform this operation on the MD, and will it be performed on the actual full-resolution GeoTIFF or on its pyramids and/or overviews? Additionally, does having the transmission setting under "Allowed Compression Methods" set to anything other than "None" alter the data I would be running my analysis on? (All my MDs use JPEG for this setting so they display a bit faster for basemap purposes.) Ideally, I want to work with the true pixel values from the raw imagery the MDs are referencing, since those values will guide all of the analysis work and I'd like to take advantage of the data available.

Thank you in advance, for any and all insights!

Jon

PeterBecker · 01-07-2023

There is very little point in compressing imagery using ZIP, and I recommend against using ZIP to compress a directory of images. If you are looking for lossless compression of imagery, the simplest option is to use Deflate or LZW. There are some better lossless compression methods, but the differences will be limited. If you have a collection of imagery that is not compressed and you want to compress it, then it's better to run it through OptimizeRasters. Some datasets, such as elevation, compress better with a compression such as LERC. For more details on how to convert and optimize imagery, check out OptimizeRasters.

Using a Mosaic Dataset is the best way to manage the data: it references the source data (full resolution and any pyramids), lets you define additional attributes and functions that process the data on the fly, and gives quick access to all the data, either directly in ArcGIS Pro or served as image services through ArcGIS Image Server. When you use the mosaic dataset directly in ArcGIS Pro, it acts as a layer defining functions applied to the appropriate imagery. Yes, you are accessing the full-resolution source data when zooming in. The compression for transmission that you set is only the compression used to transmit the data. The client application can choose it, along with the compression quality, at any time. There is no reason to only allow ‘None’. As you mention, setting this to, say, JPEG can speed up transmission of the data.

There are Export and Download options. Export will typically resample the data and apply any defined functions to it. The application defines the resampling, compression, etc. Download downloads the original pixels where possible. I say where possible because if you use, say, a MrSID, then the data is compressed and must first be decompressed before it is written to a new file, so the data structure will change but not the pixel values (unless you define another lossy compression).

You can directly use mosaic datasets as input to your analysis tools. This enables large collections of imagery and rasters to be seen as a single dataset, with control of many properties. If you want to ensure that the pixels are exactly the same as those stored on disk, then you should use the environment settings to set the appropriate pixel size, projection, and extent so that the requested pixels align with the source.
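To illustrate what "align with the source" means here, the following is a minimal sketch in plain Python, not ArcGIS code. It expands an area of interest outward so that each edge falls on a pixel boundary of the source grid, which is the kind of alignment that the Cell Size, Snap Raster, and Extent environments are meant to give you. The names `Extent` and `snap_extent_to_grid`, and all coordinate values, are hypothetical and chosen only for the example.

```python
import math
from typing import NamedTuple


class Extent(NamedTuple):
    """Axis-aligned rectangle in map units (hypothetical helper type)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def snap_extent_to_grid(aoi: Extent, origin_x: float, origin_y: float,
                        cell_size: float) -> Extent:
    """Expand `aoi` outward so every edge lies on a source pixel boundary.

    `origin_x` / `origin_y` are the coordinates of any pixel corner of the
    source raster; `cell_size` is the source pixel size in map units.
    """
    def snap_down(value: float, origin: float) -> float:
        return origin + math.floor((value - origin) / cell_size) * cell_size

    def snap_up(value: float, origin: float) -> float:
        return origin + math.ceil((value - origin) / cell_size) * cell_size

    return Extent(
        snap_down(aoi.xmin, origin_x),
        snap_down(aoi.ymin, origin_y),
        snap_up(aoi.xmax, origin_x),
        snap_up(aoi.ymax, origin_y),
    )


# Assumed example values: a 0.5 m orthoimage whose grid passes through
# (500000, 4200000), and a small area of interest drawn by hand.
CELL_SIZE = 0.5
aoi = Extent(500010.3, 4200020.7, 500110.2, 4200095.1)
snapped = snap_extent_to_grid(aoi, 500000.0, 4200000.0, CELL_SIZE)

# Whole number of source pixels covered by the snapped extent.
columns = round((snapped.xmax - snapped.xmin) / CELL_SIZE)
rows = round((snapped.ymax - snapped.ymin) / CELL_SIZE)

print(snapped)
print(columns, "columns x", rows, "rows")
```

If the analysis extent and pixel size match the source grid like this, and the output projection is the same as the source, each requested pixel maps onto exactly one stored pixel, so no resampling is needed.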

Do check out the ArcGIS Imagery Workflows page; it provides lots of best practices.

JM32 · 01-10-2023

Thank you for the info! Very helpful.

Any particular reason why you recommend against using ZIP to compress directories of images? My guess is that it isn't very good at compressing this type of information compared to other types.

PeterBecker · 01-10-2023

ZIP is a container with lossless compression. The compression is generic and not optimized for imagery data. There are some versions of ZIP that can get better compression, but these are not standard, so they may undermine the future readability you asked about. Most imagery will not compress much with most lossless compression methods. If the data is already compressed, then any additional compression (e.g. ZIP) will add very little. The exception is some 16-bit data or categorical data that has many repeating values. If you are just looking for generic compression to back up the data, then this is something the backup process itself should provide. If space is your concern, then I suggest optimizing the data using a more suitable compression. You can also exclude the overviews (which will save about 30%).
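Both points can be checked with a small experiment. The sketch below uses only the Python standard library. It zips two synthetic byte buffers in memory, unzips them, compares SHA-256 checksums to confirm the round trip is bit-for-bit identical, and reports how much Deflate actually saved. The buffers are stand-ins, not real imagery: random bytes as an assumed proxy for noisy or already-compressed pixels, and long runs of a few values as an assumed proxy for categorical data. The helper names are hypothetical.

```python
import hashlib
import io
import random
import zipfile
from typing import Tuple


def sha256(data: bytes) -> str:
    """Return the SHA-256 checksum of `data` as a hex string."""
    return hashlib.sha256(data).hexdigest()


def zip_round_trip(name: str, data: bytes) -> Tuple[bytes, float]:
    """Zip `data` in memory, unzip it again, and return
    (restored bytes, compressed size / original size)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(name, data)
    with zipfile.ZipFile(buffer, "r") as archive:
        info = archive.getinfo(name)
        restored = archive.read(name)
    return restored, info.compress_size / info.file_size


rng = random.Random(0)  # fixed seed so the run is repeatable
samples = {
    # Proxy for noisy or already-compressed pixel data.
    "noisy.bin": bytes(rng.getrandbits(8) for _ in range(200_000)),
    # Proxy for categorical data: long runs of only four class values.
    "categorical.bin": bytes((i // 5000) % 4 for i in range(200_000)),
}

results = {}
for name, original in samples.items():
    restored, ratio = zip_round_trip(name, original)
    identical = sha256(restored) == sha256(original)
    results[name] = (identical, ratio)
    print(f"{name}: identical={identical}, compressed/original={ratio:.3f}")
```

The same checksum idea works for real backups: record a SHA-256 for each delivered file next to the archive, and anyone restoring the data later can recompute the checksums to confirm nothing has changed.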

JM32 · 01-10-2023

Makes sense, I'm just starting to dive into the world of compression types. I've always wondered why imagery never seems to go down much in size from ZIP, even though it takes some time depending on how big the data is.

Thank you again for the helpful info!