Hello,
I am trying to mosaic the 500+ raster tiles of the Hansen/UMD 2015 Global Forest Change (GFC) forest loss dataset (30 m cell size) into one global dataset, and I am trying to figure out the most efficient, least time-consuming way to do this. The end result does not necessarily need to be a single raster (i.e., from the Mosaic To New Raster tool), and I am open to using a mosaic dataset, as long as I can utilize the final product/composite file in Raster Calculator. I do not have much experience with mosaic datasets prior to this task.
So far I have tried the following methods, but each one is very time consuming (several hours to days) and has resulted in numerous crashes:
- Load all the raster tiles into a mosaic dataset -- this is taking several hours just to add all the rasters to the mosaic dataset. The first time I ran this, it took about 10 minutes to load all the rasters, but they were not rendering properly. The second time I attempted it, the tool indicated that some 11,000 files were being generated for the mosaic dataset and that it would take several hours to finish.
- Isolate only the tiles that actually contain deforestation values (eliminating tiles over the ocean to reduce the number of tiles that need to be processed overall) and then mosaic them -- for this, I thought the best way would be to build a raster attribute table for each tile, use the Unique Values symbology to identify and delete rasters containing only Value = 0 (NoData), and then mosaic the remaining files. I am building the attribute tables through a simple model in ModelBuilder, but the process is still very time-intensive. In addition, I am not sure how to automate changing the symbology from Stretched (the default when you add data to the TOC) to Unique Values, so I have been doing it manually. I don't think batching an Apply Symbology approach would work in this case, because I specifically need to isolate the rasters that contain only NoData values in order to delete them; applying one tile's symbology to all of them would make every raster appear to have values of 0 and 1 (or only 0, depending on which raster is used as the template), which would create more work for me to fix.
I have also tried a straight Mosaic To New Raster run with all 500+ rasters, but it did not complete even after 3 days of processing. I also considered deleting tiles by file size; however, there are instances in which the smaller files still contain deforestation values (Value = 1), so I can't use that kind of blanket approach.
Does anyone have any suggestions on how to handle large numbers of rasters in one go? Or, does anyone know of a script to eliminate the GFC tiles that are over ocean only / eliminate the tiles that contain no data?
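For what it's worth, the kind of per-tile check I am imagining would look something like the following -- a rough arcpy sketch, assuming the tiles are GeoTIFFs in a single folder (the path is a placeholder, not my actual data location):

```python
# Rough sketch: flag GFC loss tiles whose maximum cell value is 0
# (ocean / no-loss tiles) so they can be dropped before mosaicking.
# The folder path is a placeholder.
import arcpy

arcpy.env.workspace = r"C:\data\gfc_loss_tiles"  # hypothetical folder

keep = []
for tif in arcpy.ListRasters("*", "TIF"):
    # MAXIMUM comes from raster statistics; build them if they are missing
    arcpy.management.CalculateStatistics(tif, skip_existing="SKIP_EXISTING")
    max_val = float(
        arcpy.management.GetRasterProperties(tif, "MAXIMUM").getOutput(0))
    if max_val > 0:  # tile has at least one loss (Value = 1) cell
        keep.append(tif)

print("{} tiles contain loss data".format(len(keep)))
# "keep" could then be fed to Mosaic To New Raster or
# Add Rasters To Mosaic Dataset
```

This would avoid the symbology workaround entirely, since the decision is made from the raster statistics rather than from how the layer renders in the TOC.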
"...as long as I can utilize the final product/composite file in Raster Calculator."
I think you will find that, no matter what you do, you will not be able to analyse the final raster in the ArcGIS Raster Calculator. A global 30 m dataset is simply too big to analyse on a desktop PC. The Australian Geoscience Data Cube (time-series Landsat over Australia) is analysed on the Raijin supercomputer, and even there it is stored as 1 x 1 degree tiles, not as individual continental-scale rasters.
Perhaps you could provide some more information about what you want to achieve?
Hello Luke,
Thank you for your response. We are trying to use the Hansen/GFC loss data to produce an estimate of global deforestation using an altered version of the VCF forest cover dataset. After mosaicking the loss tiles, we used the following expression in Raster Calculator to add the deforestation to the forest cover layer (the value ranges for each raster are given in brackets): (Deforestation [values: 0 (NoData), 1 (deforestation)] * 100) + Forest cover [values: 31-80 (% forest cover)]. Any cell >= 100 was then reclassified as loss. Finally, the overall product was resampled to 1 km and reprojected for some further analyses (this is just the first part of a bigger problem we are trying to solve).
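In Python terms, that step looks roughly like the following -- a sketch using Spatial Analyst, with placeholder file paths rather than our actual data:

```python
# Sketch of the map-algebra step described above; paths are placeholders.
import arcpy
from arcpy.sa import Raster, Con

arcpy.CheckOutExtension("Spatial")

loss = Raster(r"C:\data\gfc_loss_mosaic.tif")    # 0 = NoData, 1 = loss
cover = Raster(r"C:\data\vcf_forest_cover.tif")  # 31-80 = % forest cover

combined = (loss * 100) + cover    # loss cells land at 131-180, others at 31-80
loss_only = Con(combined >= 100, 1, 0)  # reclassify: anything >= 100 is loss
loss_only.save(r"C:\data\deforestation.tif")
```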
I had successfully completed all of this before and verified that the files had processed correctly, but I recently lost my datasets due to an external hard drive crash. In between the first attempt at this analysis and the HD crash, our institution migrated to Office 365, which to my understanding involves numerous processes running in the background, and my computer has performed significantly more slowly ever since.
Believe it or not, after removing the extraneous ocean tiles from the GFC dataset, I was able to mosaic the loss tiles within 1-2 hours, and all the mosaicked data lined up with the data in the original tiles. While my computer is nothing terribly special (a fairly old Dell Latitude E6410 laptop with an i7 processor and only 4 GB of RAM), it has regularly outperformed several of our other desktop computers. I never imagined this would be such an issue to rerun!
Here are instructions for using a mosaic dataset or an image service for analysis.
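In short, building and populating a mosaic dataset is only a couple of geoprocessing calls from Python -- a sketch, with the geodatabase path and names as placeholders:

```python
# Sketch of creating and populating a mosaic dataset; the geodatabase
# path, dataset name, and tile folder are placeholders.
import arcpy

gdb = r"C:\data\gfc.gdb"            # hypothetical file geodatabase
sr = arcpy.SpatialReference(4326)   # the GFC tiles are delivered in WGS84

arcpy.management.CreateMosaicDataset(gdb, "gfc_loss_md", sr, num_bands=1)
arcpy.management.AddRastersToMosaicDataset(
    gdb + r"\gfc_loss_md", "Raster Dataset", r"C:\data\gfc_loss_tiles")
```

Because a mosaic dataset only references the tiles rather than copying them, this avoids duplicating the source data on disk.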
Hello Chad,
Thank you for your reply. I tried using a mosaic dataset and was able to build it successfully, but it has caused ArcGIS to freeze and crash every time I have used it as an input in Raster Calculator. I even let the calculation run in the foreground overnight and saw that there was 0% progress.
I'm not sure if this is because of my computer being somewhat old and slowed down due to the O365 migration or some other reason.
Hello all,
I realized that I had reprojected all the tiles prior to mosaicking them into the global dataset (since the VCF dataset is at 500 m resolution and in Cylindrical_Equal_Area). This must have sped the process up considerably and made it much less intensive. Thanks, all, for your efforts and replies.
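For anyone repeating this, the per-tile reprojection step looks roughly like the following. This is a sketch rather than my exact script: the paths are placeholders, and the WKID used for the output coordinate system is an assumption (54034 is World Cylindrical Equal Area):

```python
# Rough sketch of reprojecting each tile before mosaicking; paths are
# placeholders and the WKID is an assumed Cylindrical Equal Area choice.
import arcpy
import os

src = r"C:\data\gfc_loss_tiles"
dst = r"C:\data\gfc_loss_projected"
sr = arcpy.SpatialReference(54034)  # World Cylindrical Equal Area

arcpy.env.workspace = src
for tif in arcpy.ListRasters("*", "TIF"):
    # NEAREST keeps the categorical 0/1 loss values intact
    arcpy.management.ProjectRaster(
        tif, os.path.join(dst, tif), sr, "NEAREST")
```

Reprojecting tile by tile keeps each job small, so the final mosaic only has to stitch together tiles that already share one coordinate system.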