I am currently using ArcGIS Pro 3.6.0 and have the Advanced license with the Spatial Analyst extension. I have been tasked with computing various landscape metrics (including Shannon Diversity, Edge Density, and Interspersion and Juxtaposition) across a custom land cover raster we built that covers Canada's agricultural landscape. We typically use 5 km x 5 km grids and use the Split Raster tool to export each grid as a .tif, then use a batch import file to bring everything into FragStats, where the necessary metrics are calculated.
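In arcpy terms, that splitting step looks roughly like the sketch below (the paths and names are placeholders, not our actual data):

```python
import arcpy

# Placeholder paths; our real data lives elsewhere
land_cover = r"D:\data\landcover_10m.tif"
grid_fc = r"D:\data\grids_5km.shp"    # 5 km x 5 km fishnet polygons
out_folder = r"D:\data\tiles"

# Split the land cover raster into one TIFF per grid polygon,
# then batch-import the resulting .tif files into FragStats
arcpy.management.SplitRaster(
    in_raster=land_cover,
    out_folder=out_folder,
    out_base_name="tile_",
    split_method="POLYGON_FEATURES",
    format="TIFF",
    split_polygon_feature_class=grid_fc,
)
```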
We are now exploring the possibility/viability of moving to a 1 km x 1 km grid, which is proving difficult. To experiment, we clipped our area of interest to the province of Ontario, which still leaves us with over 348,000 grids. When we tried to run the Split Raster tool, each .tif was taking about 4 min, which immediately makes that a very inefficient process. We then tried running Split Raster with the exported rasters written to a file geodatabase, which sped things up to about 30 sec per grid. For curiosity's sake, I took another subset of my grids (about 2,000 units) and tried the same Split Raster approach, and it went back to processing 200 grids in less than a minute, so it seems like the more grids it's splitting, the longer each individual grid takes (which I don't really understand, but that may just be the reality of the situation).
I'm looking for advice/suggestions on what we can do to build efficiencies into our process. I explored using ModelBuilder to iterate through my grids and employ the Extract by Mask tool, which seemed to work well, but we're still looking at roughly 7 days (estimated) just to get the grids out for Ontario, and this is to be used across the country. I've looked into some downloadable toolboxes and explored the ATtILA landscape metrics toolbox, but it throws an error when trying to create its Land Cover Classification Editor, which pretty much puts me dead in the water with the remaining tools.
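For context, the ModelBuilder iteration is essentially equivalent to a Python loop like the following (again, paths and field names are placeholders); it works, it's just slow at this scale:

```python
import arcpy
from arcpy.sa import ExtractByMask

arcpy.CheckOutExtension("Spatial")

land_cover = r"D:\data\landcover_10m.tif"      # placeholder paths
grid_fc = r"D:\data\grids_1km_ontario.shp"
out_folder = r"D:\data\tiles_1km"

arcpy.management.MakeFeatureLayer(grid_fc, "grid_lyr")

# One Extract by Mask call per 1 km x 1 km grid polygon
with arcpy.da.SearchCursor(grid_fc, ["OID@"]) as cursor:
    for (oid,) in cursor:
        # FID for a shapefile; use OBJECTID for a geodatabase feature class
        arcpy.management.SelectLayerByAttribute(
            "grid_lyr", "NEW_SELECTION", f"FID = {oid}"
        )
        tile = ExtractByMask(land_cover, "grid_lyr")
        tile.save(rf"{out_folder}\tile_{oid}.tif")
```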
Additional context that may be of value: I am working off of two local 2 TB SSDs, one of which is set up as my scratch workspace. My computer has 20 cores, which I've maxed out in my Environments settings. The raster I'm working with has 56 land cover classes (not sure if that matters) and has a resolution of 10 m. I'm hoping there is some alternative way to EITHER get my raster split by the 1 km x 1 km grids OR find a way to extract the metrics I need directly in ArcGIS Pro. Any suggestions would be appreciated! I would be willing to explore R or Python approaches, but I'm not necessarily convinced that this would improve processing times...
Are there large areas that could be excluded, i.e. areas that are uniform-ish at the 5 km x 5 km grid size? There would be little point in resampling to a smaller grid spec in that case. It may not solve all of your issues, but some pre-processing may reduce the overall time and the number of resultant grids to examine at the finer scale.
Unfortunately, we've already reduced our area of interest as much as possible by restricting everything to areas within 1 km of agricultural land.
One tweak that would improve performance: you say the raster you are splitting up has 56 classes. Make sure you have the optimal bit depth for that raster; you would want it to be unsigned 8 bit.
Not sure if it's relevant to your situation, but most raster processing tools don't honour compression if you are writing output to TIFF. I find you have to run the TIFF through the Copy Raster tool to ensure compression is turned on.
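Roughly what that looks like in arcpy (paths are made up); setting the pixel type in the same Copy Raster call also covers the bit-depth point above:

```python
import arcpy

# Force LZW compression on TIFF output; many tools ignore the
# compression environment, but Copy Raster honours it
arcpy.env.compression = "LZW"

# Rewrite the 56-class land cover raster as unsigned 8 bit
arcpy.management.CopyRaster(
    in_raster=r"D:\data\landcover_10m.tif",          # placeholder paths
    out_rasterdataset=r"D:\data\landcover_10m_u8.tif",
    pixel_type="8_BIT_UNSIGNED",
    format="TIFF",
)
```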
Thanks for the input and suggestions from everyone!
We ended up finding a really slick workaround that seemed to do the trick. We added a new column to our shapefile's attribute table and grouped records in bunches of 1,000 (i.e. the first 1,000 records in the shapefile are assigned 1, records 1,001-2,000 are assigned 2, etc.). We were then able to create a model that used the Split Raster tool but iterated (Feature Selection) by the GroupID we had assigned. It seems like there is a sweet spot with the Split Raster tool where once you crest over a certain number of records, the processing speed per record slows down. We found that a count of 1,000 features allowed us to run through groups at a rate of about 3 min per group, which was by far the fastest processing time we've achieved (a little less than 2 hrs to split all our tiles: 55,850 tiles, 1 km x 1 km, raster at 10 m resolution).
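For anyone wanting to reproduce this outside of ModelBuilder, here is a rough arcpy equivalent of what the model does (field and path names are placeholders):

```python
import arcpy

land_cover = r"D:\data\landcover_10m.tif"   # placeholder paths
grid_fc = r"D:\data\grids_1km.shp"          # 1 km x 1 km fishnet with a GroupID field
out_folder = r"D:\data\tiles_1km"

arcpy.management.MakeFeatureLayer(grid_fc, "grid_lyr")

# GroupID buckets the grid polygons into batches of 1,000 records
group_ids = sorted({row[0] for row in arcpy.da.SearchCursor(grid_fc, ["GroupID"])})

for gid in group_ids:
    # Select one batch of ~1,000 grid polygons at a time
    arcpy.management.SelectLayerByAttribute(
        "grid_lyr", "NEW_SELECTION", f"GroupID = {gid}"
    )
    # Split Raster only processes the selected features, which keeps each
    # run inside the sweet spot where per-tile processing stays fast
    arcpy.management.SplitRaster(
        in_raster=land_cover,
        out_folder=out_folder,
        out_base_name=f"g{gid}_tile_",
        split_method="POLYGON_FEATURES",
        format="TIFF",
        split_polygon_feature_class="grid_lyr",
    )
```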