How to aggregate rasters using raster file name?

AmyShowen · 03-12-2013

I have forty years of data in the form of one raster per day. I am hoping to aggregate these rasters so that I get one raster per month. To avoid aggregating thousands of files by hand, I would like to aggregate the files using their file names, which are in the form "TOMS_YEARmMONTHDAY_v8.HDF" (e.g., "TOMS_1976m0105_v8.HDF" for January 5, 1976). Would you please let me know if you have any guidance as to how exactly I might do this? Thanks so much.

curtvprice · 03-12-2013

I have forty years of data in the form of one raster per day. I am hoping to aggregate these rasters so that I get one raster per month.

What do you mean by "aggregate"?

AmyShowen · 03-13-2013

Each of these raster files contains the same kind of information (i.e., UV exposure). There is a unique raster file for each day. By "aggregate," I mean that I ultimately want a raster that represents descriptive statistics (mean, median, etc.) for an entire month of daily raster files. Does this make more sense? Thanks again for your help!

ChrisSnyder · 03-13-2013

1) Learn Python - since you are dealing with so many datasets, you will want to automate this.
2) You can "aggregate" a month's worth of data (for example, get the mean UV level for all the rasters from May 1978) using the 'Cell Statistics' Spatial Analyst tool: http://help.arcgis.com/en/arcgisdesktop/10.0/help/index.html#//009z0000007q000000.htm
3) You can use the 'Combine' Spatial Analyst tool (http://help.arcgis.com/en/arcgisdesktop/10.0/help/index.html#//009z0000007r000000.htm) to, in effect, overlay all your month-based aggregates into a single raster, where the output raster will have fields corresponding to your monthly aggregates, such as 'JAN_1978', 'FEB_1978', etc. You could then use the Combine raster output for analysis. For example, export the attribute table to a PivotTable in Excel.
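
As a rough illustration of the automation in step 1, here is a minimal sketch that sorts file names into per-month groups using only the naming pattern given in the question. The function name group_by_month and the sample names are made up for illustration, and no ArcGIS is needed for this part; each resulting list is what you would hand to Cell Statistics.

```python
import re
from collections import defaultdict

# Matches names like "TOMS_1976m0105_v8.HDF" and captures year, month, day.
NAME_RE = re.compile(r"^TOMS_(\d{4})m(\d{2})(\d{2})_v8\.HDF$", re.IGNORECASE)

def group_by_month(file_names):
    """Return {(year, month): [file names]} for names that fit the pattern."""
    groups = defaultdict(list)
    for name in file_names:
        match = NAME_RE.match(name)
        if match is None:
            continue  # skip anything that is not a daily TOMS raster
        year, month, _day = match.groups()
        groups[(int(year), int(month))].append(name)
    return {key: sorted(names) for key, names in groups.items()}

# Hypothetical sample names, just to show the grouping.
sample = [
    "TOMS_1976m0106_v8.HDF",
    "TOMS_1976m0105_v8.HDF",
    "TOMS_1976m0201_v8.HDF",
    "readme.txt",
]
for key, names in sorted(group_by_month(sample).items()):
    print(key, names)
```

Parsing the date out of the name, rather than relying on file order, means gaps in the daily record simply produce shorter lists.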

curtvprice · 03-13-2013

I would like to aggregate the files using their file names, which are in the form "TOMS_YEARmMONTHDAY_v8.HDF" (e.g., "TOMS_1976m0105_v8.HDF" for January 5, 1976).

If you aren't quite ready to dive into Python with both feet, another option is to use a ModelBuilder iterator to search for a month of rasters. For example, to get one month of data, search for "TOMS_1976m01*". You can use model variables in the wildcard, for example "TOMS_%year%m%month%*", and nest models one inside another for looping.

The outer model would iterate by year; inside it, a second model would iterate by month and generate the list of daily rasters; and inside that would be your Cell Statistics tool.

This is a lot of work either way, whether in Python or ModelBuilder.
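
To make the nesting concrete, here is a rough Python analogue of the year/month loops and the %variable% substitution. This is only an illustration of the idea, not how ModelBuilder works internally; the helper names expand and monthly_lists are invented, and fnmatch stands in for the iterator's wildcard search.

```python
import fnmatch

def expand(template, **values):
    """Replace %name% tokens, loosely mimicking ModelBuilder inline variables."""
    for key, value in values.items():
        template = template.replace("%{0}%".format(key), str(value))
    return template

def monthly_lists(names, years):
    """Yield (wildcard, matching names) for each year and month with data."""
    for year in years:                  # outer loop: iterate by year
        for month in range(1, 13):      # inner loop: iterate by month
            wild = expand("TOMS_%year%m%month%*",
                          year=year, month="{0:02d}".format(month))
            matches = fnmatch.filter(names, wild)
            if matches:
                yield wild, matches

# Hypothetical file names for illustration.
names = ["TOMS_1976m0105_v8.HDF", "TOMS_1976m0201_v8.HDF"]
for wild, matches in monthly_lists(names, [1976]):
    print(wild, matches)
```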

You may want to check out a really cool site that has massive climate data set up on a server that can do the work for you -- if they have the climate data you need.

USGS GeoData Portal
http://cida.usgs.gov/climate/gdp/

The USGS Geo Data Portal (GDP) project provides scientists and environmental resource managers access to downscaled climate projections and other data resources that are otherwise difficult to access and manipulate. This user interface demonstrates an example implementation of the GDP project web-service software and standards-based data integration strategy.

A user of the GDP interface can supply their area of interest as a pre-existing GIS shapefile with one to many unique polygons or by drawing a single polygon using an interactive web-map. A user can select from available GDP project web-service processing algorithms, which include raw data subsetting and area-weighted statistics summarization.

Processing algorithm options such as dataset component of interest, time period of interest and output file formatting must be specified. As the GDP project progresses, other processing algorithms and output formatting options will become available. Datasets available from the initial public release of the portal include historic weather and downscaled climate projections.

DavidMedeiros · 03-14-2013

Thanks for the reply Curtis. I'm the GIS research specialist helping Amy with her model and suggested she look to the forum for more help. Unfortunately the constraints of academic based GIS research are such that "learning python" isn't really an option for most researchers, especially those with no prior programming background.

I've used iterators in ModelBuilder quite a bit but am not familiar with model variables (other than %name%). In your example "TOMS_%year%m%month%*", where do the year and month model variables come from? Can we create a list for the years and add that as a variable to the model, or do we need to run a search cursor to get all of the names first?

I found this Python function for raster lists (http://resources.arcgis.com/en/help/main/10.1/index.html#/ListRasters/018v0000003w000000/). Is this what we need to nest, one list for the yearly rasters and then another for each month in that year, before running the Cell Statistics tool?

Appreciate the help, thanks.

David

curtvprice · 03-15-2013

Thanks for the reply Curtis. I'm the GIS research specialist helping Amy with her model and suggested she look to the forum for more help. Unfortunately the constraints of academic based GIS research are such that "learning python" isn't really an option for most researchers, especially those with no prior programming background.

David, Amy,

Writing little Calculate Value scripts is a great way to get started with programming, and honestly, to make full use of ModelBuilder you have to. What's awesome about Python is that it's a full-featured programming language that is very simple to learn, much easier than even VBScript. This can open doors for scientists without a programming background to gain a lot of capability for not a lot of pain. (I was very good with AML but never quite got the hang of .NET and ArcObjects - the more coarse-grained Python GP is much more up my alley. I guess that means I'm not a developer either.) The Esri product teams would be the first to admit that they haven't always made the right call, but they hit a home run IMHO by choosing to build the geoprocessing tool environment around Python.

I've used iterators in ModelBuilder quite a bit but am not familiar with model variables (other than %name%). In your example "TOMS_%year%m%month%*", where do the year and month model variables come from? Can we create a list for the years and add that as a variable to the model, or do we need to run a search cursor to get all of the names first?

What I was thinking of was nesting iterators, as described in the help here:
ArcGIS 10.1 Help: Integrating a model within a model
(see the section "Advanced use of model iterators")

curtvprice · 03-15-2013

Here's a working example - attached - just to show that it can be done.

Just to demonstrate how not-scary Python is: you could paste this script at the Python prompt, or recast it as a function and drop it in a Calculate Value tool in ModelBuilder:

import arcpy
arcpy.CheckOutExtension("spatial")
# pathWhereMyRastersAre and pathToWriteOutput are placeholders: set them to your folders
arcpy.env.workspace = pathWhereMyRastersAre
outWks = pathToWriteOutput
for year in range(1976, 1979):    # 1976 through 1978
    for month in range(1, 13):    # 1 to 12
        # wildcard - "TOMS_1976m01*.HDF" (format month 1 -> "01")
        wild = "TOMS_{0}m{1:02d}*.HDF".format(year, month)
        rasList = arcpy.ListRasters(wild)  # list all rasters for a month
        if not rasList:
            continue  # no rasters found for this month, skip it
        outRas = arcpy.sa.CellStatistics(rasList, "MEAN")  # monthly mean
        outPath = outWks + "/" + "m{0}m{1:02d}".format(year, month)
        outRas.save(outPath)
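
Since the original question asked for several statistics (mean, median, etc.), here is one way the script might be recast as functions, as a sketch only. The names plan_monthly_jobs and run_jobs and the output naming scheme (for example "mea1976m01") are assumptions for illustration. The planning step is plain Python; only run_jobs needs ArcGIS with Spatial Analyst, and it uses the same ListRasters and CellStatistics calls as above.

```python
def plan_monthly_jobs(first_year, last_year, stats=("MEAN", "MEDIAN")):
    """Return (wildcard, statistic, output name) for every month and statistic."""
    jobs = []
    for year in range(first_year, last_year + 1):
        for month in range(1, 13):
            wild = "TOMS_{0}m{1:02d}*.HDF".format(year, month)
            for stat in stats:
                # Hypothetical short name: first three letters of the statistic
                out_name = "{0}{1}m{2:02d}".format(stat[:3].lower(), year, month)
                jobs.append((wild, stat, out_name))
    return jobs

def run_jobs(jobs, in_workspace, out_workspace):
    """Run the planned jobs; requires ArcGIS with the Spatial Analyst extension."""
    import os
    import arcpy  # imported here so the planning step works without ArcGIS
    arcpy.CheckOutExtension("spatial")
    arcpy.env.workspace = in_workspace
    for wild, stat, out_name in jobs:
        ras_list = arcpy.ListRasters(wild)
        if not ras_list:
            continue  # no daily rasters for this month
        out_ras = arcpy.sa.CellStatistics(ras_list, stat)
        out_ras.save(os.path.join(out_workspace, out_name))

# Planning only; prints the first two jobs for 1976.
for job in plan_monthly_jobs(1976, 1976)[:2]:
    print(job)
```

Keeping the planning separate from the arcpy calls lets you check the wildcards and output names before starting a long run. The output names are kept short because some raster formats limit name length.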