Calculate daily average of raster files based on their names in python

AmenehTavakol · ‎06-13-2017

I have hundreds of SMAP soil moisture data and need to calculate the daily average of them. For every day there are 8 files (every 3 hours) and I need a code to calculate the average based on their names.

I have a code but i am interested to know how can I define a code that average data based on their names?

# Import arcpy module
import arcpy


# Local variables:
SMAP_L4_SM_gph_20150331T043000_Vv2030_001_Geophysical_Data_sm_surface_bdea9c15_tif = "SMAP_L4_SM_gph_20150331T043000_Vv2030_001_Geophysical_Data_sm_surface_bdea9c15.tif"
SMAP_L4_SM_gph_20150331T073000_Vv2030_001_Geophysical_Data_sm_surface_bdea9c0d_tif = "SMAP_L4_SM_gph_20150331T073000_Vv2030_001_Geophysical_Data_sm_surface_bdea9c0d.tif"
SMAP_L4_SM_gph_20150331T103000_Vv2030_001_Geophysical_Data_sm_surface_bdea9c14_tif = "SMAP_L4_SM_gph_20150331T103000_Vv2030_001_Geophysical_Data_sm_surface_bdea9c14.tif"
SMAP_L4_SM_gph_20150331T133000_Vv2030_001_Geophysical_Data_sm_surface_bdea9bf5_tif = "SMAP_L4_SM_gph_20150331T133000_Vv2030_001_Geophysical_Data_sm_surface_bdea9bf5.tif"
SMAP_L4_SM_gph_20150331T163000_Vv2030_001_Geophysical_Data_sm_surface_bdea9bf3_tif = "SMAP_L4_SM_gph_20150331T163000_Vv2030_001_Geophysical_Data_sm_surface_bdea9bf3.tif"
SMAP_L4_SM_gph_20150331T193000_Vv2030_001_Geophysical_Data_sm_surface_bdea9bf4_tif = "SMAP_L4_SM_gph_20150331T193000_Vv2030_001_Geophysical_Data_sm_surface_bdea9bf4.tif"
SMAP_L4_SM_gph_20150331T223000_Vv2030_001_Geophysical_Data_sm_surface_bdea9c0f_tif = "SMAP_L4_SM_gph_20150331T223000_Vv2030_001_Geophysical_Data_sm_surface_bdea9c0f.tif"
SMAP_L4_SM_gph_20150331T013000_Vv2030_001_Geophysical_Data_sm_surface_bdea9eb8_tif = "SMAP_L4_SM_gph_20150331T013000_Vv2030_001_Geophysical_Data_sm_surface_bdea9eb8.tif"
v_name_ = "D:\\NASA\\Data\\SMAP\\2014\\Processed\\%name%.tif"

# Process: Raster Calculator
arcpy.gp.RasterCalculator_sa("( \"%SMAP_L4_SM_gph_20150331T043000_Vv2030_001_Geophysical_Data_sm_surface_bdea9c15.tif%\" + \"%SMAP_L4_SM_gph_20150331T073000_Vv2030_001_Geophysical_Data_sm_surface_bdea9c0d.tif%\" + \"%SMAP_L4_SM_gph_20150331T103000_Vv2030_001_Geophysical_Data_sm_surface_bdea9c14.tif%\" +\"%SMAP_L4_SM_gph_20150331T133000_Vv2030_001_Geophysical_Data_sm_surface_bdea9bf5.tif%\" + \"%SMAP_L4_SM_gph_20150331T163000_Vv2030_001_Geophysical_Data_sm_surface_bdea9bf3.tif%\" + \"%SMAP_L4_SM_gph_20150331T193000_Vv2030_001_Geophysical_Data_sm_surface_bdea9bf4.tif%\" + \"%SMAP_L4_SM_gph_20150331T223000_Vv2030_001_Geophysical_Data_sm_surface_bdea9c0f.tif%\" + \"%SMAP_L4_SM_gph_20150331T013000_Vv2030_001_Geophysical_Data_sm_surface_bdea9eb8.tif%\") / 8", v_name_)


Your help is really appreciated.

BlakeTerhune · ‎06-13-2017

Assuming the characters before the date in the filename are always the same length and all of your rasters are together in the same folder, see if this prints out the raster file names for each day that exists in the filename dates.

import arcpy
from collections import defaultdict

def main():
    raster_dir = r"D:\NASA\Data\SMAP\2014\Processed"
    arcpy.env.workspace = raster_dir
    raster_days = defaultdict(list)
    for r in arcpy.ListRasters():
        raster_days[r[15:23]].append(r)

    for date, rasters in raster_days.items():
        print(date)
        for r in rasters:
            print("\t{}".format(r))

if __name__ == '__main__':
    main()
‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍

This creates a dictionary where the key is a date string (YYYYMMDD) and the value is a list of raster filenames that all have that date. If that works, you should be able to use this dictionary to calculate your statistics however you want.

EDIT:

Fixed typo on raster_days variable name in for loop.

View solution in original post

BlakeTerhune · ‎06-13-2017

Would it be easier to use the MEAN raster property for each raster?

AmenehTavakol · ‎06-13-2017

Actually I need a raster file for every day (instead of 8 raster files for every day) which every pixel contains the average of soil moisture of the same pixel in 8 different time spells. By now, I can not define how arcpy identify files based on their name which contains the year month and year.

BlakeTerhune · ‎06-13-2017

Can you use raster properties to average the average of every raster for the day? In other words, loop over each raster for the day to grab its average and put it in a list, then average the list; like numpy.mean()

Or are you also asking how to look at all these raster files and identify which ones are from the same day with the file name?

AmenehTavakol · ‎06-13-2017

Exactly my question is how to look at all these raster files and identify which ones are from the same day with the file name? and write a code that can define data by days and average the according to every day.

BlakeTerhune · ‎06-13-2017

Is the prefix and suffix of the filenames (the stuff before and after the date) always the same? If not, is the length the same? What are the rules on how those are generated?

AmenehTavakol · ‎06-13-2017

These are samples of the data.

SMAP_L4_SM_gph_20150331T043000_Vv2030_001_Geophysical_Data_sm_surface_bdea9c15.tif 
SMAP_L4_SM_gph_20161231T223000_Vv2030_001_Geophysical_Data_sm_surface_352ca87d.tif
SMAP_L4_SM_gph_20170104T043000_Vv2030_001_Geophysical_Data_sm_surface_616d799c.tif

In addition to the date (year, month, day) which are different, the letters in the last part of the name (before .tif) are different.

BlakeTerhune · ‎06-13-2017

Assuming the characters before the date in the filename are always the same length and all of your rasters are together in the same folder, see if this prints out the raster file names for each day that exists in the filename dates.

import arcpy
from collections import defaultdict

def main():
    raster_dir = r"D:\NASA\Data\SMAP\2014\Processed"
    arcpy.env.workspace = raster_dir
    raster_days = defaultdict(list)
    for r in arcpy.ListRasters():
        raster_days[r[15:23]].append(r)

    for date, rasters in raster_days.items():
        print(date)
        for r in rasters:
            print("\t{}".format(r))

if __name__ == '__main__':
    main()
‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍

This creates a dictionary where the key is a date string (YYYYMMDD) and the value is a list of raster filenames that all have that date. If that works, you should be able to use this dictionary to calculate your statistics however you want.

EDIT:

Fixed typo on raster_days variable name in for loop.

curtvprice · ‎06-14-2017

defaultDict is pretty fancy for a beginner, but what a useful approach! Thanks for sharing that.

AmenehTavakol · ‎06-19-2017

import arcpy
from collections import defaultdict

def main():
    raster_dir = r"D:\NASA\Data\SMAP\OutPut"
    arcpy.env.workspace = raster_dir
    raster_days = defaultdict(list)
    for r in arcpy.ListRasters():
        raster_days[r[15:23]].append(r)

    for date, rasters in raster_days.items():
        print(date)
        for r in rasters:
            print("\t{}".format(r))

if __name__ == '__main__':
    main()

This Code exactly classified my data into day groups. Thank you so much for your help. I have just changes "day_rasters.items()" in your code into "raster_days.items()"

.

Thank you so much.