I have hundreds of SMAP soil moisture data and need to calculate the daily average of them. For every day there are 8 files (every 3 hours) and I need a code to calculate the average based on their names.
I have a code but i am interested to know how can I define a code that average data based on their names?
# Import arcpy module import arcpy # Local variables: SMAP_L4_SM_gph_20150331T043000_Vv2030_001_Geophysical_Data_sm_surface_bdea9c15_tif = "SMAP_L4_SM_gph_20150331T043000_Vv2030_001_Geophysical_Data_sm_surface_bdea9c15.tif" SMAP_L4_SM_gph_20150331T073000_Vv2030_001_Geophysical_Data_sm_surface_bdea9c0d_tif = "SMAP_L4_SM_gph_20150331T073000_Vv2030_001_Geophysical_Data_sm_surface_bdea9c0d.tif" SMAP_L4_SM_gph_20150331T103000_Vv2030_001_Geophysical_Data_sm_surface_bdea9c14_tif = "SMAP_L4_SM_gph_20150331T103000_Vv2030_001_Geophysical_Data_sm_surface_bdea9c14.tif" SMAP_L4_SM_gph_20150331T133000_Vv2030_001_Geophysical_Data_sm_surface_bdea9bf5_tif = "SMAP_L4_SM_gph_20150331T133000_Vv2030_001_Geophysical_Data_sm_surface_bdea9bf5.tif" SMAP_L4_SM_gph_20150331T163000_Vv2030_001_Geophysical_Data_sm_surface_bdea9bf3_tif = "SMAP_L4_SM_gph_20150331T163000_Vv2030_001_Geophysical_Data_sm_surface_bdea9bf3.tif" SMAP_L4_SM_gph_20150331T193000_Vv2030_001_Geophysical_Data_sm_surface_bdea9bf4_tif = "SMAP_L4_SM_gph_20150331T193000_Vv2030_001_Geophysical_Data_sm_surface_bdea9bf4.tif" SMAP_L4_SM_gph_20150331T223000_Vv2030_001_Geophysical_Data_sm_surface_bdea9c0f_tif = "SMAP_L4_SM_gph_20150331T223000_Vv2030_001_Geophysical_Data_sm_surface_bdea9c0f.tif" SMAP_L4_SM_gph_20150331T013000_Vv2030_001_Geophysical_Data_sm_surface_bdea9eb8_tif = "SMAP_L4_SM_gph_20150331T013000_Vv2030_001_Geophysical_Data_sm_surface_bdea9eb8.tif" v_name_ = "D:\\NASA\\Data\\SMAP\\2014\\Processed\\%name%.tif" # Process: Raster Calculator arcpy.gp.RasterCalculator_sa("( \"%SMAP_L4_SM_gph_20150331T043000_Vv2030_001_Geophysical_Data_sm_surface_bdea9c15.tif%\" + \"%SMAP_L4_SM_gph_20150331T073000_Vv2030_001_Geophysical_Data_sm_surface_bdea9c0d.tif%\" + \"%SMAP_L4_SM_gph_20150331T103000_Vv2030_001_Geophysical_Data_sm_surface_bdea9c14.tif%\" +\"%SMAP_L4_SM_gph_20150331T133000_Vv2030_001_Geophysical_Data_sm_surface_bdea9bf5.tif%\" + \"%SMAP_L4_SM_gph_20150331T163000_Vv2030_001_Geophysical_Data_sm_surface_bdea9bf3.tif%\" + \"%SMAP_L4_SM_gph_20150331T193000_Vv2030_001_Geophysical_Data_sm_surface_bdea9bf4.tif%\" + \"%SMAP_L4_SM_gph_20150331T223000_Vv2030_001_Geophysical_Data_sm_surface_bdea9c0f.tif%\" + \"%SMAP_L4_SM_gph_20150331T013000_Vv2030_001_Geophysical_Data_sm_surface_bdea9eb8.tif%\") / 8", v_name_) Your help is really appreciated.
Solved! Go to Solution.
Assuming the characters before the date in the filename are always the same length and all of your rasters are together in the same folder, see if this prints out the raster file names for each day that exists in the filename dates.
import arcpy
from collections import defaultdict
def main():
raster_dir = r"D:\NASA\Data\SMAP\2014\Processed"
arcpy.env.workspace = raster_dir
raster_days = defaultdict(list)
for r in arcpy.ListRasters():
raster_days[r[15:23]].append(r)
for date, rasters in raster_days.items():
print(date)
for r in rasters:
print("\t{}".format(r))
if __name__ == '__main__':
main()
This creates a dictionary where the key is a date string (YYYYMMDD) and the value is a list of raster filenames that all have that date. If that works, you should be able to use this dictionary to calculate your statistics however you want.
EDIT:
Fixed typo on raster_days variable name in for loop.
Would it be easier to use the MEAN raster property for each raster?
Actually I need a raster file for every day (instead of 8 raster files for every day) which every pixel contains the average of soil moisture of the same pixel in 8 different time spells. By now, I can not define how arcpy identify files based on their name which contains the year month and year.
Can you use raster properties to average the average of every raster for the day? In other words, loop over each raster for the day to grab its average and put it in a list, then average the list; like numpy.mean()
Or are you also asking how to look at all these raster files and identify which ones are from the same day with the file name?
Exactly my question is how to look at all these raster files and identify which ones are from the same day with the file name? and write a code that can define data by days and average the according to every day.
Is the prefix and suffix of the filenames (the stuff before and after the date) always the same? If not, is the length the same? What are the rules on how those are generated?
These are samples of the data.
SMAP_L4_SM_gph_20150331T043000_Vv2030_001_Geophysical_Data_sm_surface_bdea9c15.tif SMAP_L4_SM_gph_20161231T223000_Vv2030_001_Geophysical_Data_sm_surface_352ca87d.tif SMAP_L4_SM_gph_20170104T043000_Vv2030_001_Geophysical_Data_sm_surface_616d799c.tif
In addition to the date (year, month, day) which are different, the letters in the last part of the name (before .tif) are different.
Assuming the characters before the date in the filename are always the same length and all of your rasters are together in the same folder, see if this prints out the raster file names for each day that exists in the filename dates.
import arcpy
from collections import defaultdict
def main():
raster_dir = r"D:\NASA\Data\SMAP\2014\Processed"
arcpy.env.workspace = raster_dir
raster_days = defaultdict(list)
for r in arcpy.ListRasters():
raster_days[r[15:23]].append(r)
for date, rasters in raster_days.items():
print(date)
for r in rasters:
print("\t{}".format(r))
if __name__ == '__main__':
main()
This creates a dictionary where the key is a date string (YYYYMMDD) and the value is a list of raster filenames that all have that date. If that works, you should be able to use this dictionary to calculate your statistics however you want.
EDIT:
Fixed typo on raster_days variable name in for loop.
defaultDict is pretty fancy for a beginner, but what a useful approach! Thanks for sharing that.
import arcpy
from collections import defaultdict
def main():
raster_dir = r"D:\NASA\Data\SMAP\OutPut"
arcpy.env.workspace = raster_dir
raster_days = defaultdict(list)
for r in arcpy.ListRasters():
raster_days[r[15:23]].append(r)
for date, rasters in raster_days.items():
print(date)
for r in rasters:
print("\t{}".format(r))
if __name__ == '__main__':
main()
This Code exactly classified my data into day groups. Thank you so much for your help. I have just changes "day_rasters.items()" in your code into "
raster_days.items()"
.
Thank you so much.