Hi there,
I am trying to sum close to 6,000 ESRI grids and found a piece of script on this forum that I have tried to adapt (attached). The script runs fine and prompts every grid but eventually crashes when it reaches the last raster. No output is created.
Would you have any idea as to why it crashes? The script seems to work fine when I test it on 5-10 grids. Could the reason be the 5,000 ESRI grid limit that can be stored in a single workspace directory (Esri Grid format—Help | ArcGIS for Desktop) ? I did try on less than 5,000 grids but had the same outcome.
I am using ArcGIS 10.2.2 and Python 2.7.5.
Thanks for your help!
Mathieu
the download has a ...py.zip filename, but the zip file contains an addin. do you have the script?
Hi Dan,
The zip file contains the *.py that I edited and ran within IDLE. Here is the script:
import os,sys
import arcpy
from arcpy.sa import *
sPath = sys.path[0]
dataPath = 'E:\BOM\daily-temperature\daily-maximum-temperature\max_temp_1971_2000_grids' # ADD your workspace path here
outPath = 'E:\BOM\daily-temperature\daily-maximum-temperature'
arcpy.env.overwriteOutput = 1
arcpy.CheckOutExtension('Spatial')
arcpy.env.scratchWorkspace = outPath
arcpy.env.workspace = dataPath
#create a list of rasters in the workspace
rasters = arcpy.ListRasters('','')
i = 0
#loop through rasters in list
for raster in rasters:
print "processing raster: %s" %os.path.join(dataPath,raster)
#convert nodata to zero
out1 = Con(IsNull(raster), 0, raster)
#sum rasters together
if i == 0:
out2 = out1
i += 1
else:
out2 = out2 + out1
i += 1
#save final output
out2.save(os.path.join(outPath,'sumRas2'))
Thanks,
Mathieu
Just to make the thread more readable I have included the script you attached:
import os,sys import arcpy from arcpy.sa import * sPath = sys.path[0] dataPath = 'E:\BOM\daily-temperature\daily-maximum-temperature\max_temp_1971_2000_grids' # ADD your workspace path here outPath = 'E:\BOM\daily-temperature\daily-maximum-temperature' arcpy.env.overwriteOutput = 1 arcpy.CheckOutExtension('Spatial') arcpy.env.scratchWorkspace = outPath arcpy.env.workspace = dataPath #create a list of rasters in the workspace rasters = arcpy.ListRasters('','') i = 0 #loop through rasters in list for raster in rasters: print "processing raster: %s" %os.path.join(dataPath,raster) #convert nodata to zero out1 = Con(IsNull(raster), 0, raster) #sum rasters together if i == 0: out2 = out1 i += 1 else: out2 = out2 + out1 i += 1 #save final output out2.save(os.path.join(outPath,'sumRas2'))
The first thing I notice is the long path with multiple minus signs. This can cause problems when working with grid files. The number of files in a single folder can be the cause of problems too. I also notice that for each raster you first set the NoData cells to 0 and then add it to the sum. Have you considered using Cell Statistics—Help | ArcGIS for Desktop with the SUM statistics type and the DATA option. This allows you to sum multiple rasters and ignore the NoData cells (they will be considered 0 in the SUM operation). I don't think you will be able to use 6000 rasters as inputs but you can chunk the list and perform it on smaller batches and afterwards on the intermediate results.
Hi Xander,
I did try without setting the NoData cells to Zero and it didn't change the result. The Cell Statistics tool only allows 1,000 rasters if I am not mistaken so I could give it a go in several batches. The thing is I have to run this script on another 10,000 files and I was hoping to get it to work on the whole workspace.
It looks like I will have to subdivide my data though. I'll try to shorten the path too.
Thanks
When I mentioned using chunk, I was referring to let Python divide the list of rasters into chunks of less than 1000 rasters, perform the sum and store the result temporarily and at the end of the chunks, combine the temporal results into one.
An example of how to "chunk":
def main(): my_list = ["Raster{0}".format(i) for i in range(100)] for my_list_chunk in chunks(my_list, 10): print my_list_chunk def chunks(l, n): """ Yield successive n-sized chunks from l.""" for i in xrange(0, len(l), n): yield l if __name__ == '__main__': main()
This will yield 10 lists with 10 elements each:
['Raster0', 'Raster1', 'Raster2', 'Raster3', 'Raster4', 'Raster5', 'Raster6', 'Raster7', 'Raster8', 'Raster9'] ['Raster10', 'Raster11', 'Raster12', 'Raster13', 'Raster14', 'Raster15', 'Raster16', 'Raster17', 'Raster18', 'Raster19'] ['Raster20', 'Raster21', 'Raster22', 'Raster23', 'Raster24', 'Raster25', 'Raster26', 'Raster27', 'Raster28', 'Raster29'] ['Raster30', 'Raster31', 'Raster32', 'Raster33', 'Raster34', 'Raster35', 'Raster36', 'Raster37', 'Raster38', 'Raster39'] ['Raster40', 'Raster41', 'Raster42', 'Raster43', 'Raster44', 'Raster45', 'Raster46', 'Raster47', 'Raster48', 'Raster49'] ['Raster50', 'Raster51', 'Raster52', 'Raster53', 'Raster54', 'Raster55', 'Raster56', 'Raster57', 'Raster58', 'Raster59'] ['Raster60', 'Raster61', 'Raster62', 'Raster63', 'Raster64', 'Raster65', 'Raster66', 'Raster67', 'Raster68', 'Raster69'] ['Raster70', 'Raster71', 'Raster72', 'Raster73', 'Raster74', 'Raster75', 'Raster76', 'Raster77', 'Raster78', 'Raster79'] ['Raster80', 'Raster81', 'Raster82', 'Raster83', 'Raster84', 'Raster85', 'Raster86', 'Raster87', 'Raster88', 'Raster89'] ['Raster90', 'Raster91', 'Raster92', 'Raster93', 'Raster94', 'Raster95', 'Raster96', 'Raster97', 'Raster98', 'Raster99']
One question, I don't think your goal is to get the total sum of maximum temperatures, right? You are probably looking for an average (dividing the sum by the number of rasters)? The Cell Statistics also comes with the MEAN statistics type and other perhaps interesting statistics types.
Thanks for that, I wouldn't have thought about it!
You're right, my goal is not to get the total sum but it is a first step into a more complex calculation. Once I get this script running smoothly, I will fine tune the calculation.
Cheers
I noticed another thing... the way you specify your paths. This will probably yield problems. There is a nice post by Dan Patterson that explains in more detail about paths in Python: Filenames and file paths in Python .
I usually prefer the raw notation (put an r in front of the path, outside the quotes).
To give you a start (code below is not tested!):
def main(): import os import arcpy data_path = r'E:\BOM\daily-temperature\daily-maximum-temperature\max_temp_1971_2000_grids' # out_path = r'E:\BOM\daily-temperature\daily-maximum-temperature' res_ws = r'E:\BOM\daily-temperature\daily-maximum-temperature\yourGDB.gdb' # I prefer gdb over folder arcpy.env.overwriteOutput = True arcpy.CheckOutExtension('Spatial') arcpy.env.workspace = data_path # create fgdb res_ws path, name = os.path.split(res_ws) arcpy.CreateFileGDB_management(path, name) #create a list of rasters in the workspace rasters = arcpy.ListRasters() i = 0 lst_cellstat = [] for rasters_chunk in chunks(rasters, 500): i += 1 cellstat = arcpy.sa.CellStatistics(rasters_chunk, "SUM", "DATA") outname = os.path.join(res_ws, "cellstat{0}".format(i)) cellstat.save(outname) lst_cellstat.append(outname) # create the final raster cellstat = arcpy.sa.CellStatistics(lst_cellstat, "SUM", "DATA") outname = os.path.join(res_ws, "total_sum") cellstat.save(outname) # eliminate intermediate results for ras in lst_cellstat: arcpy.Delete_management(ras) def chunks(l, n): """ Yield successive n-sized chunks from l.""" for i in xrange(0, len(l), n): yield l if __name__ == '__main__': main()
I'll try it as soon as I can!