Sum 6,000+ rasters - Python script

10896
28
02-02-2016 02:05 AM
MathieuLacorde1
New Contributor

Hi there,

I am trying to sum close to 6,000 ESRI grids and found a piece of script on this forum that I have tried to adapt (attached). The script runs fine and prompts every grid but eventually crashes when it reaches the last raster. No output is created.

Would you have any idea as to why it crashes? The script seems to work fine when I test it on 5-10 grids. Could the reason be the 5,000 ESRI grid limit  that can be stored in a single workspace directory (Esri Grid format—Help | ArcGIS for Desktop) ? I did try on less than 5,000 grids but had the same outcome.

I am using ArcGIS 10.2.2 and Python 2.7.5.

Thanks for your help!

Mathieu

0 Kudos
28 Replies
DanPatterson_Retired
MVP Emeritus

the download has a ...py.zip filename, but the zip file contains an addin. do you have the script?

0 Kudos
MathieuLacorde1
New Contributor

Hi Dan,

The zip file contains the *.py that I edited and ran within IDLE. Here is the script:

import os,sys

import arcpy

from arcpy.sa import *

sPath = sys.path[0]

dataPath = 'E:\BOM\daily-temperature\daily-maximum-temperature\max_temp_1971_2000_grids' # ADD your workspace path here

outPath = 'E:\BOM\daily-temperature\daily-maximum-temperature'

arcpy.env.overwriteOutput = 1

arcpy.CheckOutExtension('Spatial')

arcpy.env.scratchWorkspace = outPath

arcpy.env.workspace = dataPath

#create a list of rasters in the workspace

rasters = arcpy.ListRasters('','')

i = 0

#loop through rasters in list

for raster in rasters:

    print "processing raster: %s" %os.path.join(dataPath,raster)

    #convert nodata to zero

    out1 = Con(IsNull(raster), 0, raster)

    #sum rasters together

    if i == 0:

        out2 = out1

        i += 1

    else:

        out2 = out2 + out1

        i += 1

#save final output

out2.save(os.path.join(outPath,'sumRas2'))

Thanks,

Mathieu

0 Kudos
XanderBakker
Esri Esteemed Contributor

Just to make the thread more readable I have included the script you attached:

import os,sys
import arcpy
from arcpy.sa import *

sPath = sys.path[0]
dataPath = 'E:\BOM\daily-temperature\daily-maximum-temperature\max_temp_1971_2000_grids' # ADD your workspace path here
outPath = 'E:\BOM\daily-temperature\daily-maximum-temperature'

arcpy.env.overwriteOutput = 1
arcpy.CheckOutExtension('Spatial')
arcpy.env.scratchWorkspace = outPath
arcpy.env.workspace = dataPath
#create a list of rasters in the workspace
rasters = arcpy.ListRasters('','')

i = 0
#loop through rasters in list
for raster in rasters:
    print "processing raster: %s" %os.path.join(dataPath,raster)

    #convert nodata to zero
    out1 = Con(IsNull(raster), 0, raster)

    #sum rasters together
    if i == 0:
        out2 = out1
        i += 1
    else:
        out2 = out2 + out1
        i += 1

#save final output
out2.save(os.path.join(outPath,'sumRas2'))

The first thing I notice is the long path with multiple minus signs. This can cause problems when working with grid files. The number of files in a single folder can be the cause of problems too. I also notice that for each raster you first set the NoData cells to 0 and then add it to the sum. Have you considered using Cell Statistics—Help | ArcGIS for Desktop​ with the SUM statistics type and the DATA option. This allows you to sum multiple rasters and ignore the NoData cells (they will be considered 0 in the SUM operation). I don't think you will be able to use 6000 rasters as inputs but you can chunk the list and perform it on smaller batches and afterwards on the intermediate results.

MathieuLacorde1
New Contributor

Hi Xander,

I did try without setting the NoData cells to Zero and it didn't change the result. The Cell Statistics tool only allows 1,000 rasters if I am not mistaken so I could give it a go in several batches. The thing is I have to run this script on another 10,000 files and I was hoping to get it to work on the whole workspace.

It looks like I will have to subdivide my data though. I'll try to shorten the path too.

Thanks 

0 Kudos
XanderBakker
Esri Esteemed Contributor

When I mentioned using chunk, I was referring to let Python divide the list of rasters into chunks of less than 1000 rasters, perform the sum and store the result temporarily and at the end of the chunks, combine the temporal results into one.

An example of how to "chunk":

def main():
    my_list = ["Raster{0}".format(i) for i in range(100)]

    for my_list_chunk in chunks(my_list, 10):
        print my_list_chunk

def chunks(l, n):
    """ Yield successive n-sized chunks from l."""
    for i in xrange(0, len(l), n):
        yield l

if __name__ == '__main__':
    main()

This will yield 10 lists with 10 elements each:

['Raster0', 'Raster1', 'Raster2', 'Raster3', 'Raster4', 'Raster5', 'Raster6', 'Raster7', 'Raster8', 'Raster9']
['Raster10', 'Raster11', 'Raster12', 'Raster13', 'Raster14', 'Raster15', 'Raster16', 'Raster17', 'Raster18', 'Raster19']
['Raster20', 'Raster21', 'Raster22', 'Raster23', 'Raster24', 'Raster25', 'Raster26', 'Raster27', 'Raster28', 'Raster29']
['Raster30', 'Raster31', 'Raster32', 'Raster33', 'Raster34', 'Raster35', 'Raster36', 'Raster37', 'Raster38', 'Raster39']
['Raster40', 'Raster41', 'Raster42', 'Raster43', 'Raster44', 'Raster45', 'Raster46', 'Raster47', 'Raster48', 'Raster49']
['Raster50', 'Raster51', 'Raster52', 'Raster53', 'Raster54', 'Raster55', 'Raster56', 'Raster57', 'Raster58', 'Raster59']
['Raster60', 'Raster61', 'Raster62', 'Raster63', 'Raster64', 'Raster65', 'Raster66', 'Raster67', 'Raster68', 'Raster69']
['Raster70', 'Raster71', 'Raster72', 'Raster73', 'Raster74', 'Raster75', 'Raster76', 'Raster77', 'Raster78', 'Raster79']
['Raster80', 'Raster81', 'Raster82', 'Raster83', 'Raster84', 'Raster85', 'Raster86', 'Raster87', 'Raster88', 'Raster89']
['Raster90', 'Raster91', 'Raster92', 'Raster93', 'Raster94', 'Raster95', 'Raster96', 'Raster97', 'Raster98', 'Raster99']

One question, I don't think your goal is to get the total sum of maximum temperatures, right? You are probably looking for an average (dividing the sum by the number of rasters)? The Cell Statistics also comes with the MEAN statistics type and other perhaps interesting statistics types.

MathieuLacorde1
New Contributor

Thanks for that, I wouldn't have thought about it!

You're right, my goal is not to get the total sum but it is a first step into a more complex calculation. Once I get this script running smoothly, I will fine tune the calculation.

Cheers

0 Kudos
XanderBakker
Esri Esteemed Contributor

I noticed another thing... the way you specify your paths. This will probably yield problems. There is a nice post by Dan Patterson​ that explains in more detail about paths in Python: Filenames and file paths in Python .

I usually prefer the raw notation (put an r in front of the path, outside the quotes).

XanderBakker
Esri Esteemed Contributor

To give you a start (code below is not tested!):

def main():
    import os
    import arcpy

    data_path = r'E:\BOM\daily-temperature\daily-maximum-temperature\max_temp_1971_2000_grids'
    # out_path = r'E:\BOM\daily-temperature\daily-maximum-temperature'
    res_ws = r'E:\BOM\daily-temperature\daily-maximum-temperature\yourGDB.gdb' # I prefer gdb over folder

    arcpy.env.overwriteOutput = True
    arcpy.CheckOutExtension('Spatial')
    arcpy.env.workspace = data_path

    # create fgdb res_ws
    path, name = os.path.split(res_ws)
    arcpy.CreateFileGDB_management(path, name)

    #create a list of rasters in the workspace
    rasters = arcpy.ListRasters()

    i = 0
    lst_cellstat = []
    for rasters_chunk in chunks(rasters, 500):
        i += 1
        cellstat = arcpy.sa.CellStatistics(rasters_chunk, "SUM", "DATA")
        outname = os.path.join(res_ws, "cellstat{0}".format(i))
        cellstat.save(outname)
        lst_cellstat.append(outname)

    # create the final raster
    cellstat = arcpy.sa.CellStatistics(lst_cellstat, "SUM", "DATA")
    outname = os.path.join(res_ws, "total_sum")
    cellstat.save(outname)

    # eliminate intermediate results
    for ras in lst_cellstat:
        arcpy.Delete_management(ras)

def chunks(l, n):
    """ Yield successive n-sized chunks from l."""
    for i in xrange(0, len(l), n):
        yield l

if __name__ == '__main__':
    main()
MathieuLacorde1
New Contributor

I'll try it as soon as I can!

0 Kudos