Best way to get the size on disk of a raster

08-12-2011 02:30 PM
KimFisher
Emerging Contributor
Hello all,

I have a script that requires an input raster. I would like to get the size on disk of this raster, whether it is stored as files or in a geodatabase of whatever flavor, and in whatever raster format. I cannot find a built-in gp/arcpy function for this -- neither Describe() nor GetRasterProperties() seems to provide it.

I know about os.path.getsize() and could write a function to find the size of a directory, but that wouldn't work inside a personal/enterprise gdb, and in any case writing a function that anticipates every possible raster format and storage location seems like a daunting task.
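
For the folder-based case mentioned above, a directory-size helper might look like the sketch below. It is only an illustration (the name `directory_size` is hypothetical): it sums the apparent size of every file under a folder, which covers something like an Esri grid directory, but it cannot see inside a personal or enterprise geodatabase.

```python
import os


def directory_size(path):
    """Sum os.path.getsize() over every file under path, recursively.

    Intended for folder-based rasters (e.g. an Esri grid directory).
    os.walk does not descend into symlinked directories by default,
    and only apparent file sizes are summed (not allocated disk space).
    """
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            # Skip entries that are not regular files (e.g. broken links).
            if os.path.isfile(full):
                total += os.path.getsize(full)
    return total
```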

Has anybody done this already, or have suggestions about a good approach?

Thanks,
Kim Fisher
MarkCarroll
Emerging Contributor
You can calculate the size of the data by multiplying the area (in pixels) by the number of bytes per pixel and by the number of layers (bands). For example:

If your image is 4800 pixels wide x 4800 pixels high and is a single 8-bit plane, the size is calculated as

4800*4800*1*1 = 23,040,000 bytes (about 22 MB)

If it is 16-bit or 32-bit, it would be 4800*4800*2*1 or 4800*4800*4*1. That does not account for the length of the header, which varies by format but should be fairly small. Compression will, obviously, also reduce the overall size of the file, and I don't know how you would calculate that accurately.
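
The arithmetic above can be wrapped in a small function. This is a sketch, not an Esri API: `uncompressed_size` is a hypothetical name, it ignores headers, pyramids and compression as described above, and it assumes sub-byte depths (1-, 2-, 4-bit) are tightly packed, whereas real formats may pad each row.

```python
def uncompressed_size(columns, rows, bits_per_cell, bands=1):
    """Estimate raw raster size in bytes: columns x rows x bytes per cell x bands.

    Ignores header/metadata bytes, pyramids and compression. Total bits are
    rounded up to whole bytes, so sub-byte depths are treated as packed.
    """
    total_bits = columns * rows * bits_per_cell * bands
    return (total_bits + 7) // 8


# The 4800 x 4800 single-band example at 8, 16 and 32 bits per cell
for bits in (8, 16, 32):
    print(bits, uncompressed_size(4800, 4800, bits))
# 8 23040000
# 16 46080000
# 32 92160000
```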

Hope this helps.
KimFisher
Emerging Contributor
Thanks; it does indeed help. I'd forgotten that approach. I've used it with modest success; for immediate purposes, it gets me something with enough precision to use. Appreciate the help.

But it doesn't quite equal what I can measure via the OS, it doesn't, as you observe, take compression into account, etc. It seems to me this would be a basic Data Management tool ESRI should offer out of the box, especially since they seem to do it internally -- looking at the properties of any given raster gives you the uncompressed size on disk.
LukeSturtevant
Frequent Contributor

I know this is an old post, but if anyone is looking to do this with ArcMap 10.x, you can cast your raster as a Raster object and use its uncompressedSize property to get the size on disk.

curtvprice
MVP Esteemed Contributor

That would be the maximum possible size -- as you note, the actual size is almost always smaller due to compression.

Python has functions in the os module to get file sizes from the operating system, so one could write a Python function to measure the size of the raster files using os.path.getsize(). This could be quite involved because you'd have to look at all the files that make up the raster. But it could be done for file-based rasters (not for rasters in the gdb, which are not visible from the OS).
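
Once you have both numbers, the uncompressed estimate as an upper bound and the measured file size, their ratio gives a rough picture of how much compression is saving. A minimal sketch; `compression_ratio` is a hypothetical helper, and a value above 1.0 usually means extra files such as pyramids or .aux.xml were included in the measured total.

```python
def compression_ratio(disk_bytes, uncompressed_bytes):
    """Fraction of the uncompressed estimate actually used by the files.

    Below 1.0 suggests compression; slightly above 1.0 usually means
    headers, pyramids (.ovr) or .aux.xml files were counted as well.
    """
    if uncompressed_bytes <= 0:
        raise ValueError("uncompressed size must be positive")
    return disk_bytes / float(uncompressed_bytes)


# e.g. a 4800 x 4800 8-bit raster whose files total 5,760,000 bytes
print(compression_ratio(5760000, 23040000))  # 0.25
```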

LukeSturtevant
Frequent Contributor

I use the uncompressed size method in tool validation just as a quick way to automatically select the largest raster in the TOC, which is usually the watershed-wide lidar.
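
The selection step amounts to taking the maximum by estimated size. The sketch below uses made-up layer names and dimensions; in tool validation those values would instead come from arcpy (for example a Raster object's width, height and bandCount), which is not shown here.

```python
# Hypothetical layer list: name -> (columns, rows, bits per cell, bands).
# These values are invented for illustration, not read from arcpy.
layers = {
    "watershed_lidar": (30000, 25000, 32, 1),
    "landcover": (6000, 5000, 8, 1),
    "ortho": (12000, 10000, 8, 3),
}


def raw_bytes(dims):
    """Uncompressed estimate: columns x rows x bits per cell x bands, in bytes."""
    columns, rows, bits, bands = dims
    return (columns * rows * bits * bands + 7) // 8


# Pick the layer with the largest uncompressed estimate
largest = max(layers, key=lambda name: raw_bytes(layers[name]))
print(largest)  # watershed_lidar
```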

To get the actual size of a raster with its associated files, such as .ovr and .aux.xml, using the os library, I've used this code:

```python
import os

raster = r"C:\data\dem.tif"  # placeholder: path to your raster file or grid folder
basename = os.path.basename(raster).split(".")[0]
rootFolder = os.path.dirname(raster)

# Files in the same folder that share the raster's base name
# (e.g. dem.tif, dem.tif.aux.xml, dem.tif.ovr, dem.tfw)
associatedFiles = [os.path.join(rootFolder, f)
                   for f in next(os.walk(rootFolder))[2]
                   if f.split(".")[0] == basename]

if len(os.path.basename(raster).split(".")) == 1:
    # No extension: an Esri grid, which is a folder, so add the files inside it
    fileList = next(os.walk(raster))[2]
    dirSize = sum(os.path.getsize(os.path.join(raster, f)) for f in fileList)
    rasSize = sum(os.path.getsize(f) for f in associatedFiles) + dirSize
else:
    # File-based format (.tif, .img, .jpg, ...): the raster plus its sidecar files
    rasSize = sum(os.path.getsize(f) for f in associatedFiles)

def convertSize(size, precision=2):
    """Format a byte count as a human-readable string (B, KB, MB, GB, TB)."""
    suffixes = ['B', 'KB', 'MB', 'GB', 'TB']
    suffixIndex = 0
    while size >= 1024 and suffixIndex < len(suffixes) - 1:
        suffixIndex += 1  # move to the next larger unit
        size = size / 1024.0
    return "%.*f %s" % (precision, size, suffixes[suffixIndex])

print(convertSize(rasSize))
```

The conversion part of the script was found here. Note that os.path.getsize() returns only the size of the file, not its size on disk; the size on disk shown in a file's Properties dialog reflects the space actually allocated on the drive, which is rounded up to whole clusters.
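
If you want something closer to the "size on disk" figure, one approach is to round each file's size up to whole allocation units. This sketch assumes a 4096-byte cluster, a common default that you should check for your own volume; `allocated_size` is a hypothetical helper, and very small files stored inside the file system's own metadata may report less than this.

```python
def allocated_size(nbytes, cluster_size=4096):
    """Round a byte count up to whole allocation units (clusters).

    cluster_size=4096 is an assumption; check the actual volume.
    An empty file is treated as occupying no clusters.
    """
    if nbytes == 0:
        return 0
    # Ceiling division, then back to bytes
    return -(-nbytes // cluster_size) * cluster_size


print(allocated_size(1))     # 4096
print(allocated_size(4097))  # 8192
```

Applied per file (before summing), this gives a total closer to what the OS reports; summing first and rounding once would undercount.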

curtvprice
MVP Esteemed Contributor

Here's a function that returns a size in bytes and can optionally return something similar to uncompressedSize. (It also works around bug BUG-000110272, which affects the uncompressedSize property of the Raster object for large rasters.)

```python
def raster_size(filepath, size_type="DISK"):
    """Return size of file-based rasters

    filepath
      path to raster

    size_type
      "DISK" - total file size from os.path.getsize() on files (default)
      "UNCOMPRESSED" - file size based on rows x columns x 4 bytes
                       (estimate for Esri grid format, 32 bits/gridcell)
      "CELLS" - number of gridcells

    returns: file size as long integer

    """
    import os
    filepath = os.path.realpath(filepath)
    if not os.path.exists(filepath):
        raise Exception("{} not found".format(filepath))
    skey = str(size_type)[:3].upper()
    if skey not in ["UNC", "CEL"]:
        raster = filepath
        basename = os.path.basename(raster).split(".")[0]
        rootFolder = os.path.dirname(raster)
        # sidecar files sharing the base name (.aux.xml, .ovr, world files, ...)
        associatedFiles = [os.path.join(rootFolder, f)
                           for f in next(os.walk(rootFolder))[2]
                           if f.split(".")[0] == basename]

        if len(os.path.basename(raster).split(".")) == 1:
            # no extension: Esri grid format (a folder), add the files inside it
            fileList = next(os.walk(raster))[2]
            dirSize = sum(os.path.getsize(os.path.join(raster, f)) for f in fileList)
            rasSize = sum(os.path.getsize(f) for f in associatedFiles) + dirSize
        else:
            # .tif, .jpg etc: the raster file plus its sidecar files
            rasSize = sum(os.path.getsize(f) for f in associatedFiles)
    else:
        from arcpy.sa import Raster
        r = Raster(filepath)
        if skey == "UNC":
            rasSize = r.width * r.height * 4 * r.bandCount
        elif skey == "CEL":
            rasSize = r.width * r.height
    return rasSize
```
FatihDur1
Occasional Contributor

I reformatted the code snippet.

```python
def raster_size(filepath, size_type="DISK"):
    """Return size of file-based rasters
    filepath    path to raster
    size_type   "DISK" - total file size from os.path.getsize() on files (default)
                "UNCOMPRESSED" - file size based on rows x columns x 4 bytes
                (estimate for Esri grid format, 32 bits/gridcell)
                "CELLS" - number of gridcells
    returns: file size as long integer    """
    import os
    filepath = os.path.realpath(filepath)
    if not os.path.exists(filepath):
        raise Exception("{} not found".format(filepath))
    skey = str(size_type)[:3].upper()
    if skey not in ["UNC", "CEL"]:
        raster = filepath
        basename = os.path.basename(raster).split(".")[0]
        rootFolder = os.path.dirname(raster)
        associatedFiles = [os.path.join(rootFolder, f) for f in next(os.walk(rootFolder))[2] if f.split(".")[0] == basename]
        if len(os.path.basename(raster).split(".")) == 1:
            # no extension: Esri grid format (a folder), add the files inside it
            fileList = next(os.walk(raster))[2]
            dirSize = sum(os.path.getsize(os.path.join(raster, f)) for f in fileList)
            rasSize = sum(os.path.getsize(f) for f in associatedFiles) + dirSize
        else:
            # .tif, .jpg etc: the raster file plus its sidecar files
            rasSize = sum(os.path.getsize(f) for f in associatedFiles)
    else:
        from arcpy.sa import Raster
        r = Raster(filepath)
        if skey == "UNC":
            rasSize = r.width * r.height * 4 * r.bandCount
        elif skey == "CEL":
            rasSize = r.width * r.height
    return rasSize
```