Zonal Histogram Bins

JonathanMartin1 · ‎05-31-2013

Hello all,

I am new to programming with arcpy, so this question may be basic.

I have a raster with positive and negative values. I would like to create a histogram for the values inside a polygon. I have used the Zonal Histogram tool in the spatial analyst tools (arcpy.sa), and I obtain an output table with 256 bins. However, I do not know the ranges associated with the bins. The bins are only given by OBJECTID numbering 1 to 256. How do I determine the ranges of the bins using python?

I know that if I change the symbology of the raster to classified, I will get specified bin ranges given by the column "Label". However, this seems to limit me to 32 bins. Also, I would like to be able to do everything in python without having to change the symbology in arcmap.

Thanks in advance for the help!

ShaunWalbridge · ‎05-31-2013

For a given raster, you can determine the range of values by pulling out the properties from an arcpy.Raster object (code as a gist😞

import arcpy
arcpy.CheckOutExtension("spatial")
raster = arcpy.Raster("c:\\workspace\\input_raster")
range = raster.maximum - raster.minimum
bins = 256
bin_size = (range) / bins

# now we know the bin width, can multiply that by the OBJECTID for the lower bound of each bin
lower_bin_value = [bin_size * i for i in range(bins)]

You could add this to a table with a CalculateField operation, or otherwise embed this information back in the output table. The only trick I didn't cover here is that you'll need to clip your raster data to the polygon to get your 'input_raster' in this approach, since it doesn't know about the geometry restriction automatically.

JonathanMartin1 · ‎06-03-2013

Thanks for the help sawlbridge.

It had occurred to me that the bins would be equally divided by the range of values used. In python, I found the range of values from the zonal statistics tool, and I computed the histogram values. I compared the resulting zonal histograms to the histograms I obtain from using the zonal histogram tool in arcmap with the symbology set to classified. However the results were quite different. I copied the results and graphed them in Excel. I have attached an image of the graphs. As you can see, the histograms created in arcmap are roughly centered about 0, but the ones computed in python are centered about -5. Any initial thoughts, or should I post some code?

Thanks

[ATTACH=CONFIG]24941[/ATTACH]

curtvprice · ‎06-03-2013

How do I determine the ranges of the bins using python?

The easiest way to do this without ArcMap would be to run is to run Zonal Statistics As Table, using your binned output as the zone raster, and your original raster as the value raster. this will give you stats for each bin.

You may want to look into the Slice tool as well.

As for your calculations not exactly matching to ArcMap symbology, the renderers use approximations of the histograms; they are designed for display and not quantitative analysis. To get true histogram values you would want to use other tools.

JonathanMartin1 · ‎06-03-2013

I looked further at the output of the zonal statistics tool for my 2 zones. According to the output statistics table, the means and standard deviations of the two zone are both 1 and 0, respectively. Obviously, this is not correct. Could there be some issues with the noData parts?

Here is the code for the function I wrote:

import arcpy, os, random
from arcpy import env, mapping
from arcpy.sa import *

def calcZonalHistograms(polygonFile, zoneField, inRaster, outDir = None, outTbls=None, cleanup=1):
    if outTbls:
        outHistTbl, outStatTbl = outTbls
    else:
        outHistTbl = os.path.join(os.path.dirname(inRaster), os.path.basename(inRaster).split(".")[0]+"_zoneHist_tbl.dbf")
        outStatTbl = os.path.join(os.path.dirname(inRaster), os.path.basename(inRaster).split(".")[0]+"_zoneStat_tbl.dbf")

    # Check out the ArcGIS Spatial Analyst extension license
    arcpy.CheckOutExtension("Spatial")

    # Find frequencies for histogram.  This will use 256 bins of equal intervals spanning the range of values in the area. (?)
    ZonalHistogram(polygonFile, zoneField, inRaster, outHistTbl)

    # Execute ZonalStatisticsAsTable
    outZSaT = ZonalStatisticsAsTable(polygonFile, zoneField, inRaster, outStatTbl, "NODATA")

    # Find bin sizes and values
    # Put histogram frequencies in a dictionary; key word equal to field name (Poly_Area)
    fieldList = arcpy.ListFields(outHistTbl)
    info_dict = {}
    binSizes = {}
    for field in fieldList:
        if field.name == "OID": pass
        else:
            info_dict[field.name] = []
            
    rows = arcpy.SearchCursor(outHistTbl)
    i = 0
    for row in rows:
        for field in fieldList:
            if field.name != "OID": info_dict[field.name].append(row.getValue(field.name))
        i+=1
    # Find bin Size from outStatTbl
    rows = arcpy.SearchCursor(outStatTbl)
    for row in rows:
        binSizes[row.getValue("POLY_AREA")] = [row.getValue("MIN"), (float(row.getValue("MAX"))-float(row.getValue("MIN")))/i]

    # Export to txt for use in other programs (excel, etc...)
    if not outDir: outDir = os.path.dirname(inRaster)
    for key in info_dict.keys():
        exportHistogramToTxt(binSizes[key][1], binSizes[key][0], info_dict[key], key, os.path.join(outDir, key+"_hist.txt"))

Again, thanks for the help!

ShaunWalbridge · ‎06-03-2013

Interesting. I'm not entirely familiar with this tool, but am happy to help dig in further. Could you send me some sample data at swalbridge@esri.com? I can accept up to 20MB attachments. Failing that, I can set up a dropsite for your data, if possible it'd be great to have the actual data so I can compare directly with the analysis you've posted above.

It may also be useful to try using the RasterToNumPyArray conversion on the data, provided it can fit into memory -- then you can write a simple aggregation function based on the raw values, and work with the data as needed, computing your own histogram, or using matplotlib to compute the histogram directly from your values. As you mentioned, how NoData is handled is important, as is what happens with the edge pixels that intersect the geometry and the raster -- implementations vary on how they handle these edge pixels.

cheers,
Shaun