Select to view content in your preferred language

Zonal statistics - percentiles

31397
65
Jump to solution
06-30-2014 01:35 PM
toddsams
Deactivated User
Hello,

I am looking for an analogous method to Zonal Statistics that is able to generate user-defined percentiles (such as the 5th and 95th percentile) of the input raster data.

I looked at the GME tools and those stats metrics are hardwired in the same fashion as ArcGIS Zonal Statistics tools are.

Any other ideas that I can explore?
65 Replies
XanderBakker
Esri Esteemed Contributor

Hi rafaellobergonse ,

Thanks for reaching out to me. I am sorry to hear that you are not obtaining the correct results. I will try to find some time to have a look at the data you attached. I also want to mention that in ArcGIS Pro 2.5 (to be released next month) there are a couple of enhancements in the zonal statistics tools providing support for overlapping polygons. I also heard there is going to be support for percentiles, but I don't remember if this is for the zonal statistics tool too. 

XanderBakker
Esri Esteemed Contributor

Hi Rafaello Bergonse ,

Strange, I just ran the code in ArcGIS Pro (see below) without converting the data into a fgdb and it produced a result that seems to be correct without the same value for each polygon:

This is the code I used and I will attach the shapefile so you can validate the results:

import arcpy


def main():
    # settings
    ras = r"D:\GeoNet\ZonalStats\Sample\raster"
    fc = r"D:\GeoNet\ZonalStats\Sample\Pols.shp"
    # ras = r"C:\GeoNet\ZonalStatsSmall\test20150910\i2015080723001.tif"
    # fc = r"C:\GeoNet\ZonalStatsSmall\test20150910\testshp67sp.shp"
    lst_perc = [2, 5, 10, 25, 50, 75, 80, 90, 95, 98]  # list of percentiles to be calculated
    fld_prefix = "Perc_"
    start_at = 0

    arcpy.env.workspace = r"C:\Users\xande\AppData\Local\Temp\ArcGISProTemp15404\c2060813-9369-4fd7-a34e-177099387aae\Default.gdb"
    arcpy.env.snapRaster = ras # r"D:\Extractor\ras.gdb\DEM"
    arcpy.env.overwriteOutput = True

    # add fields if they do not exist in fc
    # be aware, fields that exist will be overwritten!
    arcpy.AddMessage("filling field list and adding fields")
    flds = []
    for perc in lst_perc:
        fld_perc = "{0}{1}".format(fld_prefix, perc)
        flds.append(fld_perc)
        if not FieldExist(fc, fld_perc):
            arcpy.AddField_management(fc, fld_perc, "LONG")
    arcpy.AddMessage("flds={0}".format(flds))

    # Enable Spatial analyst
    arcpy.CheckOutExtension("Spatial")

    # get cell area
    desc = arcpy.Describe(ras)
    cell_area = desc.meanCellHeight * desc.meanCellWidth

    # loop through polygons
    arcpy.AddMessage("loop through polygons")
    flds.append("SHAPE@")
    i = 0
    with arcpy.da.UpdateCursor(fc, flds) as curs:
        for row in curs:
            i += 1
            arcpy.AddMessage("Processing polygon: {0}".format(i))
            if i >= start_at:
                polygon = row[flds.index("SHAPE@")]

                if polygon.area < cell_area:
                    polygon = polygon.buffer((desc.meanCellHeight + desc.meanCellWidth) / 4)

                    # Execute ExtractByPolygon (you can't send the polygon object)
                arcpy.AddMessage(" - ExtractByMask...")

                # arcpy.sa.ExtractByPolygon(ras, lst_parts, "INSIDE")
                ras_pol = arcpy.sa.ExtractByMask(ras, polygon)
                del polygon
                outname = "ras{0}".format(i)
                ras_pol.save(outname)
                arcpy.AddMessage(" - saved raster as {0}".format(outname))

                # create dictionary with value vs count
                arcpy.AddMessage(" - fill dict with Value x Count")
                flds_ras = ("VALUE", "COUNT")
                dct = {row[0]: row[1] for row in arcpy.da.SearchCursor(outname, flds_ras)}
                # del ras_pol

                # calculate number of pixels in raster
                arcpy.AddMessage(" - determine sum")
                cnt_sum = sum(dct.values())
                arcpy.AddMessage(" - sum={0}".format(cnt_sum))

                # loop through dictionary and create new dictionary with val vs percentile
                arcpy.AddMessage(" - create percentile dict")
                dct_per = {}
                cnt_i = 0
                for val in sorted(dct.keys()):
                    cnt_i += dct[val]
                    dct_per[val] = cnt_i / cnt_sum
                del dct

                # loop through list of percentiles
                arcpy.AddMessage(" - iterate perceniles")
                for perc in lst_perc:
                    # use dct_per to determine percentiles
                    perc_dec = float(perc) / 100
                    arcpy.AddMessage("  - Perc_dec for is {0}".format(perc_dec))
                    pixval = GetPixelValueForPercentile(dct_per, perc_dec)
                    arcpy.AddMessage("  - Perc for {0}% is {1}".format(perc, pixval))

                    # write pixel value to percentile field
                    arcpy.AddMessage("  - Store value")
                    fld_perc = "{0}{1}".format(fld_prefix, perc)
                    row[flds.index(fld_perc)] = pixval
                del dct_per

                # update row
                arcpy.AddMessage(" - update row")
                curs.updateRow(row)

                # return SA license
    #arcpy.CheckInExtension("Spatial")


def GetPixelValueForPercentile(dctper, percentile):
    """will return last pixel value
      where percentile LE searched percentile."""
    pix_val = -1
    try:
        pix_val = sorted(dctper.keys())[0]  # initially assign lowest pixel value
        for k in sorted(dctper.keys()):
            perc = dctper[k]
            if perc <= percentile:
                pix_val = k
            else:
                break
    except Exception as e:
        pass
    return pix_val


def FieldExist(featureclass, fieldname):
    """Check if field exists"""
    fieldList = arcpy.ListFields(featureclass, fieldname)
    return len(fieldList) == 1


if __name__ == '__main__':
    main()
‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍
RafaelloBergonse
Emerging Contributor

Hi Xander Bakker, 

Thanks a lot for your patience and for the quick response!

I tried the latest code you sent, but I get the same results. I am enclosing the shapefile as an annex so you can see what I get. This is the result of the latest code running over the shapefile you sent (so your correct values got overwitten).

It may be noteworthy that the result I get for each polygon is invariably a very high value. For example, I get 41 for the first polygon in the shapefile, and 53 for the second one. When I compare these values with the rasters that are generated by the script while running the Extract by Mask tool, I see that there is only 1 pixel in the first raster with a value higher than 41, and only 2 pixels in the second raster with values higher than 53. 

There is evidently nothing wrong with the code, so the problem must be related to my own machine or to some configuration issue. I am running ArcGIS desktop 10.7.1.

I should also make clear that I cannot program in Python. The only things that I have changed in your code are the inicial statements describing the input raster and feature class, as well as the snap raster and the workspace. I believe the rest is the same independently of the inputs.

                                                                                      Best regards,

                                                                                  Rafaello Bergonse

0 Kudos
XanderBakker
Esri Esteemed Contributor

Hi rafaellobergonse 

I think I figgered it out. Between lines 70 and 80 you will see this block of code. 

                # loop through dictionary and create new dictionary with val vs percentile
                arcpy.AddMessage(" - create percentile dict")
                dct_per = {}
                cnt_i = 0
                for val in sorted(dct.keys()):
                    cnt_i += dct[val]
                    dct_per[val] = float(cnt_i) / float(cnt_sum)
                del dct

I change line 77 (7 here,) including the float statements twice in the calculation. I think that is what went wrong. Please try again after changing than line.

I hope it works. Happy Xmas!

RafaelloBergonse
Emerging Contributor

Hi Xander Bakker,

You DID figure it out! It worked perfectly, as you can see by the image below. 

Thanks a lot for you help! From now on, I'll use your script whenever I need to calculate percentiles.

I wish you a great Christmas!

                                                            Best regards,

                                                          Rafaello Bergonse

XanderBakker
Esri Esteemed Contributor

Hi rafaellobergonse , 

I'm glad to hear it is working. I guess that Python 3 (ArcGIS Pro) does the calculation differently and Python 2,7 requieres the values to be floating in order to have a correct result. Merry Christmas!