Get Statistics From All Fields in an Attribute Table

788
0
02-13-2018 06:53 AM

Get Statistics From All Fields in an Attribute Table

Recently a question made me think of how to calculate statistics of all fields (all attributes) for each record of an attribute table. See thread here:how to add up all the fields in an attribute table 

The OP actually wanted to get the summary statistics of a field (not sum all fields of a record), but I thought I would share a snippet of how you can accomplish this. See the code below:

def main():
    import arcpy

    # input featureclass and output field
    fc = r'C:\GeoNet\SumAllFields\data.gdb\test_pol'
    fld_result = 'AvgDate'

    # settings to play with
    wild_card = '*DATE' #'FLD*'
    stats_type = 'mean' # min, max, sum, mean, count, std
    fld_types = ['Date', 'Double', 'Integer', 'Single', 'SmallInteger']
    include_system_fields = False
    ignore_nodata = True

    # get field names
    fld_names = GetFieldNames(fc, wild_card, fld_types, fld_result, include_system_fields)

    # add the field if it does not exist
    if len(arcpy.ListFields(fc, fld_result)) == 0:
        arcpy.AddField_management(fc, fld_result, "DOUBLE")

    # update cursor
    flds = [fld_result]
    flds.extend(fld_names)
    with arcpy.da.UpdateCursor(fc, flds) as curs:
        for row in curs:
            lst = list(row)[1:]
            row[0] = GetStats(lst, stats_type, ignore_nodata)
            curs.updateRow(row)


def GetFieldNames(fc, wild_card, fld_types, fld_result, include_system_fields):
    # exclude result field
    exclude_fields = [fld_result]

    # create initial list of fields to include
    fld_names = sorted([fld.name for fld_type in fld_types for fld in arcpy.ListFields(fc, wild_card, fld_type)])

    # exclude system fields
    if include_system_fields == False:
        try:
            exclude_fields.append(arcpy.Describe(fc).AreaFieldName)
        except:
            pass
        try:
            exclude_fields.append(arcpy.Describe(fc).LengthFieldName)
        except:
            pass
    for exclude_field in exclude_fields:
        if exclude_field in fld_names:
            fld_names.remove(exclude_field)

    return fld_names


def GetStats(lst, stats_type, ignore_nodata):
    import numpy

    stats = None
    if ignore_nodata:
        lst = [a for a in lst if not a is None]

    try:
        if len(lst) > 0:
            if stats_type == 'sum':
                stats = sum(lst)
            elif stats_type == 'min':
                stats = min(lst)
            elif stats_type == 'max':
                stats = max(lst)
            elif stats_type == 'mean':
                if type(lst[0]) is datetime:
                    mean = (numpy.array(lst, dtype='datetime64')
                            .view('i8')
                            .mean()
                            .astype('datetime64'))
                    ts = (mean - numpy.datetime64('1970-01-01T00:00:00Z')) / numpy.timedelta64(1, 's')
                    stats = datetime.utcfromtimestamp(ts)
                else:
                    stats = sum(lst) / float(len(lst))
            elif stats_type == 'count':
                stats = len(lst)
            elif stats_type == 'std':
                arr = numpy.array(lst)
                stats = numpy.std(arr, axis=0)
            else:
                print 'Error: Unknown stats_type'
    except Exception as e:
        print 'Error:', e
        print ' - lst:', lst
        print ' - ignore_nodata:', ignore_nodata
    finally:
        return stats

if __name__ == '__main__':
    main()
‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍

To play with this code, change:

  • on line 5, the path to the featureclass or table
  • on line 6 the output field name that will hold the result
  • on line 9 you can specify a wild card for the fields to include: The example "*DATE" will consider all fields that end with the text DATE. You can also leave it as "*" to include all fields
  • on line 10 you define the statistics type. Valid values are: min, max, sum, mean, count and std
  • on line 11 you define a list of field types to include in the operation
  • on line 12 you can toggle between include double fields like shape length and area or to exclude them
  • on line 13 the ignore_nodata var allows you to ignore NULL/None values, they will be removed from the list of values used to determine the statistics. 

Have fun...

https://community.esri.com/community/developers/gis-developers/python?sr=search&searchId=7cf367a1-89...

Version history
Last update:
‎12-12-2021 03:36 AM
Updated by: