Get a count of field values

01-08-2020 08:39 AM
JoeBorgione
MVP Emeritus

I'm working with a ridiculously large table (184 fields) and I would like to assess how many records are non-null for each field. In other words, if a field's values are more often null than not, I'll drop the field. For now I've written a script that performs an iterative selection for each field, and as you can imagine, with 184 fields and a few thousand records it's pretty slow. I tried a couple of other approaches that failed, but I have to think there is a better way to get a count of the records for which a given field is populated. Here is what I've done to date:

import arcpy

table = r'J:\some\path\to\file.gdb\tableName'
# Collect the name of every field in the table
fields = [f.name for f in arcpy.ListFields(table)]

# Selections need a table view rather than the table itself
arcpy.MakeTableView_management(table, 'tv')

for f in fields:
    # Select the records where this field is populated, then count them
    select = f'{f} IS NOT NULL'
    arcpy.SelectLayerByAttribute_management('tv', 'NEW_SELECTION', select)
    c = arcpy.GetCount_management('tv')

    print(f'Field {f} has {c[0]} non-null records')
That should just about do it....
11 Replies
DanPatterson_Retired
MVP Emeritus

NumPy comes with Pro. In fact, ArcGIS Pro requires NumPy, and so do the arcgis module, pandas, SciPy, etc.

This can be done with or without a cloned environment.

Of course, as with all modules, you have to import NumPy, which is usually done as

import numpy as np

Most Python IDEs allow you to set default imports if you feel lazy or are forgetful.
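As a rough sketch of how the NumPy route could look: once the table is in a structured array (for example via arcpy.da.TableToNumPyArray), the non-null count per field is one vectorized check per column. How nulls show up in the array depends on how it was built, so the sketch below assumes floating-point nulls are NaN and text nulls are empty strings or the string 'None'. The function name count_non_null, the null_strings parameter, and the sample field names are all made up for illustration.

```python
import numpy as np

def count_non_null(arr, null_strings=("", "None")):
    """Return {field_name: non-null count} for a NumPy structured array.

    Assumption: float nulls are NaN and text nulls are one of `null_strings`.
    Any other dtype is treated as fully populated.
    """
    counts = {}
    for name in arr.dtype.names:
        col = arr[name]
        if col.dtype.kind == "f":
            # Floating-point column: everything that is not NaN is populated
            counts[name] = int(np.count_nonzero(~np.isnan(col)))
        elif col.dtype.kind == "U":
            # Text column: everything not matching a null marker is populated
            counts[name] = int(np.count_nonzero(~np.isin(col, list(null_strings))))
        else:
            counts[name] = int(col.size)
    return counts

# Hypothetical stand-in for an array read from a table
sample = np.array(
    [(1, 2.5, "a"), (2, np.nan, ""), (3, np.nan, "None")],
    dtype=[("OBJECTID", "i4"), ("ELEV", "f8"), ("NAME", "U10")],
)

counts = count_non_null(sample)
for field, n in counts.items():
    print(f"Field {field} has {n} non-null records")
```

This reads the table once instead of running one selection per field, which is where the speedup over the select-and-count loop would come from.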

wwnde
Occasional Contributor

@hziegler-esristaff's way is what I have always used. ArcPro 2.15 now allows toggling between desktop and spatially enabled dataframes. Additionally, from the Python API you can now launch and code within spatially referenced dataframes, then save and share the result as an item in ArcOnline. Apart from NumPy, you can also use pandas.

import pandas as pd

df = pd.read_csv(r'file directory')

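df.isna().sum() # null count per field, for the entire dataframe; df.notna().sum() gives the non-null count

df.fieldname.isna() # Boolean mask of the nulls in a single field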

The advantage of spatially enabled dataframes is that they leverage NumPy and pandas to give you statistical tools that are cumbersome to reach through arcpy. They also work well with the matplotlib and seaborn Python libraries for visualization.
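To tie the pandas calls back to the original question (which fields are mostly null and could be dropped), here is a minimal sketch. The sample data, the field names, and the 50% cutoff are all assumptions for illustration; in practice the dataframe would come from your own table.

```python
import pandas as pd

# Hypothetical stand-in for the real table
df = pd.DataFrame({
    "OBJECTID": [1, 2, 3, 4],
    "NAME": ["a", None, "c", None],
    "ELEV": [10.5, None, None, None],
})

# Non-null count per field, computed in one pass over the dataframe
non_null = df.notna().sum()

# Fraction of records populated per field
frac = non_null / len(df)

# Fields populated in fewer than half the records (assumed cutoff)
mostly_null = frac[frac < 0.5].index.tolist()

for field, n in non_null.items():
    print(f"Field {field} has {n} non-null records")
print("Candidates to drop:", mostly_null)
```

df.count() returns the same per-field non-null counts as df.notna().sum(), so either works here.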
