Hello,
I'm stumped by a problem that I think should have a fairly straightforward answer...
I have a few hundred features with a number of attributes, all numerical. Some features have all attributes filled in, but some have a handful of null values. Is there any way of calculating the mean value for each feature that takes into account these null values? For instance, it calculates how many attributes are not null for each feature and uses that to generate the mean?
Would really appreciate any help on this!
Thanks
what is the source type of your data? Featureclass tables? dbase? excel?
And I assume you want to automate the whole process rather than doing the process table by table and field by field.
You could use numpy, and convert the tables to numpy arrays and do the columns all at once.
#..... stuff before includes FeatureclassToNumPyArray or TabletoNumPyArray
# to get the table into an array... Then use nanmean to the mean
b
Out[6]:
array([[ nan, 5., nan, nan, 5.],
[ nan, nan, 8., 6., nan],
[ 5., nan, 7., 4., 7.],
[ nan, 9., nan, 9., nan],
[ nan, nan, 4., 9., 6.],
[ nan, 4., nan, 6., 5.]])
np.nanmean(b, axis=1)
Out[7]: array([ 5. , 7. , 5.75 , 9. , 6.33333333, 5. ])
Then you could cycle through the tables and get the mean
PS 'nan' is the numpy equivalent of None in python and nodata in tables.
Thanks Dan, I'm not sure I fully follow you. The data source is a featureclass attribute table and ideally I'd be looking to use the Field Calculator or a python script to find the means. Unfortunately, I really don't follow the code block you've copied in, can you suggest how I might make use of it? I had thought numpy might be the answer.
Fairly simple database manipulation; you want the mean that includes the number records where field-of-choice is null, right? Try this: right click on the field name and select Statistics. That will give you mean. Select where field-of-choice is null. Calculate the field-of-choice value = mean from the Statistics results.
With all due respect to Dan and his elegant solution, sometimes going with the basics can be useful too....
Hi Joe, I'm looking for statistics to be done on a field array, not only one field. So each feature has ten fields that may or may not be null... I want a script that figures out how many are null to then sum and correctly produce the mean (e.g. if only 8 of the 10 fields are scored, the sum of those will be divided by 8). If you're answer addresses this, please could you rephrase it as I don't follow it.
I had thought of that of course.. but I think Luke has multiple fields in a table, and there is a need to do this all at once, rather than one field at a time, hence, the all fields all at once solution, which can then be wrapped in an all files, for all fields all at once solution.
Of course, the problem could be simpler than assumed
Yep... I get it.. But there are so many "it's gotta be automated, but I can't describe it" types of questions here....
I am trying the following but still not having any success:
def mean(Welcoming, Access, Community, Safe, Provision, Quality, Security, Dog, Litter, Grounds):
fieldList = [Welcoming, Access, Community, Safe, Provision, Quality, Security, Dog, Litter, Grounds]
validList = []
for i in fieldList:
if i != None:
validList.append(i)
meanVal = sum(validList)/len(validList)
return meanVal
I get an error that 9 required positional arguments are missing. Any suggestions please?
[edited to use Python syntax highlighting, with proper indent - VA]
How are you calling your function from the field calculator? Obviously the function definition is going into the code block, but what are you putting into the expression box. The error message makes me think you are calling your function incorrectly.
Hi Joshua, I'm putting this in:
mean([ !Welcoming!, !Access!, !Community!, !Safe!, !Provision!, !Quality!, !Security!, !Dog!, !Litter!, !Grounds! ])