
Calculating the mode from a list of fields using the Field Calculator

07-20-2023 12:22 PM
adam_gallaher
New Contributor II

I am looking to use the Field Calculator to fill a new field with the mode of a list of fields. I cannot seem to find a method that works. I am open to pulling the data down and performing the calculations in Python if the Field Calculator is not capable of it. Thanks in advance for any help or suggestions.

Adam

JohannesLindner
MVP Frequent Contributor

Field Calculator, language = Python:

# ModeField = 
get_mode(!Field1!, !Field2!, !Field3!, !Field4!)

# code block
import statistics
def get_mode(*args):
    return statistics.mode(args)
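Because the code block is plain Python, the function can be tried outside ArcGIS first. In this minimal sketch the four arguments are made-up values standing in for !Field1! to !Field4! of a single row:

```python
import statistics


def get_mode(*args):
    # *args collects the separate field values into one tuple
    return statistics.mode(args)


# made-up values for Field1..Field4 of one row
print(get_mode(3, 7, 7, 1))  # 7
```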

Have a great day!
Johannes
adam_gallaher
New Contributor II

Hello Johannes, 

Thank you for your solution. I am now looking for a solution that doesn't require typing the field names into the get_mode expression. What I have does not work, but you should be able to see what I am trying to do. I am getting an error: unhashable type: 'list'. Is there a way to access the field names using a wildcard in the expression? Thank you in advance for your help.

Adam

 

import arcpy
from arcpy import env
import statistics

arcpy.env.parallelProcessingFactor = "100%"
arcpy.env.overwriteOutput = True

env.workspace = "my path"

fieldName2 = "Most_Freq"
expression = "Get_mode(field_names)"
codeblock = """

def Get_mode(*args):
    return statistics.mode(args)"""

featureclasses = arcpy.ListFeatureClasses() 
for fc in featureclasses: 
    inTable = fc 
    print("Working on: {}".format(fc))
    field_names = [f.name for f in arcpy.ListFields(fc, "R*")]
    
    arcpy.management.AddField(inTable, fieldName2, "LONG")
    arcpy.management.CalculateField(inTable, fieldName2, expression, "PYTHON3", codeblock)

 

JohannesLindner
MVP Frequent Contributor

unhashable type: 'list'

You get this error because:

  • I used the *args declaration in get_mode. The asterisk (often called the unpacking operator) is used here to accept any number of arguments and collect them into a tuple inside the function.
  • You are passing the fields as a list, not as separate arguments.
  • That means the function calls statistics.mode(  ( [1, 2, 3], )  ), not statistics.mode(  [1, 2, 3]  ): it treats the whole list as one value to count.
  • statistics.mode uses a hashmap/dictionary internally to count the elements.
  • So it tries to use the list as a key for that hashmap. But lists are mutable, so they can't be hashed, hence the error.
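A small sketch that reproduces the error in plain Python, with a made-up list of row values (the exact wording of the message can vary slightly between Python versions):

```python
import statistics


def get_mode(*args):
    return statistics.mode(args)


values = [1, 2, 2, 3]

# Passing the list as one argument: args becomes ([1, 2, 2, 3],)
error_message = None
try:
    get_mode(values)
except TypeError as err:
    error_message = str(err)
print(error_message)  # mentions "unhashable type: 'list'"

# Unpacking the list into separate arguments gives args == (1, 2, 2, 3)
print(get_mode(*values))  # 2
```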

 

 So the first step to fixing this is fixing the code block (just remove the asterisk, and import statistics inside the code block as well):

codeblock = """
import statistics

def Get_mode(args):
    return statistics.mode(args)
"""

You could also skip the code block altogether and just call statistics.mode directly in the expression field.
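With the asterisk removed, the function expects a single list, which is exactly what statistics.mode itself takes. A quick check with made-up row values:

```python
import statistics


def Get_mode(args):
    # args is a single list (or tuple) of values
    return statistics.mode(args)


row_values = [4, 9, 9, 2]  # made-up values for one row

print(Get_mode(row_values))         # 9
print(statistics.mode(row_values))  # 9, same result without the wrapper
```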

 

The next step is to fix your input list. Right now you're passing in the field names as strings, not the field values. Assuming you have fields like "R1", "R2" and so on, you're doing this:

field_names = ["R1", "R2", "R3", "R4"]
statistics.mode(field_names)
# "R1"
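All four names in that list are different, so they tie for most frequent. Since Python 3.8, statistics.mode returns the first value it encountered among tied ones (older versions raised StatisticsError instead), which is why "R1" comes back. If you would rather see every tied value, statistics.multimode (also added in 3.8) returns them all; the second list below is made up:

```python
import statistics

field_names = ["R1", "R2", "R3", "R4"]
print(statistics.mode(field_names))       # R1
print(statistics.multimode(field_names))  # ['R1', 'R2', 'R3', 'R4']

# a made-up row where two values tie
tied = [5, 8, 5, 8, 1]
print(statistics.mode(tied))       # 5 (first one encountered)
print(statistics.multimode(tied))  # [5, 8]
```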

 

You need to pass in the actual values. In the Field Calculator, you do this by enclosing the field name in exclamation points:

field_names = ["!R1!", "!R2!", "!R3!", "!R4!"]

 

But that still won't work: these are just literal strings in a list, so the expression would return "!R1!" for each row.

Instead, you need to construct the whole expression as a string rather than relying on an outside variable:

# get the field names
field_names = [f.name for f in arcpy.ListFields(inTable, "R*")]  # ['R1', 'R2', 'R3']
# surround them in exclamation marks
field_names = [f"!{n}!" for n in field_names]  # ['!R1!', '!R2!', '!R3!']
# concatenate
field_names = ", ".join(field_names)  # '!R1!, !R2!, !R3!'
# get the function call as string
expression = f"Get_mode([{field_names}])"  # 'Get_mode([!R1!, !R2!, !R3!])'

# alternatively: skip the code block and call statistics.mode directly:
expression = f"statistics.mode([{field_names}])"  # 'statistics.mode([!R1!, !R2!, !R3!])'


# and finally execute the Field Calculator
arcpy.management.CalculateField(inTable, fieldName2, expression, "PYTHON3", codeblock)
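The string-building steps don't need arcpy, so they can be checked on their own. This sketch wraps them in a hypothetical helper, build_mode_expression (it is not part of arcpy); in a script you would pass it the names returned by arcpy.ListFields:

```python
def build_mode_expression(field_names, func="Get_mode"):
    """Hypothetical helper: turn field names into a Field Calculator expression."""
    # wrap each name in exclamation marks so Calculate Field substitutes the values
    placeholders = ", ".join(f"!{name}!" for name in field_names)
    return f"{func}([{placeholders}])"


print(build_mode_expression(["R1", "R2", "R3"]))
# Get_mode([!R1!, !R2!, !R3!])
print(build_mode_expression(["R1", "R2", "R3"], func="statistics.mode"))
# statistics.mode([!R1!, !R2!, !R3!])
```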

 

 

Personally, I don't like calling Calculate Field from a script. It's a tool for quick manual calculations, not for automating tasks. In your case, I'd just do it with an UpdateCursor:

import arcpy
import statistics

arcpy.env.workspace = "my path"
mode_field = "Most_Freq"
for fc in arcpy.ListFeatureClasses():
    # add the output field to each feature class
    arcpy.management.AddField(fc, mode_field, "LONG")
    # collect all fields whose names start with "R"
    value_fields = [f.name for f in arcpy.ListFields(fc, "R*")]
    # row[0] is the output field, row[1:] are the value fields
    with arcpy.da.UpdateCursor(fc, [mode_field] + value_fields) as cursor:
        for row in cursor:
            row[0] = statistics.mode(row[1:])
            cursor.updateRow(row)
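The per-row logic inside the cursor loop can be tried without arcpy by using plain lists that stand in for cursor rows (row[0] is the output field, row[1:] the R fields). One thing to watch: statistics.mode counts None like any other value, so a row with several null fields can come back as None. The hypothetical row_mode helper below skips nulls first; whether nulls should be ignored is an assumption about your data:

```python
import statistics


def row_mode(values):
    """Hypothetical helper: mode of the non-null values, or None if all are null."""
    present = [v for v in values if v is not None]
    if not present:
        return None
    return statistics.mode(present)


# made-up rows standing in for UpdateCursor rows: [Most_Freq, R1, R2, R3, R4]
rows = [
    [None, 3, 5, 3, 1],
    [None, None, None, 2, 4],
    [None, None, None, None, None],
]
for row in rows:
    row[0] = row_mode(row[1:])

print([row[0] for row in rows])  # [3, 2, None]
```

Inside the UpdateCursor loop, this would mean replacing statistics.mode(row[1:]) with row_mode(row[1:]).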

 


Have a great day!
Johannes
adam_gallaher
New Contributor II

Thank you for the explanation and solution. I agree, going the update cursor route is probably the more elegant solution. I found the approach below to work as well, although it might be better to update my code for future reference.

Thank you again.

Adam 

import arcpy
from arcpy import env
import statistics
arcpy.env.parallelProcessingFactor = "100%"
arcpy.env.overwriteOutput = True
env.workspace = "my path"

fieldName2 = "Most_Freq"
codeblock = """
import statistics

def Get_mode(*args):
    return statistics.mode(args)"""

featureclasses = arcpy.ListFeatureClasses()
for fc in featureclasses:
    inTable = fc
    print("Working on: {}".format(fc))
    field_names = ["!" + f.name + "!" for f in arcpy.ListFields(fc, "R*")]
    # build the call per feature class, e.g. "Get_mode(!R1!, !R2!, !R3!)"
    expression = "Get_mode({})".format(", ".join(field_names))

    arcpy.management.AddField(inTable, fieldName2, "LONG")
    arcpy.management.CalculateField(inTable, fieldName2, expression, "PYTHON3", codeblock)
jcarlson
MVP Esteemed Contributor

There are a number of ways you could do it, like creating an array of distinct values and then checking which show up most frequently. It's a lot of code for something simple, but here it is as an Arcade expression:

var fields_list = [
  'a',
  'b',
  'c'
]

var unique_values = []

for (var f in fields_list) {
  var the_val = $feature[fields_list[f]]

  // add unique values to array; skip if already in
  if (Includes(unique_values, the_val)) {
    continue
  } else {
    Push(unique_values, the_val)
  }
}

var modes = []

for (var u in unique_values) {
  var u_count = 0

  // if field has the unique value, increment it
  for (var f in fields_list) {
    if ($feature[fields_list[f]] == unique_values[u]) {
      u_count += 1
    }
  }

  /*
    if object in `modes` is smaller, replace it
    if equal, add to array
    otherwise skip
  */
  if (Count(modes) == 0 || First(modes)['count'] < u_count) {
    modes = [{'val': unique_values[u], 'count': u_count}]
  } else if (First(modes)['count'] == u_count) {
    Push(modes, {'val': unique_values[u], 'count': u_count})
  }
}

// our `modes` array should have 1 or more items in it now, which we can use to create an output
var out_lines = []

for (var m in modes) {
  Push(out_lines, modes[m]['val'])
}

return Concatenate(out_lines, '\n')
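For comparison, here is a Python sketch of the same idea (collect the distinct values, count them, keep every value tied for the highest count). It swaps the nested loops for collections.Counter, and the dictionaries below stand in for features with hypothetical fields a, b and c:

```python
from collections import Counter


def all_modes(feature, fields_list):
    """Return every value tied for the highest count, joined by newlines."""
    counts = Counter(feature[f] for f in fields_list)
    if not counts:
        return ""
    top = max(counts.values())
    # Counter keeps first-seen order, matching the Arcade version's output order
    modes = [value for value, n in counts.items() if n == top]
    return "\n".join(str(v) for v in modes)


feature = {"a": 1, "b": 2, "c": 2}
print(all_modes(feature, ["a", "b", "c"]))  # 2

tied = {"a": 1, "b": 2, "c": 3}
print(all_modes(tied, ["a", "b", "c"]))  # 1, 2 and 3 on separate lines
```

In Python 3.8+, statistics.multimode returns the same list of tied values before joining.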

 

And here it is run against a made-up feature:

[Screenshot: expression output for the made-up feature]

And if I make it so multiple values tie, I get all of the modes:

[Screenshots: the made-up feature with tied values, and the output listing all of the modes]

 

- Josh Carlson
Kendall County GIS