Extract the largest Number from a String

06-18-2020 09:08 AM
NataliaGutierrez1
Regular Contributor

Hello,

I am trying to extract the largest number from a string. 

More details:

I have a feature class with a text field. Each feature has a string like this: "3, 5, 3.9, 2.5 FAR Values", and I need to extract the highest number and put it in a new field. From that string I would need the number 5.

There are some features that have Null values and some with just text and no numbers.

I wrote the following script using a Python function from the internet, but I am not sure how to apply it in ArcPy.

import arcpy

# find largest number in a string

arcpy.env.workspace = r"D:\APRX_MXDS\USA_App_Project\usa_parcels_with_FARField.gdb"
arcpy.env.overwriteOutput = True

fc = "temp"

with arcpy.da.UpdateCursor(fc, "FAR_INTEGER") as cursor: # Loop through each feature
    for row in cursor:
        ls = list()
        for w in row[0].split():
            try:
                ls.append(int(w))
            except:
                pass
            try:
                return max(ls)
            except:
                return None
4 Replies
JoshuaBixby
MVP Esteemed Contributor

For extracting numbers from text, you are going to want to use regular expressions instead of Python string split, unless your text strings are highly structured and simple. I would err on the side of using the re module (see "re: Regular expression operations" in the Python 3.8.3 documentation).

Assuming you created a new field "MAX_VALUE" to hold the maximum value, the following code should work for you:

import arcpy
import re

# find largest number in a string

arcpy.env.workspace = r"D:\APRX_MXDS\USA_App_Project\usa_parcels_with_FARField.gdb"
arcpy.env.overwriteOutput = True

fc = "temp"

with arcpy.da.UpdateCursor(fc, ["FAR_INTEGER", "MAX_VALUE"]) as cursor: # Loop through each feature
    for row in cursor:
        if row[0] is None:  continue
        nbrs = [float(i) for i in re.findall(r'(\d\.?\d*|\.\d*)', row[0])]
        if nbrs:
            row[1] = max(nbrs)
        cursor.updateRow(row)

UPDATE:  Made a change to code to account for strings that don't have any numbers.

WillHouston
Regular Contributor

Try testing your regex on these two strings:
"3, 5, 3.9, 12.5 FAR Values"

"3, 5, 3.9, 2.5 FAR Values."
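A quick way to check both strings outside ArcGIS is to run the pattern through re.findall directly. This is a minimal sketch using only the standard library; OLD_PATTERN and NEW_PATTERN are labels made up here for the pattern posted above and for a stricter variant that requires at least one digit in every match.

```python
import re

OLD_PATTERN = r'(\d\.?\d*|\.\d*)'       # pattern from the earlier reply
NEW_PATTERN = r'(\d+\.?\d*|\d*\.\d+)'   # stricter: every match contains a digit

s1 = "3, 5, 3.9, 12.5 FAR Values"
s2 = "3, 5, 3.9, 2.5 FAR Values."

old_s1 = re.findall(OLD_PATTERN, s1)
old_s2 = re.findall(OLD_PATTERN, s2)
new_s1 = re.findall(NEW_PATTERN, s1)
new_s2 = re.findall(NEW_PATTERN, s2)

print(old_s1)  # ['3', '5', '3.9', '12', '.5']  -> 12.5 is split in two
print(old_s2)  # ['3', '5', '3.9', '2.5', '.']  -> float('.') raises ValueError
print(new_s1)  # ['3', '5', '3.9', '12.5']
print(new_s2)  # ['3', '5', '3.9', '2.5']
```

With the first pattern, the maximum for the first string comes out as 12 instead of 12.5, and the trailing period in the second string produces a match that float() cannot convert.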

 

This is a good case for using regex and a much better solution than the one I provided, but using regex does not mean you shouldn't check for exceptions in type conversions:

import arcpy
import re

# find largest number in a string

arcpy.env.workspace = r"D:\APRX_MXDS\USA_App_Project\usa_parcels_with_FARField.gdb"
arcpy.env.overwriteOutput = True

fc = "temp"

with arcpy.da.UpdateCursor(fc, ["FAR_INTEGER", "MAX_VALUE"]) as cursor: # Loop through each feature
    for row in cursor:
        if row[0] is None:  continue
        nbrs = []
        for i in re.findall(r'(\d+\.?\d*|\d*\.\d+)', row[0]):
            try:
                nbrs.append(float(i))
            except ValueError:  # skip any match that is not a valid number
                pass
        if nbrs:
            row[1] = max(nbrs)
            cursor.updateRow(row)  # if FAR_INTEGER contained no numbers, no need to update

That's a slightly improved regex, but I still wouldn't trust it to reject every erroneous input. You could argue for wrapping the whole list assignment in a single try statement instead, but then if the regex matched one erroneous input mixed in with some valid inputs, you would get no max value at all.
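To make that trade-off concrete, here is a small sketch comparing the two ways of placing the try statement. The function names and the token list are hypothetical, chosen only for illustration; the list stands in for regex output that contains one bad match.

```python
def parse_all_or_nothing(tokens):
    """One try around the whole conversion: a single bad token discards everything."""
    try:
        return [float(t) for t in tokens]
    except ValueError:
        return []


def parse_per_item(tokens):
    """One try per token: bad tokens are skipped and good ones are kept."""
    values = []
    for t in tokens:
        try:
            values.append(float(t))
        except ValueError:
            pass
    return values


tokens = ['3', '5', '3.9', '2.5', '.']  # hypothetical regex output, '.' is not a number

print(parse_all_or_nothing(tokens))  # []
print(parse_per_item(tokens))        # [3.0, 5.0, 3.9, 2.5]
```

The per-item version still yields a maximum of 5.0 for this row, while the all-or-nothing version leaves nothing to take a maximum of.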

WillHouston
Regular Contributor

First, there are a number of problems with your code:

  • I usually pass the full path to the feature class of a file gdb to UpdateCursor, instead of setting the workspace and passing only the table name. Your method might work, though.
  • The second parameter of UpdateCursor is normally a list of field names (["FAR_INTEGER"]). As far as I know a bare string is accepted for a single field, but you will need a list as soon as you add a second field.
  • split() with no arguments only splits strings on whitespace (spaces, tabs, newlines), so you should remove the commas first: row[0].replace(',', '').split().
  • int() raises a ValueError when it is given a string with a decimal point (e.g. "3.9"), and your bare except hides that error, so those values are silently skipped. Use float conversion instead: float(w).
  • return ends execution of a function and hands the value back to the caller; outside a function, as in your script, it is a syntax error. If you want to store the max value, you could put it in a field on that feature class. Then your UpdateCursor call could be UpdateCursor(fc, ["FAR_INTEGER", "max_value"]) and you could use updateRow(row) to store that value.
  • Python uses indentation to delimit blocks of code. As indented, both try statements sit inside the inner for loop, so they would run for every word in every string in every row of the table (if the second try statement didn't always return, which ends execution).

Resulting in something like:

import arcpy

# find largest number in a string

arcpy.env.workspace = r"D:\APRX_MXDS\USA_App_Project\usa_parcels_with_FARField.gdb"
arcpy.env.overwriteOutput = True

fc = "temp"

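with arcpy.da.UpdateCursor(fc, ["FAR_INTEGER", "max_far"]) as cursor:
    for row in cursor:
        if row[0] is None:  # skip Null values, which have no replace()
            continue
        ls = []
        for w in row[0].replace(',', '').split():
            try:
                ls.append(float(w))
            except ValueError:  # not a number, e.g. "FAR"
                pass
        if ls:  # only update rows that contained at least one number
            row[1] = max(ls)
            cursor.updateRow(row)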

I didn't test that, so I might have introduced more bugs.
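One way to test the parsing step without ArcGIS is to pull it into a plain function and try it on a few sample strings. This is only a sketch: max_number is a hypothetical helper name, and it replaces commas with spaces rather than removing them (a small departure from the script above) so that "3,5" is read as two numbers instead of 35.

```python
def max_number(text):
    """Return the largest number in text as a float, or None if there is none."""
    if text is None:  # Null field value
        return None
    values = []
    # Assumption: numbers are separated by commas and/or whitespace.
    for w in text.replace(',', ' ').split():
        try:
            values.append(float(w))
        except ValueError:  # a word such as "FAR" or "Values"
            pass
    return max(values) if values else None


print(max_number("3, 5, 3.9, 2.5 FAR Values"))  # 5.0
print(max_number("just text"))                  # None
print(max_number(None))                         # None
```

Inside the cursor loop, the row could then be handled with row[1] = max_number(row[0]) followed by cursor.updateRow(row), skipping the update when the result is None if the existing value should be left alone.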

Now, executing that script is a different topic. What program are you using? I'm most familiar with ArcGIS Pro, in which you should be able to create a new Jupyter notebook (Insert Tab -> New Notebook) and copy-paste it in.

NataliaGutierrez1
Regular Contributor

Thank you so much for your replies. I will test all of these and let you know how it went.
