
Calculate Field / Python code question

11-30-2023 03:44 PM
BrodieChisholm
New Contributor III

Good day all,

I have a layer with the following fields in the attribute table:

BrodieChisholm_0-1701387279253.png

I am trying to populate the species fields (PW, PR, PJ, SB, etc.) with the information found in the "SPCOMP" field.

BrodieChisholm_2-1701387479307.png

Each value in the “SPCOMP” field should go into its matching species field, and the remaining species fields should be left at 0. For example, “SPCOMP” = “Pj 60Pt 30Bw 10” means Pj=60, Pt=30, Bw=10, so 60 goes under “PJ”, 30 under “PT”, and 10 under “BW”.

I've tried various different things through the "Calculate field" tool but have been unsuccessful.

I am thinking something like:
For field PW: if SPCOMP includes the text "Pw" return following number - omit the rest of the string, else return 0.

Obviously this is not in proper Python code, and would have to be repeated for every field.
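
A minimal sketch of the per-field idea above, assuming SPCOMP always lists a species code followed by its number (as in "Pj 60Pt 30Bw 10"). The function name species_value is hypothetical. In the Calculate Field tool it would presumably go in the code block and be called with an expression such as species_value(!SPCOMP!, "Pw") for the PW field.

```python
import re

def species_value(spcomp, code):
    """Return the number that follows `code` in `spcomp`, or 0 if absent."""
    if not spcomp:  # covers None (null values) and empty strings
        return 0
    # Species code (case-insensitive), optional whitespace, then digits.
    # The lookbehind keeps the code from matching in the middle of another word.
    pattern = r"(?<![A-Za-z])" + re.escape(code) + r"\s*([0-9]+)"
    match = re.search(pattern, spcomp, re.IGNORECASE)
    return int(match.group(1)) if match else 0

print(species_value("Pj 60Pt 30Bw 10", "Pj"))  # species present
print(species_value("Pj 60Pt 30Bw 10", "Pw"))  # species absent
```

The same function would be reused for every species field, changing only the second argument.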

 

If anyone has a solution, or Python code example to accomplish this, I would be so grateful.

Thank you

 

18 Replies
AlfredBaldenweck
MVP Regular Contributor

I'm sorry, there was a copy-paste error on my part.

Add "import re" as the very first line of the code.

(I've updated my snippet to reflect this)

BrodieChisholm
New Contributor III

Hello Alfred,

I amended your code and have been able to run through most of it - currently getting an error right at the end (line 41, 42):

BrodieChisholm_0-1701694414079.png

I tried replacing "str" with the field name, 'Pw' for example or 'SPCOMP' but yielded no better results.

I apologize; at this point my lack of Python knowledge is really hindering me, and I don't want to be a nuisance.

Though, I really do appreciate the work you have put into this and help you have offered.

Thank you. 

AlfredBaldenweck
MVP Regular Contributor

So, a KeyError is what you get when you go to look something up in a dictionary and the key doesn't exist. Here it's looking for the key 52, not finding it, and freaking out.

A way to get around this normally is to use dictionary.get(), in which you tell it what key to look for and what to return if you don't find it. (I don't use this in general because I find it confusing lol).

In this case, however, I'm not really sure what happened here. The str() is important because I wanted the dictionary keys to be strings rather than true numbers (e.g. integers), so I convert to string before the lookup. An integer 52 and the string '52' are different keys, so a mismatch between the two would raise a KeyError. But your value should be in there so ???
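
To illustrate the KeyError and dictionary.get() behaviour described above, here is a small sketch. The dictionary lookup and its contents are made up for illustration and are not taken from the earlier snippet.

```python
# Hypothetical lookup table keyed by strings.
lookup = {"60": "Pj", "30": "Pt"}

# Plain indexing raises KeyError when the key is missing...
try:
    lookup["52"]
except KeyError as err:
    print("KeyError:", err)

# ...while .get() returns the fallback you supply instead.
print(lookup.get("52", 0))

# The integer 60 and the string "60" are different keys.
print(60 in lookup, str(60) in lookup)
```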

 

I'd give @JoshuaBixby's solution a shot. It's a lot more straightforward than mine. 

DanPatterson
MVP Esteemed Contributor

For future reference, if you need to fix similar data structures, and for learning:

 

def separate_string_number(string, as_list=False):
    """Return a string split into strings and numbers, as a list.

    z = 'Pj 60Pt 30Bw 10'
    z0 = 'PJ60PT30BW10'
    separate_string_number(z)
    separate_string_number(z0)
    returned value
    ['Pj', '60', 'Pt', '30', 'Bw', '10']

    separate_string_number("A .1 in the 1.1 is not 1")
    ['A', '.1', 'in', 'the', '1.1', 'is', 'not', '1']

    Modified from https://stackoverflow.com/a/57359921/6828711
    """
    # Guard first: string[0] on an empty string would raise IndexError.
    if len(string) <= 1:
        return [string]
    groups = []
    prev = string[0]
    newword = string[0]
    for x, i in enumerate(string[1:]):
        if i.isalpha() and prev.isalpha():
            newword += i
        elif (i.isnumeric() or i == '.') and (prev.isnumeric() or prev == '.'):
            newword += i
        else:
            groups.append(newword.strip())
            newword = i
        prev = i
        if x == len(string) - 2:
            groups.append(newword.strip())  # strip any spaces
            newword = ''
    # remove extraneous space values in groups
    groups = [i for i in groups if i != '']
    if as_list:
        return groups
    # -- pair values, special case
    s = " ".join(["".join(groups[pos:pos + 2])
                  for pos in range(0, len(groups), 2)]
                 )
    return s

example

z = 'Pj 60Pt 30Bw 10'

separate_string_number(z, as_list=False)
'Pj60 Pt30 Bw10'

separate_string_number(z, as_list=True)
['Pj', '60', 'Pt', '30', 'Bw', '10']
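
If the list form is used, the tokens can be paired up into a species-to-value mapping. A minimal sketch, assuming the list strictly alternates species code and number as in the output above; the name composition is illustrative only.

```python
# Token list in the shape returned by separate_string_number(z, as_list=True).
tokens = ['Pj', '60', 'Pt', '30', 'Bw', '10']

# Even positions hold species codes, odd positions hold their numbers.
composition = {
    species.upper(): int(number)
    for species, number in zip(tokens[0::2], tokens[1::2])
}
print(composition)
```

Upper-casing the codes here is a guess at how they would line up with field names such as PJ, PT and BW.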

... sort of retired...
JoshuaBixby
MVP Esteemed Contributor

Although a regular expression answer has been put forward, I believe there is a more straightforward regular expression than what has been already proposed.  Also, all that is needed in terms of cursors is a single pass through the data set with an update cursor.

First, the regular expression.  You are trying to match pairs of species-values that have been concatenated into a single string, so there are 3 things to differentiate:  1) a species-value pair, 2) the species within that pair, and 3) the numeric value for the species in that pair.

The following regular expression captures that logic:

 

>>> import re
>>>
>>> reg_exp = r"(?:([A-Za-z]+)\s*([0-9]+))+?"
>>>
>>> SPCOMP_samples = [
...     "Pj 60Sb 40",
...     "Pj 60Sb 40",
...     "Sb 80Pj 10La 10",
...     "Pj 60Sb 40",
...     "Pj 80Sb 10Pt 10",
...     "Pj 80Sb 10Pt 10"
... ]
>>>
>>> for sample in SPCOMP_samples:
...     re.findall(reg_exp, sample)
...
[('Pj', '60'), ('Sb', '40')]
[('Pj', '60'), ('Sb', '40')]
[('Sb', '80'), ('Pj', '10'), ('La', '10')]
[('Pj', '60'), ('Sb', '40')]
[('Pj', '80'), ('Sb', '10'), ('Pt', '10')]
[('Pj', '80'), ('Sb', '10'), ('Pt', '10')]
>>> 

 

The regular expression uses a non-capturing outer group to find the species-value pairs, and then uses two internal capturing groups to differentiate the species from the numeric value.  Using the expression with re.findall returns a list of tuples containing a species and a numeric value.
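
As a side note, the lazy +? on the outer group stops after a single repetition here, so each match is one species-value pair, and the outer group does not appear to be strictly needed. A quick check, where simplified is a variant introduced only for comparison and not something proposed in this thread:

```python
import re

original = r"(?:([A-Za-z]+)\s*([0-9]+))+?"
simplified = r"([A-Za-z]+)\s*([0-9]+)"  # assumption: same matches, fewer parts

sample = "Sb 80Pj 10La 10"
pairs = re.findall(original, sample)
print(pairs)
print(pairs == re.findall(simplified, sample))
```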

Since each tuple contains the species and the value, you can use the species to look up the index of the species field in a cursor and update that field with the value.

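# Note: Code below hasn't been tested

import re
import arcpy

reg_exp = r"(?:([A-Za-z]+)\s*([0-9]+))+?"

fc = "<path to feature class or shapefile>"  # placeholder, replace before running
with arcpy.da.UpdateCursor(fc, "*") as cur:
    spcomp_idx = cur.fields.index("SPCOMP")
    for row in cur:
        # Each match is a (species, value) tuple, e.g. ("Pj", "60")
        for species, value in re.findall(reg_exp, row[spcomp_idx]):
            # Field names are upper case (PJ, PT, ...), so upper-case the species
            species_idx = cur.fields.index(species.upper())
            row[species_idx] = value
        cur.updateRow(row)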
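
The update logic can be tried without ArcGIS by using plain Python lists in place of the cursor. This sketch also zeroes the species that do not appear in SPCOMP, as the original question asked, and tolerates a null SPCOMP. The field tuple, the sample row, and the helper name fill_species are illustrative assumptions, and it assumes every listed field other than SPCOMP is a numeric species field.

```python
import re

reg_exp = r"(?:([A-Za-z]+)\s*([0-9]+))+?"

# Stand-ins for cur.fields and one cursor row (assumed field names).
fields = ("SPCOMP", "PW", "PJ", "PT", "BW")
row = ["Pj 60Pt 30Bw 10", None, None, None, None]

def fill_species(row, fields):
    """Zero every species field, then write the values parsed from SPCOMP."""
    spcomp_idx = fields.index("SPCOMP")
    for idx in range(len(fields)):
        if idx != spcomp_idx:
            row[idx] = 0
    # "or ''" lets a null SPCOMP pass through with all zeros.
    for species, value in re.findall(reg_exp, row[spcomp_idx] or ""):
        name = species.upper()
        if name in fields:  # skip codes that have no matching field
            row[fields.index(name)] = int(value)
    return row

print(fill_species(row, fields))
```

With a real cursor, the field list would need to be limited to SPCOMP and the species fields rather than "*", otherwise unrelated fields would be zeroed too.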

 

AlfredBaldenweck
MVP Regular Contributor

Called it. This is a lot more straightforward than what I did.

BrodieChisholm
New Contributor III

Hello Joshua,

This is great, thank you for doing the work and getting this written up for me.

I have been able to run the first part independently - however I am running into an issue with the second half. 

Here is a screen shot - along with the error message I am receiving:

BrodieChisholm_0-1701698971428.png

Note that I modified the beginning of the code to add:
import re  # Edited to add this line in since I forgot it whoops.
fc = r"D:\Brodie\Projects\2023\FRMG Exercises\Coding Exercise\FRMG_Coding_Exercise\ftg_true_all_change.shp"
reg_exp = "(?:([A-Za-z]+)\s*([0-9]+))+?"

Thanks again, I really appreciate your help with this (I am a floundering newbie with Python).

AlfredBaldenweck
MVP Regular Contributor

Take that offending line out entirely.

In addition to the indent error, that line redefines the fc variable without giving it a value: there is only a comment telling you to put a value there. (Edit: this will throw a syntax error, since Python expects a value for fc. It just hit the indent error first.) You already defined the fc variable at the top, so you don't need it here. (Probably the same for the second instance of the regular expression.)

Python relies on indentation to figure out how lines relate to each other: lines belonging to a loop are indented further than the line that opens the loop.

e.g.

 

for i in [1, 2, 3]:
    # Do something
    print(i)
    # Do something else
    print(i * 2)

 

In this case, the offending line has one space at the beginning, throwing it out of line with everything else.

Also, so does the following line (with arcpy.da.UpdateCursor...)

Generally, 4 spaces per indent level is recommended (or a tab; pick one or the other, but not both). I think technically, as long as the indentation is consistent within whatever block it belongs to, you're fine. Four spaces is better for readability, though.
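
Both problems described above can be reproduced without running any GIS code by asking Python to compile small snippets. A minimal sketch; the snippet strings and the helper name error_name are made up for illustration.

```python
# One stray leading space on a top-level line.
bad_indent = ' fc = "data.shp"\nprint(fc)\n'
# An assignment with only a comment after the equals sign.
no_value = "fc = # path to feature class\n"

def error_name(source):
    """Compile the source text and report the exception type, if any."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError as err:  # IndentationError is a subclass of SyntaxError
        return type(err).__name__
    return "ok"

print(error_name(bad_indent))
print(error_name(no_value))
```

Because the check only compiles the text, nothing in the snippets is actually executed.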

BrodieChisholm
New Contributor III

HALLELUJAH!

It's working, you are all the greatest, I am so appreciative of all of your help.

It takes a community as they say.

 

Thank you all.
