Using Regular Expression in Calculate Field

581
3
04-22-2020 03:56 PM
MattSmith6
New Contributor III

Asking for some help on a task I'm dealing with.

I have a field (PLAN) with a string that I'd like to split into two separate text fields (PLAN_PFX & PLAN_NUM). Each entry is predictable in the sense that it will be 1-5 alphabetic characters followed by 1-6 numeric characters. Alphabetic characters are always upper-case.

So what I need to do is something like...

PLANPLAN_PFXPLAN_NUM
A305A305
BTR86542BTR86542
RP808112RP808112
WR8WR8

I know I'm going to need to use Regular Expression for it. I've tried a few methods just focusing on calculating for PLAN_NUM initially but I'm just fumbling around in the dark really.

PLAN_NUM = regex(!PLAN!)

import re

def regex(text):

    split_list = re.split(r'[A-Z]', text)

    return split_list[-1]

But this just returns a blank entry.

Any help appreciated. Thanks in advance.

0 Kudos
3 Replies
MattSmith6
New Contributor III

OK, never mind, the above code does work. Just have to change it to get the other part now!

Alternatively, if anyone has a more elegant way to do this, I'm all ears.

0 Kudos
BenTurrell
Occasional Contributor III

Matt Smith‌,

You could also use this regex expression which might save you some trouble if you end up with some dodgy data thrown in. It should also simplify your output as well as [0] will be text and [1] will be numbers:

import re

 

def regex(text):

    split_list = re.split(r'^[A-Z]+', text)

    return split_list[-1]

JoshuaBixby
MVP Esteemed Contributor

re.split won't give the results you expect:

>>> l = ["A305", "BTR86542", "RP808112", "WR8"]
>>> for s in l:
...     re.split(r'^[A-Z]+', s)
...
['', '305']
['', '86542']
['', '808112']
['', '8']
>>> 

Although the numbers are present in the output, the text portion will not be present in [0].

I don't think re.split is the right method for this job, re.match will work better:

>>> l = ["A305", "BTR86542", "RP808112", "WR8"]
>>> for s in l:
...     re.match(r'([A-Z]+)([^A-Z]+)', s).groups()
...
('A', '305')
('BTR', '86542')
('RP', '808112')
('WR', '8')
>>> 

Even though the above code snippet will work, it still leaves the door open to issues with case sensitivity and fields that only have a number.

I prefer to use character classes:

>>> l = ["A305", "BTR86542", "RP808112", "WR8", "Wr8", "305"]
>>> for s in l:
...     re.match(r'(\D*)(\d*)', s).groups()
...
('A', '305')
('BTR', '86542')
('RP', '808112')
('WR', '8')
('Wr', '8')
('', '305')
>>>