Hi all.
I am struggling with extracting a number in combination with a percentage sign (e.g 40 %, 60 %; 30 %, 30 % and 40 % and so on) from a field. The issue is that there is no regular position of said number - see example below - and that there are numbers which I dont want to extract.
Any ideas on how to extract the number-character combination? Thanks for any suggestions
Example fields:
V 40 %, B112 60 %
V 30 %, B11 30 %, K 40 %
S 132 90 %, R113-GB00BK 10 %
Solved! Go to Solution.
If the data are consistently formatted, you might consider looking at regular expressions to accomplish this.
Here's my code:
import re
values = [
'V 40 %, B112 60 %',
'V 30 %, B11 30 %, K 40 %',
'S 132 90 %, R113-GB00BK 10 %'
]
# Regex patterns
other_patt = re.compile('\S+?(?=\s[0-9]+\s%)')
percent_patt = re.compile('[0-9]+(?=\s%)')
for value in values:
print([re.findall(other_patt, value), re.findall(percent_patt, value)])
And here's what it returns:
[['V', 'B112'], ['40', '60']]
[['V', 'B11', 'K'], ['30', '30', '40']]
[['132', 'R113-GB00BK'], ['90', '10']]
Breaking down the regex patterns:
Regex is quite useful for extracting text, and is a module well worth digging into.
Which number do you want to extract in those examples? Can you give examples along with the intended result? There may be no regular position, but is there a regular format - do you want the numbers preceding the % sign?
Hi David - thanks for your response. I specified my question to which numbers I want to extract!
The codes are always separated by a comma - the codes themselfes vary though
Intended result would be a column with
V, B112
V, B11, K
S132, R113-GB00BK
and one with
40, 60
30, 30, 40
90, 10
", ".join([i.strip().split(" ")[-1] for i in s0.split("%") if i])
'30, 30, 40'
like so? See my post
This will get you thinkin about what you want.
Currently it returns a list so you have to decide whether you want a particular value from the list (eg first, last etc) or whether you want to concatenate the values together (eg ", ".join([i for i in ... )
s = "S 132 90 %, R113-GB00BK 10 %"
[i.strip().split(" ")[-1] for i in s.split("%") if i]
['90', '10']
s0 = "V 30 %, B11 30 %, K 40 %"
[i.strip().split(" ")[-1] for i in s0.split("%") if i]
['30', '30', '40']
s1 = "V 40 %, B112 60 %"
[i.strip().split(" ")[-1] for i in s1.split("%") if i]
['40', '60']
If the data are consistently formatted, you might consider looking at regular expressions to accomplish this.
Here's my code:
import re
values = [
'V 40 %, B112 60 %',
'V 30 %, B11 30 %, K 40 %',
'S 132 90 %, R113-GB00BK 10 %'
]
# Regex patterns
other_patt = re.compile('\S+?(?=\s[0-9]+\s%)')
percent_patt = re.compile('[0-9]+(?=\s%)')
for value in values:
print([re.findall(other_patt, value), re.findall(percent_patt, value)])
And here's what it returns:
[['V', 'B112'], ['40', '60']]
[['V', 'B11', 'K'], ['30', '30', '40']]
[['132', 'R113-GB00BK'], ['90', '10']]
Breaking down the regex patterns:
Regex is quite useful for extracting text, and is a module well worth digging into.
Thanks very much - that helped a lot!
The pattern does seem structured and simple enough that you might not need regular expressions, but I like regular expressions like @jcarlson.
>>> import re
>>>
>>> rows = [
... r"V 40 %, B112 60 %",
... r"V 30 %, B11 30 %, K 40 %",
... r"S 132 90 %, R113-GB00BK 10 %"
... ]
>>>
>>> pct_pttn = re.compile("(\d*) %")
>>>
>>> # populuate column 1
>>> for s in rows:
... pct_pttn.sub("", s).strip()
...
'V , B112'
'V , B11 , K'
'S 132 , R113-GB00BK'
>>>
>>> # populate column 2
>>> for s in rows:
... ", ".join(pct_pttn.findall(s))
...
'40, 60'
'30, 30, 40'
'90, 10'
>>>