If the data are consistently formatted, you might consider looking at regular expressions to accomplish this.
Here's my code:
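import re

values = [
    'V 40 %, B112 60 %',
    'V 30 %, B11 30 %, K 40 %',
    'S 132 90 %, R113-GB00BK 10 %'
]

# Regex patterns (raw strings, so the backslashes reach the regex engine unchanged)
other_patt = re.compile(r'\S+?(?=\s[0-9]+\s%)')  # the label in front of each percentage
percent_patt = re.compile(r'[0-9]+(?=\s%)')      # the digits of each percentage

for value in values:
    # Print the labels and the percentages found in each string
    print([other_patt.findall(value), percent_patt.findall(value)])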
And here's what it returns:
[['V', 'B112'], ['40', '60']]
[['V', 'B11', 'K'], ['30', '30', '40']]
[['132', 'R113-GB00BK'], ['90', '10']]
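If you need each label next to its percentage rather than two separate lists, the results can be zipped together. This is only a minimal sketch using the same two patterns; the helper name parse_mix is made up for illustration, and it assumes every percentage has exactly one label in front of it.

```python
import re

other_patt = re.compile(r'\S+?(?=\s[0-9]+\s%)')
percent_patt = re.compile(r'[0-9]+(?=\s%)')

def parse_mix(value):
    """Return (label, percent) pairs found in one input string (hypothetical helper)."""
    labels = other_patt.findall(value)
    percents = [int(p) for p in percent_patt.findall(value)]
    # Pair the n-th label with the n-th percentage
    return list(zip(labels, percents))

print(parse_mix('V 30 %, B11 30 %, K 40 %'))
# [('V', 30), ('B11', 30), ('K', 40)]
```

Note that for 'S 132 90 %' this gives ('132', 90) rather than ('S 132', 90), because \S+ cannot span the space inside the label.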
Breaking down the regex patterns:
- \S+?(?=\s[0-9]+\s%)
  - \S+?
    - \S+ matches one or more non-whitespace characters.
    - The '?' makes it non-greedy, meaning it will match as little as possible, so that we don't inadvertently grab more than one value.
  - (?=...) is a lookahead expression: it only matches strings that are followed by the pattern in the '...', but does not include that pattern in the returned match.
    - \s matches a single whitespace character (it appears both before and after the digits).
    - [0-9]+ matches one or more consecutive numeric characters.
    - % is just a literal '%' character!
- [0-9]+(?=\s%)
  - Again, [0-9]+ matches one or more consecutive numeric characters.
  - (?=\s%) is a simpler lookahead expression, this time matching only those numeric characters that are followed by a single whitespace character and a percent sign.
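To see what the lookahead buys you, it can help to run the percentage pattern with and without it on one of the sample strings. This is just an illustrative sketch; the variable names are my own.

```python
import re

text = 'V 40 %, B112 60 %'

# Without a lookahead, the whitespace and '%' become part of the match.
with_percent = re.findall(r'[0-9]+\s%', text)
print(with_percent)     # ['40 %', '60 %']

# With a lookahead, they must be present but are not returned.
lookahead_only = re.findall(r'[0-9]+(?=\s%)', text)
print(lookahead_only)   # ['40', '60']

# With no condition at all, the digits inside 'B112' are picked up too.
all_digits = re.findall(r'[0-9]+', text)
print(all_digits)       # ['40', '112', '60']
```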
Regex is quite useful for extracting text, and Python's re module is well worth digging into.
- Josh Carlson
Kendall County GIS