I have a field with with descriptions and i need to extract the tax numbers from this field and put them into another field in the same layer.
Some examples of what i need .
1.
12-5N-5W SW NW SW S & W OF CANAL,, TX 2 IN SWNW LS TX 2-A
- I need to extract "TX 2" from this field
2.
23-3N-2W SW 4TH ST TOWNHOMES TX 11271 IN LT 4C BLK 1
- I need to extract "TX 11271" from this row
3.
04-5N-5W SE TAX 3 & TAX 4 IN NWSE
- I need to extract "TAX 3, TAX 4" from this row
4.
21-4N-3W SW TX 86, 89, 90 & 93 IN S 1/2 OF SW
- I need to exact "TX 86, 89,90 & 93" from this row
I currently can extract numbers with the following and i need help to include what i mentioned above. I guess i need to extract everything beginning at TX and before the space before the next alpha characters.
fc = r'C:\Temp\Parcels.shp'
with arcpy.da.UpdateCursor(fc,['Legal','Legal2']) as cursor:
for row in cursor:
row[1] = ''.join([str(i) for i in row[0] if i.isdigit()])
cursor.updateRow(row)
Solved! Go to Solution.
You have some edge cases that will be challenging to address, but Python regular expressions should be able to get a vast majority of your cases:
>>> import re
>>>
>>> recs = [
... "12-5N-5W SW NW SW S & W OF CANAL,, TX 2 IN SWNW LS TX 2-A",
... "23-3N-2W SW 4TH ST TOWNHOMES TX 11271 IN LT 4C BLK 1",
... "04-5N-5W SE TAX 3 & TAX 4 IN NWSE",
... "21-4N-3W SW TX 86, 89, 90 & 93 IN S 1/2 OF SW"
... ]
>>>
>>> pattern = re.compile(r"((?:TX|TAX) [,& 0-9]+)")
>>>
>>> for rec in recs:
... print(pattern.findall(rec))
...
...
['TX 2 ', 'TX 2']
['TX 11271 ']
['TAX 3 & ', 'TAX 4 ']
['TX 86, 89, 90 & 93 ']
>>>
One edge case my re expression doesn't currently address is the "TX 2-A" at the end of example 1. You don't mention wanting to extract it, why not? Also, have a tax number with an alphabetic character in it does complicate the regular expression.
UPDATE: I came up with a slightly different, I think better, approach that will return just the tax ids as a list from the description:
>>> import re
>>>
>>> recs = [
... "12-5N-5W SW NW SW S & W OF CANAL,, TX 2 IN SWNW LS TX 2-A",
... "23-3N-2W SW 4TH ST TOWNHOMES TX 11271 IN LT 4C BLK 1",
... "04-5N-5W SE TAX 3 & TAX 4 IN NWSE",
... "21-4N-3W SW TX 86, 89, 90 & 93 IN S 1/2 OF SW"
... ]
>>>
>>> pattern = re.compile(r"[ ,&]+")
>>>
>>> def extract_tax_ids(split_desc):
... l = []
... b = False
... for i in split_desc:
... if i in ("TX", "TAX"):
... b = True
... continue
... if b and not i.isalpha():
... l.append(i)
... else:
... b = False
... return l
...
>>>
>>> for rec in recs:
... print(extract_tax_ids(pattern.split(rec)))
...
['2', '2-A']
['11271']
['3', '4']
['86', '89', '90', '93']
>>>
strings are iterables and you can use an 'in' check. since you have so few cases, you could make a list of those and compare the cases to each record in a searchcursor. the logic is as follows
cases = ['TX 2', 'TX 11271']
records = ['12-5N-5W SW NW SW S & W OF CANAL,, TX 2 IN SWNW LS TX 2-A',
'23-3N-2W SW 4TH ST TOWNHOMES TX 11271 IN LT 4C BLK 1',
'04-5N-5W SE TAX 3 & TAX 4 IN NWSE'
]
for rec in records:
for case in cases:
if case in rec:
print("{} found in {}".format(case, rec))
else:
print("{} not found".format(case))
TX 2 found in 12-5N-5W SW NW SW S & W OF CANAL,, TX 2 IN SWNW LS TX 2-A
TX 11271 not found
TX 2 not found
TX 11271 found in 23-3N-2W SW 4TH ST TOWNHOMES TX 11271 IN LT 4C BLK 1
TX 2 not found
TX 11271 not found
Of course, if you are now going to say you have a few hundred cases... never mind
You have some edge cases that will be challenging to address, but Python regular expressions should be able to get a vast majority of your cases:
>>> import re
>>>
>>> recs = [
... "12-5N-5W SW NW SW S & W OF CANAL,, TX 2 IN SWNW LS TX 2-A",
... "23-3N-2W SW 4TH ST TOWNHOMES TX 11271 IN LT 4C BLK 1",
... "04-5N-5W SE TAX 3 & TAX 4 IN NWSE",
... "21-4N-3W SW TX 86, 89, 90 & 93 IN S 1/2 OF SW"
... ]
>>>
>>> pattern = re.compile(r"((?:TX|TAX) [,& 0-9]+)")
>>>
>>> for rec in recs:
... print(pattern.findall(rec))
...
...
['TX 2 ', 'TX 2']
['TX 11271 ']
['TAX 3 & ', 'TAX 4 ']
['TX 86, 89, 90 & 93 ']
>>>
One edge case my re expression doesn't currently address is the "TX 2-A" at the end of example 1. You don't mention wanting to extract it, why not? Also, have a tax number with an alphabetic character in it does complicate the regular expression.
UPDATE: I came up with a slightly different, I think better, approach that will return just the tax ids as a list from the description:
>>> import re
>>>
>>> recs = [
... "12-5N-5W SW NW SW S & W OF CANAL,, TX 2 IN SWNW LS TX 2-A",
... "23-3N-2W SW 4TH ST TOWNHOMES TX 11271 IN LT 4C BLK 1",
... "04-5N-5W SE TAX 3 & TAX 4 IN NWSE",
... "21-4N-3W SW TX 86, 89, 90 & 93 IN S 1/2 OF SW"
... ]
>>>
>>> pattern = re.compile(r"[ ,&]+")
>>>
>>> def extract_tax_ids(split_desc):
... l = []
... b = False
... for i in split_desc:
... if i in ("TX", "TAX"):
... b = True
... continue
... if b and not i.isalpha():
... l.append(i)
... else:
... b = False
... return l
...
>>>
>>> for rec in recs:
... print(extract_tax_ids(pattern.split(rec)))
...
['2', '2-A']
['11271']
['3', '4']
['86', '89', '90', '93']
>>>
I was able to get the print outs. I still need to get the Legal2 field populated with the results some how.
When i add an update cursor i get the following error
line 27, in <module>
with arcpy.da.UpdateCursor(recs,["Legal2"]) as cursor:
RuntimeError: 'in_table' is not a table or a featureclass
I am a little lost on how to populate the Legal2 field? - with arcpy.da.UpdateCursor(recs,["Legal2"]) as cursor:
with arcpy.da.UpdateCursor(recs,["Legal2"]) as cursor:
for row in cursor:
print results, but they looked like this.
['02362']
['03315', '05857', '05858']
['15022']
['03398']
['5858']
['05857']
['1', '2', 'T72007']
['1']
['2']
[]
[]
[]
[]
[]
['04089']
['04089']
currently have
import arcpy, re
recs = r'C:\Temp\Descriptions.shp'
field = 'Legal'
recs = (row[0] for row in arcpy.da.SearchCursor(recs, field))
pattern = re.compile(r"[ ,&]+")
def extract_tax_ids(split_desc):
l = []
b = False
for i in split_desc:
if i in ("TX", "TAX"):
b = True
continue
if b and not i.isalpha():
l.append(i)
else:
b = False
return l
#for rec in recs:
#where = extract_tax_ids(pattern.split(rec))
with arcpy.da.UpdateCursor(recs,["Legal2"]) as cursor:
for row in cursor:
What do you want the Legal2 field to be? A comma separated string?
Looking at your code, you have a couple issues going on. First, you set recs to be a path to a shape file, but then it gets set again as a generator expression. That is why your cursor is giving the error it is. Second, you won't be able to pass a list of values directly into a single field with the update cursor.
To clarify, you want to start with either "TX" or "TAX" and stop just before "IN"?
Some times it' not before the IN, some times it's before other alpha characters. It does appear to before alpha characters though.
22-3N-2W SE NAMPA ORIGINAL TX 98728 OF LT 32 BLK 34
02-3N-2W NW TX 03475 LS RD IN S 1/2 NW
Here's my latest test. It looks for a pattern of TX or TAX and selects everything until a letter is found that is neither TX nor TAX. The test code:
import re
recs = [
"12-5N-5W SW NW SW S & W OF CANAL,, TX 2 IN SWNW LS TX 2-A",
"23-3N-2W SW 4TH ST TOWNHOMES TX 11271 IN LT 4C BLK 1",
"07-4N-3W NE NENE S&W OF HWY",
"04-5N-5W SE TAX 3 & TAX 4 IN NWSE",
"22-3N-2W SE NAMPA ORIGINAL TX 98728 OF LT 32 BLK 34",
"02-3N-2W NW TX 03475 LS RD IN S 1/2 NW12",
"21-4N-3W SW TX 86, 89, 90 & 93 IN S 1/2 OF SW"
]
pattern = re.compile(r"((?:TX|TAX) [^A-Z]+ )")
for rec in recs:
print ''.join((pattern.findall(rec))).strip()
# outputs:
TX 2
TX 11271
(blank line)
TAX 3 & TAX 4
TX 98728
TX 03475
TX 86, 89, 90 & 93
So I would suggest trying something like:
import arcpy
import re
pattern = re.compile(r"((?:TX|TAX) [^A-Z]+ )")
fc = r'C:\Temp\Descriptions.shp'
with arcpy.da.UpdateCursor(fc,['Legal','Legal2']) as cursor:
for row in cursor:
row[1] = ''.join(pattern.findall(row[0].upper())).strip()
cursor.updateRow(row)
Hope this helps.
The code worked but i think the data in the legal field has lots of variations and i thought that the tax number ended before an alpha number but it doesn't, some will be TAX 30-A-1-E or TX 30-A-1, etc.
27-4N-3W SW BEALS ACREAGE E 278' OF TX 30-A OF BLKS 3,4,5,6 LESS TX 30-A-1 LS ST
22-4N-3W NE DEMENT TAX 7-B BLK 52 IN DEMENT & BOONE CALD
27-4N-3W SW BEALS ACREAGE TAX 14-B IN BLK 4 LESS TAX 14-B-1
27-4N-3W SW BEALS ACREAGE TAX 20, TAX 30-A-1-E BLK 3
27-4N-3W SW BEALS ACREAGE TX 30-A-1-D-A BLK 6
24-4N-5W NE TAX 2&3 IN NENE
31-4N-2W SW LONGVIEW PLACE TX 1 IN BLKS 1,2,3,LS HWY,, TX 4,5 IN BLK 3 & TX 9,10 IN BLK 4 TX 8 IN BLK 6
24-3N-2W NW NESW LESS TAX 34 & 46, S 1/2 NW LS TX 67 IN SWNW,,TX 46-A IN NESW
21-4N-3W NE CALLOWAY ADD TX 4 & 5 & W 7' OF S 114' OF TX 3 IN LT 3 LS HWY BLK 2
1. should be TX 30-A, TX 30-A-1
2. should be TAX 7-B
3. Should be TAX14-B, TAX 14-B-1
4. Should be TAX 20, TAX 30-A-1-E
5. Should be TX 30-A-1-D-A
6. Should be TAX 2, TAX3 or TAX 2&3
7. TX1, TX 4&5, TX9&10
8. TAX 34 &46. TX 46-A or TAX34, TAX46, TX 46-A
9. Should be TX 4&5, TX 3 or TX 4, TX 5, TX 3
Running a print i get the following.
20', '30-A-1-E']
['43']
['33', '4', '17', '30-A-1-B']
['4', '33', '30-A-1-A']
['18', '30A1CC']
['30-A-1-F']
['30-A-1-D', '30-A-1-D-A', '30-A-1-F']
['30-A-1-C-A']
['30-A-1-C-B']
['7', '7A', '7B', '7C']
i think the data in the legal field has lots of variations
I agree, and I don't think you will be able to come up with a rule set that will address all cases; well, come up with one in a relatively reasonable amount of time. A couple of different approaches have been suggested and demonstrated, and I think now you just need to dig in. I suggest find a rule set that address most cases and then validating the results and addressing the edge cases manually.