Hello All,
@ShitijMehta @DanPatterson, @XanderBakker , @Robert_LeClair @Anonymous User
I have a polygon feature class of buildings in the city, one of the fields in it is ARCHTYPE and it’s a multivalue concatenated field as shown in the image. (NOTE: There are total of 25 Archetype in my data)
So, I want to count the occurrence of each archetype (values are separated by semi colon) in a row and create 25 new fields with the archetype names and update those new fields with the count number.
So, for example Structure_ID 1400232 (2ND HIGHLIGHTED ROW) I would have 2 new fields called Small Multiplex<2007 and Large Multiplex built < 2007 It should give me value of 4 for and 1 respectively.
The table below is something that i am trying to do counting the each value in Archetype Field and updating the preexisting fields with there respective numbers
ARCHETYPE | Small Multiplex built < 2007 | Large Multiplex built < 2007 | Archtyp3 | Archtyp4 | Archtyp5 | Archtyp6 | Archtyp7 | Archtyp8 | Archtyp9 | Archtyp10 | |
Record 1 | Small Multiplex built < 2007;Small Multiplex built < 2007;Small Multiplex built < 2007;Small Multiplex built < 2007;Small Multiplex built < 2007;Small Multiplex built < 2007;Small Multiplex built < 2007;Small Multiplex built < 2007;Large Multiplex built < 2007;Large Multiplex built < 2007;Large Multiplex built < 2007;Small Multiplex built < 2007;Small Multiplex built < 2007;Large Multiplex built < 2007 | 10 | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
It would be a great help if someone help me find a way to do this within modelbuilder since this is one of the intermediately step in my main model so either a geoprocessing tool or a script would go a long way.
Thanks in advance and looking forward for a response.
Regards
Hussain
Solved! Go to Solution.
Hi @HussainMalik1 ,
See below the script that allows you to get the statistics:
import arcpy
def main():
# read the input featureclass
# fc = arcpy.GetParameterAsText(0)
fc = r'D:\GeoNet\ExtractText\XanderBakker.gdb\Hussain_Data_ConcatenatedFieldValueCount'
# input data field
fld_name = 'ARCHETYPE'
# define how the items relate to the fields
dct_flds = {'Large Multi-Storey Home built < 2007': 'LMSHBLT2007',
'Large Multi-Storey Home built >= 2007': 'LMSHBGE2007',
'Large Multiplex built < 2007': 'LMBLT2007',
'Large Multiplex built >= 2007': 'LMBGE2007',
'Large Retail': 'LargeRetail',
'Large Schools': 'LargeSchools',
'Large Single Storey Home built < 2007': 'LSSHBLT2007',
'Large Single Storey Home built >= 2007': 'LSSHBGE2007',
'Medium Apartments Mixed': 'MediumAptMixed',
'Medium Offices': 'MediumOffices',
'Small Apartments Mixed': 'SmallAptMixed',
'Small Multi-Storey Home built < 2007': 'SMSHMLT2007',
'Small Multi-Storey Home built >= 2007': 'SMSHBGE2007',
'Small Multiplex built < 2007': 'SMBLT2007',
'Small Multiplex built >= 2007': 'SMBGE2007',
'Small Offices': 'SmallOffices',
'Small Retail': 'SmallRetail',
'Small Schools': 'SmallSchools',
'Small Single Storey Home built < 2007': 'SSSHBLT2007',
'Small Single Storey Home built >= 2007': 'SSSHBGE2007',
'UNDEFINED': 'UNDEFINED',
'Very Small Apartments Mixed': 'VerySmallAptMixed'}
# add fields if necessary
addfields = True
if addfields:
AddFields(fc, dct_flds)
# create list of fields
flds = [fld_name]
for fld in dct_flds.values():
flds.append(fld)
# loop through data
with arcpy.da.UpdateCursor(fc, flds) as curs:
for row in curs:
archetype = row[0]
# calculate statistics
dct_cnts = GetStats(archetype, dct_flds)
# set row stats values
for fld_name, cnt in dct_cnts.items():
i = flds.index(fld_name)
row[i] = cnt
# write updated row
curs.updateRow(row)
def AddFields(fc, dct_flds):
# Add the fields mentioned in the dictionary
for description, fld_name in dct_flds.items():
if not FieldExists(fc, fld_name):
arcpy.AddField_management(fc, fld_name, "LONG")
def FieldExists(fc, fld_name):
# Check if field exists
if len(arcpy.ListFields(fc, fld_name)) == 0:
return False
else:
return True
def GetStats(archetype, dct_flds):
# get the stats from the archetype
dct_cnts = {}
archetypes = archetype.split(";")
for description in archetypes:
if description in dct_flds:
fld_name = dct_flds[description]
if fld_name in dct_cnts:
dct_cnts[fld_name] += 1
else:
dct_cnts[fld_name] = 1
else:
# oops, value not defined in dct_flds, report this
print('{} is not included in dct_flds'.format(description))
pass
return dct_cnts
if __name__ == '__main__':
main()
This script points on line 6 to the data you shared. I noticed that the featureclass did not have any output fields defined, so I added code to add the new fields if they don't exist. Please note that there are a large number of UNDEFINED values in the fields.
Below some statistics of what was found in the data:
Description | Count | Output Field name |
Large Multi-Storey Home built < 2007 | 272 | LMSHBLT2007 |
Large Multi-Storey Home built >= 2007 | 160 | LMSHBGE2007 |
Large Multiplex built < 2007 | 1708 | LMBLT2007 |
Large Multiplex built >= 2007 | 1548 | LMBGE2007 |
Large Retail | 60 | LargeRetail |
Large Schools | 12 | LargeSchools |
Large Single Storey Home built < 2007 | 2852 | LSSHBLT2007 |
Large Single Storey Home built >= 2007 | 44 | LSSHBGE2007 |
Medium Apartments Mixed | 40 | MediumAptMixed |
Medium Offices | 8 | MediumOffices |
Small Apartments Mixed | 72 | SmallAptMixed |
Small Multi-Storey Home built < 2007 | 1980 | SMSHMLT2007 |
Small Multi-Storey Home built >= 2007 | 424 | SMSHBGE2007 |
Small Multiplex built < 2007 | 2484 | SMBLT2007 |
Small Multiplex built >= 2007 | 392 | SMBGE2007 |
Small Offices | 20 | SmallOffices |
Small Retail | 132 | SmallRetail |
Small Schools | 28 | SmallSchools |
Small Single Storey Home built < 2007 | 6668 | SSSHBLT2007 |
Small Single Storey Home built >= 2007 | 4 | SSSHBGE2007 |
UNDEFINED | 552 | UNDEFINED |
Very Small Apartments Mixed | 224 | VerySmallAptMixed |
Let me know if you have any questions. I will try to send the FGDB by private message (if it fits)...
Hi @SMehta ,
This for sharing the idea of using Collections::Counter, which is indeed a great way of splitting a text and get a dictionary with the counts. Kudos for that. However, in your code, I don't see you are handling the creation of the same field twice which I assume should fail and using the description for the field name would also fail since the descriptions are no valid field names.
Find below the part of the GetStats function which will now set values to 0 and avoids getting the Null values in the result:
def GetStats(archetype, dct_flds):
# get the stats from the archetype
dct_cnts = {}
for description, fld_name in dct_flds.items():
dct_cnts[fld_name] = 0
archetypes = archetype.split(";")
for description in archetypes:
if description in dct_flds:
fld_name = dct_flds[description]
if fld_name in dct_cnts:
dct_cnts[fld_name] += 1
else:
# this part is no longer required
dct_cnts[fld_name] = 1
else:
# oops, value not defined in dct_flds, report this
print('{} is not included in dct_flds'.format(description))
pass
return dct_cnts
Hi @HussainMalik1 ,
Would it be possible to share a sample of your data? This would help to write a snippet of code to show how you can do this.
@XanderBakker Thanks for getting back to me, really appreciate it, and yes i can share a sample dataset that you can download from my one drive, if you could share your email i could email you the link for my one drive.
Thanks
Hussain
Hi @HussainMalik1 ,
You can use a direct message here inside GeoNet or mail me using "xbakker [at] esri.co" (and yes it is "co" for Colombia).
Hi @HussainMalik1 ,
See below the script that allows you to get the statistics:
import arcpy
def main():
# read the input featureclass
# fc = arcpy.GetParameterAsText(0)
fc = r'D:\GeoNet\ExtractText\XanderBakker.gdb\Hussain_Data_ConcatenatedFieldValueCount'
# input data field
fld_name = 'ARCHETYPE'
# define how the items relate to the fields
dct_flds = {'Large Multi-Storey Home built < 2007': 'LMSHBLT2007',
'Large Multi-Storey Home built >= 2007': 'LMSHBGE2007',
'Large Multiplex built < 2007': 'LMBLT2007',
'Large Multiplex built >= 2007': 'LMBGE2007',
'Large Retail': 'LargeRetail',
'Large Schools': 'LargeSchools',
'Large Single Storey Home built < 2007': 'LSSHBLT2007',
'Large Single Storey Home built >= 2007': 'LSSHBGE2007',
'Medium Apartments Mixed': 'MediumAptMixed',
'Medium Offices': 'MediumOffices',
'Small Apartments Mixed': 'SmallAptMixed',
'Small Multi-Storey Home built < 2007': 'SMSHMLT2007',
'Small Multi-Storey Home built >= 2007': 'SMSHBGE2007',
'Small Multiplex built < 2007': 'SMBLT2007',
'Small Multiplex built >= 2007': 'SMBGE2007',
'Small Offices': 'SmallOffices',
'Small Retail': 'SmallRetail',
'Small Schools': 'SmallSchools',
'Small Single Storey Home built < 2007': 'SSSHBLT2007',
'Small Single Storey Home built >= 2007': 'SSSHBGE2007',
'UNDEFINED': 'UNDEFINED',
'Very Small Apartments Mixed': 'VerySmallAptMixed'}
# add fields if necessary
addfields = True
if addfields:
AddFields(fc, dct_flds)
# create list of fields
flds = [fld_name]
for fld in dct_flds.values():
flds.append(fld)
# loop through data
with arcpy.da.UpdateCursor(fc, flds) as curs:
for row in curs:
archetype = row[0]
# calculate statistics
dct_cnts = GetStats(archetype, dct_flds)
# set row stats values
for fld_name, cnt in dct_cnts.items():
i = flds.index(fld_name)
row[i] = cnt
# write updated row
curs.updateRow(row)
def AddFields(fc, dct_flds):
# Add the fields mentioned in the dictionary
for description, fld_name in dct_flds.items():
if not FieldExists(fc, fld_name):
arcpy.AddField_management(fc, fld_name, "LONG")
def FieldExists(fc, fld_name):
# Check if field exists
if len(arcpy.ListFields(fc, fld_name)) == 0:
return False
else:
return True
def GetStats(archetype, dct_flds):
# get the stats from the archetype
dct_cnts = {}
archetypes = archetype.split(";")
for description in archetypes:
if description in dct_flds:
fld_name = dct_flds[description]
if fld_name in dct_cnts:
dct_cnts[fld_name] += 1
else:
dct_cnts[fld_name] = 1
else:
# oops, value not defined in dct_flds, report this
print('{} is not included in dct_flds'.format(description))
pass
return dct_cnts
if __name__ == '__main__':
main()
This script points on line 6 to the data you shared. I noticed that the featureclass did not have any output fields defined, so I added code to add the new fields if they don't exist. Please note that there are a large number of UNDEFINED values in the fields.
Below some statistics of what was found in the data:
Description | Count | Output Field name |
Large Multi-Storey Home built < 2007 | 272 | LMSHBLT2007 |
Large Multi-Storey Home built >= 2007 | 160 | LMSHBGE2007 |
Large Multiplex built < 2007 | 1708 | LMBLT2007 |
Large Multiplex built >= 2007 | 1548 | LMBGE2007 |
Large Retail | 60 | LargeRetail |
Large Schools | 12 | LargeSchools |
Large Single Storey Home built < 2007 | 2852 | LSSHBLT2007 |
Large Single Storey Home built >= 2007 | 44 | LSSHBGE2007 |
Medium Apartments Mixed | 40 | MediumAptMixed |
Medium Offices | 8 | MediumOffices |
Small Apartments Mixed | 72 | SmallAptMixed |
Small Multi-Storey Home built < 2007 | 1980 | SMSHMLT2007 |
Small Multi-Storey Home built >= 2007 | 424 | SMSHBGE2007 |
Small Multiplex built < 2007 | 2484 | SMBLT2007 |
Small Multiplex built >= 2007 | 392 | SMBGE2007 |
Small Offices | 20 | SmallOffices |
Small Retail | 132 | SmallRetail |
Small Schools | 28 | SmallSchools |
Small Single Storey Home built < 2007 | 6668 | SSSHBLT2007 |
Small Single Storey Home built >= 2007 | 4 | SSSHBGE2007 |
UNDEFINED | 552 | UNDEFINED |
Very Small Apartments Mixed | 224 | VerySmallAptMixed |
Let me know if you have any questions. I will try to send the FGDB by private message (if it fits)...
Hussain - thx for the tag. Agree with Xander that a sample of the data would be useful to figure this one out. I'll defer to Xander's snippet of code to solve this one. All the best!
See if this helps
from collections import Counter
def x(fc):
cur = arcpy.da.SearchCursor(fc, field_names="type")
for row in cur:
bb=Counter(row[0].split(";"))
for i in bb.keys():
arcpy.management.AddField(r'C:<path>\Hussain\Hussain.gdb\H', i, 'LONG')
with arcpy.da.UpdateCursor(fc, field_names=list(bb.keys())) as cursor:
for v in cursor:
cursor.updateRow(list(bb.values()))
Hi @SMehta ,
This for sharing the idea of using Collections::Counter, which is indeed a great way of splitting a text and get a dictionary with the counts. Kudos for that. However, in your code, I don't see you are handling the creation of the same field twice which I assume should fail and using the description for the field name would also fail since the descriptions are no valid field names.
Find below the part of the GetStats function which will now set values to 0 and avoids getting the Null values in the result:
def GetStats(archetype, dct_flds):
# get the stats from the archetype
dct_cnts = {}
for description, fld_name in dct_flds.items():
dct_cnts[fld_name] = 0
archetypes = archetype.split(";")
for description in archetypes:
if description in dct_flds:
fld_name = dct_flds[description]
if fld_name in dct_cnts:
dct_cnts[fld_name] += 1
else:
# this part is no longer required
dct_cnts[fld_name] = 1
else:
# oops, value not defined in dct_flds, report this
print('{} is not included in dct_flds'.format(description))
pass
return dct_cnts