Count Values in Concatenated field for each record !!!! HELP

HussainMalik1 · ‎01-28-2021

Hello All,

@ShitijMehta @DanPatterson, @XanderBakker , @Robert_LeClair @Anonymous User

I have a polygon feature class of buildings in the city, one of the fields in it is ARCHTYPE and it’s a multivalue concatenated field as shown in the image. (NOTE: There are total of 25 Archetype in my data)

So, I want to count the occurrence of each archetype (values are separated by semi colon) in a row and create 25 new fields with the archetype names and update those new fields with the count number.

So, for example Structure_ID 1400232 (2ND HIGHLIGHTED ROW) I would have 2 new fields called Small Multiplex<2007 and Large Multiplex built < 2007 It should give me value of 4 for and 1 respectively.

The table below is something that i am trying to do counting the each value in Archetype Field and updating the preexisting fields with there respective numbers

	ARCHETYPE	Small Multiplex built < 2007	Large Multiplex built < 2007	Archtyp3	Archtyp4	Archtyp5	Archtyp6	Archtyp7	Archtyp8	Archtyp9	Archtyp10
Record 1	Small Multiplex built < 2007;Small Multiplex built < 2007;Small Multiplex built < 2007;Small Multiplex built < 2007;Small Multiplex built < 2007;Small Multiplex built < 2007;Small Multiplex built < 2007;Small Multiplex built < 2007;Large Multiplex built < 2007;Large Multiplex built < 2007;Large Multiplex built < 2007;Small Multiplex built < 2007;Small Multiplex built < 2007;Large Multiplex built < 2007	10	4	0	0	0	0	0	0	0	0

It would be a great help if someone help me find a way to do this within modelbuilder since this is one of the intermediately step in my main model so either a geoprocessing tool or a script would go a long way.

Thanks in advance and looking forward for a response.

Regards

Hussain

XanderBakker · ‎01-29-2021

Hi @HussainMalik1 ,

See below the script that allows you to get the statistics:

import arcpy

def main():
    # read the input featureclass
    # fc = arcpy.GetParameterAsText(0)
    fc = r'D:\GeoNet\ExtractText\XanderBakker.gdb\Hussain_Data_ConcatenatedFieldValueCount'

    # input data field
    fld_name = 'ARCHETYPE'

    # define how the items relate to the fields
    dct_flds = {'Large Multi-Storey Home built < 2007': 'LMSHBLT2007',
                'Large Multi-Storey Home built >= 2007': 'LMSHBGE2007',
                'Large Multiplex built < 2007': 'LMBLT2007',
                'Large Multiplex built >= 2007': 'LMBGE2007',
                'Large Retail': 'LargeRetail',
                'Large Schools': 'LargeSchools',
                'Large Single Storey Home built < 2007': 'LSSHBLT2007',
                'Large Single Storey Home built >= 2007': 'LSSHBGE2007',
                'Medium Apartments Mixed': 'MediumAptMixed',
                'Medium Offices': 'MediumOffices',
                'Small Apartments Mixed': 'SmallAptMixed',
                'Small Multi-Storey Home built < 2007': 'SMSHMLT2007',
                'Small Multi-Storey Home built >= 2007': 'SMSHBGE2007',
                'Small Multiplex built < 2007': 'SMBLT2007',
                'Small Multiplex built >= 2007': 'SMBGE2007',
                'Small Offices': 'SmallOffices',
                'Small Retail': 'SmallRetail',
                'Small Schools': 'SmallSchools',
                'Small Single Storey Home built < 2007': 'SSSHBLT2007',
                'Small Single Storey Home built >= 2007': 'SSSHBGE2007',
                'UNDEFINED': 'UNDEFINED',
                'Very Small Apartments Mixed': 'VerySmallAptMixed'}

    # add fields if necessary
    addfields = True
    if addfields:
        AddFields(fc, dct_flds)

    # create list of fields
    flds = [fld_name]
    for fld in dct_flds.values():
        flds.append(fld)

    # loop through data
    with arcpy.da.UpdateCursor(fc, flds) as curs:
        for row in curs:
            archetype = row[0]
            # calculate statistics
            dct_cnts = GetStats(archetype, dct_flds)
            # set row stats values
            for fld_name, cnt in dct_cnts.items():
                i = flds.index(fld_name)
                row[i] = cnt
            # write updated row
            curs.updateRow(row)


def AddFields(fc, dct_flds):
    # Add the fields mentioned in the dictionary
    for description, fld_name in dct_flds.items():
        if not FieldExists(fc, fld_name):
            arcpy.AddField_management(fc, fld_name, "LONG")


def FieldExists(fc, fld_name):
    # Check if field exists
    if len(arcpy.ListFields(fc, fld_name)) == 0:
        return False
    else:
        return True


def GetStats(archetype, dct_flds):
    # get the stats from the archetype
    dct_cnts = {}
    archetypes = archetype.split(";")
    for description in archetypes:
        if description in dct_flds:
            fld_name = dct_flds[description]
            if fld_name in dct_cnts:
                dct_cnts[fld_name] += 1
            else:
                dct_cnts[fld_name] = 1
        else:
            # oops, value not defined in dct_flds, report this
            print('{} is not included in dct_flds'.format(description))
            pass
    return dct_cnts


if __name__ == '__main__':
    main()

This script points on line 6 to the data you shared. I noticed that the featureclass did not have any output fields defined, so I added code to add the new fields if they don't exist. Please note that there are a large number of UNDEFINED values in the fields.

Below some statistics of what was found in the data:

Description	Count	Output Field name
Large Multi-Storey Home built < 2007	272	LMSHBLT2007
Large Multi-Storey Home built >= 2007	160	LMSHBGE2007
Large Multiplex built < 2007	1708	LMBLT2007
Large Multiplex built >= 2007	1548	LMBGE2007
Large Retail	60	LargeRetail
Large Schools	12	LargeSchools
Large Single Storey Home built < 2007	2852	LSSHBLT2007
Large Single Storey Home built >= 2007	44	LSSHBGE2007
Medium Apartments Mixed	40	MediumAptMixed
Medium Offices	8	MediumOffices
Small Apartments Mixed	72	SmallAptMixed
Small Multi-Storey Home built < 2007	1980	SMSHMLT2007
Small Multi-Storey Home built >= 2007	424	SMSHBGE2007
Small Multiplex built < 2007	2484	SMBLT2007
Small Multiplex built >= 2007	392	SMBGE2007
Small Offices	20	SmallOffices
Small Retail	132	SmallRetail
Small Schools	28	SmallSchools
Small Single Storey Home built < 2007	6668	SSSHBLT2007
Small Single Storey Home built >= 2007	4	SSSHBGE2007
UNDEFINED	552	UNDEFINED
Very Small Apartments Mixed	224	VerySmallAptMixed

Let me know if you have any questions. I will try to send the FGDB by private message (if it fits)...

View solution in original post

XanderBakker · ‎02-01-2021

Hi @SMehta ,

This for sharing the idea of using Collections::Counter, which is indeed a great way of splitting a text and get a dictionary with the counts. Kudos for that. However, in your code, I don't see you are handling the creation of the same field twice which I assume should fail and using the description for the field name would also fail since the descriptions are no valid field names.

@HussainMalik1 ,

Find below the part of the GetStats function which will now set values to 0 and avoids getting the Null values in the result:

def GetStats(archetype, dct_flds):
    # get the stats from the archetype
    dct_cnts = {}
    for description, fld_name in dct_flds.items():
        dct_cnts[fld_name] = 0
    archetypes = archetype.split(";")
    for description in archetypes:
        if description in dct_flds:
            fld_name = dct_flds[description]
            if fld_name in dct_cnts:
                dct_cnts[fld_name] += 1
            else:
                # this part is no longer required
                dct_cnts[fld_name] = 1
        else:
            # oops, value not defined in dct_flds, report this
            print('{} is not included in dct_flds'.format(description))
            pass
    return dct_cnts

View solution in original post

XanderBakker · ‎01-29-2021

Hi @HussainMalik1 ,

Would it be possible to share a sample of your data? This would help to write a snippet of code to show how you can do this.

HussainMalik1 · ‎01-29-2021

@XanderBakker Thanks for getting back to me, really appreciate it, and yes i can share a sample dataset that you can download from my one drive, if you could share your email i could email you the link for my one drive.

Thanks

Hussain

XanderBakker · ‎01-29-2021

Hi @HussainMalik1 ,

You can use a direct message here inside GeoNet or mail me using "xbakker [at] esri.co" (and yes it is "co" for Colombia).

XanderBakker · ‎01-29-2021

Hi @HussainMalik1 ,

See below the script that allows you to get the statistics:

import arcpy

def main():
    # read the input featureclass
    # fc = arcpy.GetParameterAsText(0)
    fc = r'D:\GeoNet\ExtractText\XanderBakker.gdb\Hussain_Data_ConcatenatedFieldValueCount'

    # input data field
    fld_name = 'ARCHETYPE'

    # define how the items relate to the fields
    dct_flds = {'Large Multi-Storey Home built < 2007': 'LMSHBLT2007',
                'Large Multi-Storey Home built >= 2007': 'LMSHBGE2007',
                'Large Multiplex built < 2007': 'LMBLT2007',
                'Large Multiplex built >= 2007': 'LMBGE2007',
                'Large Retail': 'LargeRetail',
                'Large Schools': 'LargeSchools',
                'Large Single Storey Home built < 2007': 'LSSHBLT2007',
                'Large Single Storey Home built >= 2007': 'LSSHBGE2007',
                'Medium Apartments Mixed': 'MediumAptMixed',
                'Medium Offices': 'MediumOffices',
                'Small Apartments Mixed': 'SmallAptMixed',
                'Small Multi-Storey Home built < 2007': 'SMSHMLT2007',
                'Small Multi-Storey Home built >= 2007': 'SMSHBGE2007',
                'Small Multiplex built < 2007': 'SMBLT2007',
                'Small Multiplex built >= 2007': 'SMBGE2007',
                'Small Offices': 'SmallOffices',
                'Small Retail': 'SmallRetail',
                'Small Schools': 'SmallSchools',
                'Small Single Storey Home built < 2007': 'SSSHBLT2007',
                'Small Single Storey Home built >= 2007': 'SSSHBGE2007',
                'UNDEFINED': 'UNDEFINED',
                'Very Small Apartments Mixed': 'VerySmallAptMixed'}

    # add fields if necessary
    addfields = True
    if addfields:
        AddFields(fc, dct_flds)

    # create list of fields
    flds = [fld_name]
    for fld in dct_flds.values():
        flds.append(fld)

    # loop through data
    with arcpy.da.UpdateCursor(fc, flds) as curs:
        for row in curs:
            archetype = row[0]
            # calculate statistics
            dct_cnts = GetStats(archetype, dct_flds)
            # set row stats values
            for fld_name, cnt in dct_cnts.items():
                i = flds.index(fld_name)
                row[i] = cnt
            # write updated row
            curs.updateRow(row)


def AddFields(fc, dct_flds):
    # Add the fields mentioned in the dictionary
    for description, fld_name in dct_flds.items():
        if not FieldExists(fc, fld_name):
            arcpy.AddField_management(fc, fld_name, "LONG")


def FieldExists(fc, fld_name):
    # Check if field exists
    if len(arcpy.ListFields(fc, fld_name)) == 0:
        return False
    else:
        return True


def GetStats(archetype, dct_flds):
    # get the stats from the archetype
    dct_cnts = {}
    archetypes = archetype.split(";")
    for description in archetypes:
        if description in dct_flds:
            fld_name = dct_flds[description]
            if fld_name in dct_cnts:
                dct_cnts[fld_name] += 1
            else:
                dct_cnts[fld_name] = 1
        else:
            # oops, value not defined in dct_flds, report this
            print('{} is not included in dct_flds'.format(description))
            pass
    return dct_cnts


if __name__ == '__main__':
    main()

This script points on line 6 to the data you shared. I noticed that the featureclass did not have any output fields defined, so I added code to add the new fields if they don't exist. Please note that there are a large number of UNDEFINED values in the fields.

Below some statistics of what was found in the data:

Description	Count	Output Field name
Large Multi-Storey Home built < 2007	272	LMSHBLT2007
Large Multi-Storey Home built >= 2007	160	LMSHBGE2007
Large Multiplex built < 2007	1708	LMBLT2007
Large Multiplex built >= 2007	1548	LMBGE2007
Large Retail	60	LargeRetail
Large Schools	12	LargeSchools
Large Single Storey Home built < 2007	2852	LSSHBLT2007
Large Single Storey Home built >= 2007	44	LSSHBGE2007
Medium Apartments Mixed	40	MediumAptMixed
Medium Offices	8	MediumOffices
Small Apartments Mixed	72	SmallAptMixed
Small Multi-Storey Home built < 2007	1980	SMSHMLT2007
Small Multi-Storey Home built >= 2007	424	SMSHBGE2007
Small Multiplex built < 2007	2484	SMBLT2007
Small Multiplex built >= 2007	392	SMBGE2007
Small Offices	20	SmallOffices
Small Retail	132	SmallRetail
Small Schools	28	SmallSchools
Small Single Storey Home built < 2007	6668	SSSHBLT2007
Small Single Storey Home built >= 2007	4	SSSHBGE2007
UNDEFINED	552	UNDEFINED
Very Small Apartments Mixed	224	VerySmallAptMixed

Let me know if you have any questions. I will try to send the FGDB by private message (if it fits)...

Robert_LeClair · ‎01-29-2021

Hussain - thx for the tag. Agree with Xander that a sample of the data would be useful to figure this one out. I'll defer to Xander's snippet of code to solve this one. All the best!

SMehta · ‎01-29-2021

See if this helps

from collections import Counter
def x(fc):
    cur = arcpy.da.SearchCursor(fc, field_names="type")
    for row in cur:
        bb=Counter(row[0].split(";"))
    for i in bb.keys():
        arcpy.management.AddField(r'C:<path>\Hussain\Hussain.gdb\H', i, 'LONG')
    with arcpy.da.UpdateCursor(fc, field_names=list(bb.keys())) as cursor:
	    for v in cursor:
		    cursor.updateRow(list(bb.values()))

XanderBakker · ‎02-01-2021

Hi @SMehta ,

This for sharing the idea of using Collections::Counter, which is indeed a great way of splitting a text and get a dictionary with the counts. Kudos for that. However, in your code, I don't see you are handling the creation of the same field twice which I assume should fail and using the description for the field name would also fail since the descriptions are no valid field names.

@HussainMalik1 ,

Find below the part of the GetStats function which will now set values to 0 and avoids getting the Null values in the result:

def GetStats(archetype, dct_flds):
    # get the stats from the archetype
    dct_cnts = {}
    for description, fld_name in dct_flds.items():
        dct_cnts[fld_name] = 0
    archetypes = archetype.split(";")
    for description in archetypes:
        if description in dct_flds:
            fld_name = dct_flds[description]
            if fld_name in dct_cnts:
                dct_cnts[fld_name] += 1
            else:
                # this part is no longer required
                dct_cnts[fld_name] = 1
        else:
            # oops, value not defined in dct_flds, report this
            print('{} is not included in dct_flds'.format(description))
            pass
    return dct_cnts

HussainMalik1 · ‎02-01-2021

Hi @XanderBakker

Thanks for your code it works perfectly now.

Thanks

Hussain