Count Values in Concatenated field for each record !!!! HELP

619
8
Jump to solution
01-28-2021 02:41 PM
Labels (3)
HussainMalik1
New Contributor III

Hello All, 

@ShitijMehta @DanPatterson@XanderBakker , @Robert_LeClair  @Philip-Wilson 

I have a polygon feature class of buildings in the city, one of the fields in it is ARCHTYPE and it’s a multivalue concatenated field as shown in the image. (NOTE: There are total of 25 Archetype in my data)

So, I want to count the occurrence of each archetype (values are separated by semi colon) in a row and create 25 new fields with the archetype names and update those new fields with the count number.

So, for example Structure_ID 1400232 (2ND HIGHLIGHTED ROW) I would have 2 new fields called Small Multiplex<2007 and Large Multiplex built < 2007 It should give me value of 4 for and 1 respectively. 

The table below is something that i am trying to do counting the each value in Archetype Field and updating the preexisting fields with there respective numbers

 ARCHETYPESmall Multiplex built < 2007Large Multiplex built < 2007Archtyp3Archtyp4Archtyp5Archtyp6Archtyp7Archtyp8Archtyp9Archtyp10
Record 1Small Multiplex built < 2007;Small Multiplex built < 2007;Small Multiplex built < 2007;Small Multiplex built < 2007;Small Multiplex built < 2007;Small Multiplex built < 2007;Small Multiplex built < 2007;Small Multiplex built < 2007;Large Multiplex built < 2007;Large Multiplex built < 2007;Large Multiplex built < 2007;Small Multiplex built < 2007;Small Multiplex built < 2007;Large Multiplex built < 200710400000000

 

It would be a great help if someone help me find a way to do this within modelbuilder since this is one of the intermediately step in my main model so either a geoprocessing tool or a script would go a long way. 

Thanks in advance and looking forward for a response. 

Regards 

Hussain

 

0 Kudos
2 Solutions

Accepted Solutions
XanderBakker
Esri Esteemed Contributor

Hi @HussainMalik1 ,

 

See below the script that allows you to get the statistics:

import arcpy

def main():
    # read the input featureclass
    # fc = arcpy.GetParameterAsText(0)
    fc = r'D:\GeoNet\ExtractText\XanderBakker.gdb\Hussain_Data_ConcatenatedFieldValueCount'

    # input data field
    fld_name = 'ARCHETYPE'

    # define how the items relate to the fields
    dct_flds = {'Large Multi-Storey Home built < 2007': 'LMSHBLT2007',
                'Large Multi-Storey Home built >= 2007': 'LMSHBGE2007',
                'Large Multiplex built < 2007': 'LMBLT2007',
                'Large Multiplex built >= 2007': 'LMBGE2007',
                'Large Retail': 'LargeRetail',
                'Large Schools': 'LargeSchools',
                'Large Single Storey Home built < 2007': 'LSSHBLT2007',
                'Large Single Storey Home built >= 2007': 'LSSHBGE2007',
                'Medium Apartments Mixed': 'MediumAptMixed',
                'Medium Offices': 'MediumOffices',
                'Small Apartments Mixed': 'SmallAptMixed',
                'Small Multi-Storey Home built < 2007': 'SMSHMLT2007',
                'Small Multi-Storey Home built >= 2007': 'SMSHBGE2007',
                'Small Multiplex built < 2007': 'SMBLT2007',
                'Small Multiplex built >= 2007': 'SMBGE2007',
                'Small Offices': 'SmallOffices',
                'Small Retail': 'SmallRetail',
                'Small Schools': 'SmallSchools',
                'Small Single Storey Home built < 2007': 'SSSHBLT2007',
                'Small Single Storey Home built >= 2007': 'SSSHBGE2007',
                'UNDEFINED': 'UNDEFINED',
                'Very Small Apartments Mixed': 'VerySmallAptMixed'}

    # add fields if necessary
    addfields = True
    if addfields:
        AddFields(fc, dct_flds)

    # create list of fields
    flds = [fld_name]
    for fld in dct_flds.values():
        flds.append(fld)

    # loop through data
    with arcpy.da.UpdateCursor(fc, flds) as curs:
        for row in curs:
            archetype = row[0]
            # calculate statistics
            dct_cnts = GetStats(archetype, dct_flds)
            # set row stats values
            for fld_name, cnt in dct_cnts.items():
                i = flds.index(fld_name)
                row[i] = cnt
            # write updated row
            curs.updateRow(row)


def AddFields(fc, dct_flds):
    # Add the fields mentioned in the dictionary
    for description, fld_name in dct_flds.items():
        if not FieldExists(fc, fld_name):
            arcpy.AddField_management(fc, fld_name, "LONG")


def FieldExists(fc, fld_name):
    # Check if field exists
    if len(arcpy.ListFields(fc, fld_name)) == 0:
        return False
    else:
        return True


def GetStats(archetype, dct_flds):
    # get the stats from the archetype
    dct_cnts = {}
    archetypes = archetype.split(";")
    for description in archetypes:
        if description in dct_flds:
            fld_name = dct_flds[description]
            if fld_name in dct_cnts:
                dct_cnts[fld_name] += 1
            else:
                dct_cnts[fld_name] = 1
        else:
            # oops, value not defined in dct_flds, report this
            print('{} is not included in dct_flds'.format(description))
            pass
    return dct_cnts


if __name__ == '__main__':
    main()

 

This script points on line 6 to the data you shared.  I noticed that the featureclass did not have any output fields defined, so I added code to add the new fields if they don't exist. Please note that there are a large number of UNDEFINED values in the fields.

Below some statistics of what was found in the data:

DescriptionCountOutput Field name
Large Multi-Storey Home built < 2007272LMSHBLT2007
Large Multi-Storey Home built >= 2007160LMSHBGE2007
Large Multiplex built < 20071708LMBLT2007
Large Multiplex built >= 20071548LMBGE2007
Large Retail60LargeRetail
Large Schools12LargeSchools
Large Single Storey Home built < 20072852LSSHBLT2007
Large Single Storey Home built >= 200744LSSHBGE2007
Medium Apartments Mixed40MediumAptMixed
Medium Offices8MediumOffices
Small Apartments Mixed72SmallAptMixed
Small Multi-Storey Home built < 20071980SMSHMLT2007
Small Multi-Storey Home built >= 2007424SMSHBGE2007
Small Multiplex built < 20072484SMBLT2007
Small Multiplex built >= 2007392SMBGE2007
Small Offices20SmallOffices
Small Retail132SmallRetail
Small Schools28SmallSchools
Small Single Storey Home built < 20076668SSSHBLT2007
Small Single Storey Home built >= 20074SSSHBGE2007
UNDEFINED552UNDEFINED
Very Small Apartments Mixed224VerySmallAptMixed

 

Let me know if you have any questions. I will try to send the FGDB by private message (if it fits)...

View solution in original post

XanderBakker
Esri Esteemed Contributor

Hi @Horizon ,

 

This for sharing the idea of using Collections::Counter, which is indeed a great way of splitting a text and get a dictionary with the counts. Kudos for that. However, in your code, I don't see you are handling the creation of the same field twice which I assume should fail and using the description for the field name would also fail since the descriptions are no valid field names. 

 

@HussainMalik1 ,

 

Find below the part of the GetStats function which will now set values to 0 and avoids getting the Null values in the result: 

def GetStats(archetype, dct_flds):
    # get the stats from the archetype
    dct_cnts = {}
    for description, fld_name in dct_flds.items():
        dct_cnts[fld_name] = 0
    archetypes = archetype.split(";")
    for description in archetypes:
        if description in dct_flds:
            fld_name = dct_flds[description]
            if fld_name in dct_cnts:
                dct_cnts[fld_name] += 1
            else:
                # this part is no longer required
                dct_cnts[fld_name] = 1
        else:
            # oops, value not defined in dct_flds, report this
            print('{} is not included in dct_flds'.format(description))
            pass
    return dct_cnts

 

View solution in original post

8 Replies
XanderBakker
Esri Esteemed Contributor

Hi @HussainMalik1 ,

Would it be possible to share a sample of your data? This would help to write a snippet of code to show how you can do this.

HussainMalik1
New Contributor III

@XanderBakker  Thanks for getting back to me, really appreciate it, and yes i can share a sample dataset that you can download from my one drive, if you could share your email i could email you the link for my one drive. 

Thanks 

Hussain 

0 Kudos
XanderBakker
Esri Esteemed Contributor

Hi @HussainMalik1 ,

 

You can use a direct message here inside GeoNet or mail me using "xbakker [at] esri.co" (and yes it is "co" for Colombia).

0 Kudos
XanderBakker
Esri Esteemed Contributor

Hi @HussainMalik1 ,

 

See below the script that allows you to get the statistics:

import arcpy

def main():
    # read the input featureclass
    # fc = arcpy.GetParameterAsText(0)
    fc = r'D:\GeoNet\ExtractText\XanderBakker.gdb\Hussain_Data_ConcatenatedFieldValueCount'

    # input data field
    fld_name = 'ARCHETYPE'

    # define how the items relate to the fields
    dct_flds = {'Large Multi-Storey Home built < 2007': 'LMSHBLT2007',
                'Large Multi-Storey Home built >= 2007': 'LMSHBGE2007',
                'Large Multiplex built < 2007': 'LMBLT2007',
                'Large Multiplex built >= 2007': 'LMBGE2007',
                'Large Retail': 'LargeRetail',
                'Large Schools': 'LargeSchools',
                'Large Single Storey Home built < 2007': 'LSSHBLT2007',
                'Large Single Storey Home built >= 2007': 'LSSHBGE2007',
                'Medium Apartments Mixed': 'MediumAptMixed',
                'Medium Offices': 'MediumOffices',
                'Small Apartments Mixed': 'SmallAptMixed',
                'Small Multi-Storey Home built < 2007': 'SMSHMLT2007',
                'Small Multi-Storey Home built >= 2007': 'SMSHBGE2007',
                'Small Multiplex built < 2007': 'SMBLT2007',
                'Small Multiplex built >= 2007': 'SMBGE2007',
                'Small Offices': 'SmallOffices',
                'Small Retail': 'SmallRetail',
                'Small Schools': 'SmallSchools',
                'Small Single Storey Home built < 2007': 'SSSHBLT2007',
                'Small Single Storey Home built >= 2007': 'SSSHBGE2007',
                'UNDEFINED': 'UNDEFINED',
                'Very Small Apartments Mixed': 'VerySmallAptMixed'}

    # add fields if necessary
    addfields = True
    if addfields:
        AddFields(fc, dct_flds)

    # create list of fields
    flds = [fld_name]
    for fld in dct_flds.values():
        flds.append(fld)

    # loop through data
    with arcpy.da.UpdateCursor(fc, flds) as curs:
        for row in curs:
            archetype = row[0]
            # calculate statistics
            dct_cnts = GetStats(archetype, dct_flds)
            # set row stats values
            for fld_name, cnt in dct_cnts.items():
                i = flds.index(fld_name)
                row[i] = cnt
            # write updated row
            curs.updateRow(row)


def AddFields(fc, dct_flds):
    # Add the fields mentioned in the dictionary
    for description, fld_name in dct_flds.items():
        if not FieldExists(fc, fld_name):
            arcpy.AddField_management(fc, fld_name, "LONG")


def FieldExists(fc, fld_name):
    # Check if field exists
    if len(arcpy.ListFields(fc, fld_name)) == 0:
        return False
    else:
        return True


def GetStats(archetype, dct_flds):
    # get the stats from the archetype
    dct_cnts = {}
    archetypes = archetype.split(";")
    for description in archetypes:
        if description in dct_flds:
            fld_name = dct_flds[description]
            if fld_name in dct_cnts:
                dct_cnts[fld_name] += 1
            else:
                dct_cnts[fld_name] = 1
        else:
            # oops, value not defined in dct_flds, report this
            print('{} is not included in dct_flds'.format(description))
            pass
    return dct_cnts


if __name__ == '__main__':
    main()

 

This script points on line 6 to the data you shared.  I noticed that the featureclass did not have any output fields defined, so I added code to add the new fields if they don't exist. Please note that there are a large number of UNDEFINED values in the fields.

Below some statistics of what was found in the data:

DescriptionCountOutput Field name
Large Multi-Storey Home built < 2007272LMSHBLT2007
Large Multi-Storey Home built >= 2007160LMSHBGE2007
Large Multiplex built < 20071708LMBLT2007
Large Multiplex built >= 20071548LMBGE2007
Large Retail60LargeRetail
Large Schools12LargeSchools
Large Single Storey Home built < 20072852LSSHBLT2007
Large Single Storey Home built >= 200744LSSHBGE2007
Medium Apartments Mixed40MediumAptMixed
Medium Offices8MediumOffices
Small Apartments Mixed72SmallAptMixed
Small Multi-Storey Home built < 20071980SMSHMLT2007
Small Multi-Storey Home built >= 2007424SMSHBGE2007
Small Multiplex built < 20072484SMBLT2007
Small Multiplex built >= 2007392SMBGE2007
Small Offices20SmallOffices
Small Retail132SmallRetail
Small Schools28SmallSchools
Small Single Storey Home built < 20076668SSSHBLT2007
Small Single Storey Home built >= 20074SSSHBGE2007
UNDEFINED552UNDEFINED
Very Small Apartments Mixed224VerySmallAptMixed

 

Let me know if you have any questions. I will try to send the FGDB by private message (if it fits)...

View solution in original post

Robert_LeClair
Esri Frequent Contributor

Hussain - thx for the tag.  Agree with Xander that a sample of the data would be useful to figure this one out.  I'll defer to Xander's snippet of code to solve this one.  All the best!

0 Kudos
Horizon
New Contributor

See if this helps

Horizon_0-1611972299912.png

 

from collections import Counter
def x(fc):
    cur = arcpy.da.SearchCursor(fc, field_names="type")
    for row in cur:
        bb=Counter(row[0].split(";"))
    for i in bb.keys():
        arcpy.management.AddField(r'C:<path>\Hussain\Hussain.gdb\H', i, 'LONG')
    with arcpy.da.UpdateCursor(fc, field_names=list(bb.keys())) as cursor:
	    for v in cursor:
		    cursor.updateRow(list(bb.values()))
		    

 

 

XanderBakker
Esri Esteemed Contributor

Hi @Horizon ,

 

This for sharing the idea of using Collections::Counter, which is indeed a great way of splitting a text and get a dictionary with the counts. Kudos for that. However, in your code, I don't see you are handling the creation of the same field twice which I assume should fail and using the description for the field name would also fail since the descriptions are no valid field names. 

 

@HussainMalik1 ,

 

Find below the part of the GetStats function which will now set values to 0 and avoids getting the Null values in the result: 

def GetStats(archetype, dct_flds):
    # get the stats from the archetype
    dct_cnts = {}
    for description, fld_name in dct_flds.items():
        dct_cnts[fld_name] = 0
    archetypes = archetype.split(";")
    for description in archetypes:
        if description in dct_flds:
            fld_name = dct_flds[description]
            if fld_name in dct_cnts:
                dct_cnts[fld_name] += 1
            else:
                # this part is no longer required
                dct_cnts[fld_name] = 1
        else:
            # oops, value not defined in dct_flds, report this
            print('{} is not included in dct_flds'.format(description))
            pass
    return dct_cnts

 

View solution in original post

HussainMalik1
New Contributor III

Hi @XanderBakker 

Thanks for your code it works perfectly now. 

Thanks

Hussain