Group Unique Values from Multiple Attribute Fields

1908
15
Jump to solution
06-17-2020 09:58 AM
JaredPilbeam2
MVP Regular Contributor

I'm working with a table with tons of unique values crammed into a field. Is there a pythonic way to group unique values from multiple attribute fields? That way I'd have one symbol for its respective values instead of there being repetitive values. 

So, i'd like the table to be organized into something like this:

I'm able to do something similar with Arcade, but it's limited.  How To: Group unique values from multiple attribute fields in ArcGIS Pro 

Creating a list in Arcade can be verified in the Expression builder, but there are no results. My thread using Arcade: Arcade If Statement for Multiple Unique Values 

0 Kudos
1 Solution

Accepted Solutions
XanderBakker
Esri Esteemed Contributor

Hi Jared Pilbeam ,

Find below the Python standalone script to do the same. 

#-------------------------------------------------------------------------------
# Name:        categories.py
# Purpose:
#
# Author:      xbakker
#
# Created:     24/06/2020
#-------------------------------------------------------------------------------

def main():
    import arcpy

    # configuration
    fc = r'C:\GeoNet\Jared\Default.gdb\RecyclingLocations_full'  # point to your datasource
    fld_kw = "USER_Keywo"  # your input field
    fld_res = "Category2"  # your output string field

    with arcpy.da.UpdateCursor(fc, (fld_kw, fld_res)) as curs:
        for row in curs:
            keyword_txt = row[0]
            finalgroup = GetPredominantGroup(keyword_txt)
            row[1] = finalgroup
            curs.updateRow(row)


def GetPredominantGroup(keyword_txt):
    sep = ";"
    keywords = keyword_txt.split(sep)
    dct_groups = {}
    for keyword in keywords:
        group = GetGroup(keyword)
        if not group is None:
            if group in dct_groups:
                dct_groups[group] += 1
            else:
                dct_groups[group] = 1

    cntmax = 0
    finalgroup = None
    for group, cnt in dct_groups.items():
        if cnt > cntmax:
            finalgroup = group
            cntmax = cnt

    return finalgroup


def GetGroup(keyword):
    groups = {'Appliance Recycling': 'Air Conditioner;Appliance;Refrigerator;Dishwasher;Dryer;Oven;Stove;Washer;Water Heater',
    'Asbestos Info / Removal Service': 'Asbestos',
    'Automobile & Boat Reuse Recycling': 'Airplane; Automobile;Boat;Car;Recreation Vehicle;RV;Truck',
    'Awards & Recognition': 'Awards (Eco-Award)',
    'Batteries': 'Lead-Acid Batteries;Single Use AA, AAA, C, D, 9-Volt;Batteries - Rechargeable, BackUp;Lead-Acid Batteries;Rechargebale;Disposable Batteries;Rechargeable;Automotive Batteries;Boat Batteries;Lithium Batteries;Button Batteries',
    'Bicycle Reuse Donation': 'Bicycles',
    'Book Recycling': 'Books - Hardcover Recycling;Books - Softcover Recycling',
    'Cellular Telephone Reuse & Recycling': 'Cell Phone - Retail Drop-Off;Cell Phone - Electronic Drop Off',
    'Clean - Up Services': 'Clean-Up of Properties;',
    'Clothing, Linens & Shoe Reuse & Recycling-Drop-Off Locations': 'Clothing;Gym Shoes / Sneakers / HighTops;Linens / Blankets / Tableclothes /  Towels;Textile / Fabrics / Drapes / Curtains;Shoes / Boots;Purse or Handbag',
    'Commercial Hazardous Waste': 'Hazardous Waste Services;Chemical Business Waste',
    'Composting Facilities/Landscape Waste transfer Stations': 'Composting Facilities/Landscape Waste transfer Stations;Landscape Material Recycling',
    'Computer Reuse & Recycling': 'Business Electronic Devices',
    'Construction & Demolition Recycling': 'Asphalt;Brick;Building Construction & Demolition Debris;Carpet;Carpet Padding;Concrete;Dirt;Drywall;Gravel;Padding;Roofing Materials;Shingles;Soil;Stone;Vinyl Siding;Windows;Wood-Untreated;Pallets',
    'Cooking Oil/Grease Recycling & Trap Service': 'Cooking Oil;Grease Recycling;Grease Trap Services',
    'Crayon Recycling': 'Crayons',
    'Curbside Recycling': 'Aerosol Cans (empty) Curbside;Aluminum Can Curbside;Aluminum Foil Curbside;Cardboard & Chipboard Curbside;Curbside Recycling;Glass Bottles & Jars Curbside;Juice Boxes/Drink Pouches Curbside;Paper (magazines, phone bks, etc) Curbside;Plastic #1 (PET) & #2 (HDPE) Containers;Plastic #3, #4, #5, #7, Containers Curbside;Six-Pack Rings Curbside;Tin / Steel Cans (soup,cofffee, etc) Curbside',
    'Demolition Material Recycling & Disposal': 'Construction Material Reuse;Demolition Material Reuse / Recycling',
    'Disposable Batteries': 'Batteries - Single Use AA, AAA, C, D, 9-Volt',
    'Donation': 'Books - Hardcover;Computer Reuse/Donation;Couch;Sofa;Toys',
    'Drum & Cylinder Disposal or Recycling': 'Cylinders;Drums',
    'Dry Cleaners (Green)': 'Dry Cleaning',
    'Electronics': 'Adding Machines;Answering Machines;Calculators;Cameras;Cassettes;CB\'s/Two-way radios;CD Players / Laser Disc Players;CD ROM Drives;CDs, DVDs;Cell Phone - Electronic Drop-Off;Computer Recycling;Computer Recycling Drop-Off;Copy Machine;Cords & Cables;Digital Clocks;DVD Machine;Electronic Mice;Electronics;Fax Machines;Floppy Disks;Hand Held Games;Hard Drives;Joysticks/Game controls;Keyboards;Microwaves;Modems;Monitors;Pagers;Palm Organizers;Paper shredders;Portable Radio; Postage Machines;Printers;Scanner Machines;Speakers/Stereo Systems;String Lights /  Holiday String Lights;Sump Pump;Tape Drives;Telephone;Televisions;Thermometers (digital);Typewriters/Word Processors;UPS Battery Backups;VCR Machine;VHS Tapes;Video Game Players; Zip Drives',
    'Eye Glass Reuse': 'Eye Glasses',
    'Fire Extinguisher Refilling or Recycling': 'Fire Extinguishers',
    'Fluorescent Lights': 'CFLs;Compact Fluorescent Light Bulbs;Fluorescent Lights',
    'Food Donation': 'Food Donation',
    'Furniture & Office Equipment ': 'Book Cases;Cubical Walls;Desks;File Cabinets;Furniture & Office Equipment;Office Chairs',
    'Geothermal Energy': 'Geothermal',
    'Gift Cards': 'Gift Cards (merchant plastic money card)',
    'Green Energy Supplier': 'Energy Supplier',
    'Grocery Bag': 'Grocery Bags',
    'Heating Oil Tank Removal': 'Heating Oil Tank Removal',
    'Household Hazardous Waste (HHW)': 'Aerosol Products;Antifreeze;Automotive Batteries;Automotive Fluid;Boat Batteries;Button Batteries;Cleaning Chemicals;Compact Fluorescent Light Bulbs - HHW Drop-Off;Drain Cleaners;Driveway Sealer;Fertilizers;Fluorescent Light Bulbs;Gasoline-Oil Mix;Hazardous Household Materials;HHW;Household Chemicals;Lawn & garden chemicals;Lithium Batteries;Medication (old or unwanted);Mercury / Mercury Products;Motor Oil;Nail Polish Remover;Oil Filters;Oil-based Paint / Stain / Varnish;Old Gasoline;Paint Thinners;Pesticides;Pool Chemicals;Rechargeable Batteries;Solvents;Thermometers (glass);Used Oil;Medication (old or unwanted);Thermometers (glass);Cough Medicine;Inhalers;Medicated Shampoo;Ointment / Medicated Ointment;Over-the-Counter Medication;Pills, Pharmaceuticals, Medication;Prescription Medication;Sunscreen;Vitamins',
    'Landfill & Transfer Stations': 'Landfills & Transfer Stations;Beverage Carrier Straps;Six-Pack Rings;Plastic #1, PET;Plastic #2, HDPE;Plastic #3, PVC;Plastic #4, LDPE;Plastic #5, PP;Plastic #7, Other Container;Aerosol Cans (empty) Drop-Off;Aluminum Can Recycling Drop-Off ;Aluminum Foil Recycling Drop-Off;Glass Bottles and Jars Drop-Off;Juice Boxes / Drink Pouches  Drop-Off;Tin / Steel Cans (soup, coffee, etc) Drop-Off;',
    'Latex Paint': 'Latex Paint',
    'Lead-base Paint Concerns': 'Lead Paint',
    'Manure': 'Manure',
    'Matress Recycling': 'Matress & Box Springs',
    'Medical Equipment (durable)': 'Canes;Crutches;Portable Toliets;Shower Chairs;Wheelchairs',
    'Medication Drop-Off': 'Cough Medicine;Inhalers;Medicated Shampoo;Ointment / Medicated Ointment;Over-the-Counter Medication;Pills, Pharmaceuticals,Medication;Prescription Medication;Sunscreen;Vitamins',
    'Metal Recycling': 'Scrap Metal;Aluminum / Aluminum Can Scrap;Brass;Copper;Iron;Steel Scrap;',
    'Motor Oil - Bulk': 'Motor Oil in Buk Drums',
    'Motor Oil Recycling (DIY)': 'Motor Oil - DIY',
    'Packaging Materials': 'Packing Peanut ;Styrofoam Peanuts;Cardboard Boxes',
    'Pallet Reuse or Recycling': 'Pallets',
    'Paper Recycling': 'Cardboard;Chipboard;Colored Paper;Construction Paper;Junk Mail;Magazines;Newspaper;Office Paper;Paper Recycling;Shredding Services;Clored Paper',
    'Plastic Carrier Strap Recycling': 'Beverage Carrier Straps;Six-Pack Rings',
    'Plastic Container Recycling': 'Plastic #1, PET;Plastic #2, HDPE;Plastic #3 PVC;Plastic #4, LDPE;Plastic #5, PP;Plastic #7, Other Container',
    'Pop-Tab Recycling': 'Pop-Tab',
    'Printer/Toner Cartridge & Ribbon Recycling': 'Inkjets (Ribbons);Printer/Toner Cartridges',
    'Propane Tank Disposal or Recycling': 'Propane Tanks',
    'Rechargeable Battery': 'Batteries - Rechargeable, BackUp, Lead-Acid;Rechargeable',
    'Recycling and Waste Collection Companies': 'Recycling Collection Company;Garbage Collection Company;Waste Collection Company',
    'Repair Services': 'Sharpening Services',
    'Reuse & Resale Shops': 'Reuse/Resale',
    'Smoke Detectors ': 'Smoke Detectors ',
    'Solar Energy': 'Solar Energy',
    'Styrofoam or Polystyrene Recycling': 'Plastic #6, PS, Polystyrene;Styrofoam',
    'Tire Recycling': 'Tire',
    'Water Conservation': 'Rain Barrels;Rain Garden',
    'Wind Energy': 'Small Wind'}

    for group, keywords in groups.items():
        if keywords.find(keyword) > -1:
            return group

    return None


if __name__ == '__main__':
    main()

Although it does exactly the same as the Arcade expression, it may result in different results. Isn't that fun? 

The reason for this is that when you loop through a dictionary in Python it will not be sorted and the Arcade object will. So if you have two groups with the same amount of counts, it may return a different group. I still think that it might requiere some additional thinking on how to treat features that could belong to multiple groups.

View solution in original post

15 Replies
XanderBakker
Esri Esteemed Contributor

Hi jpilbeam ,

From the subset of data is is hard to suggest a way to obtain the result. From what I see you can take the first word from he USER_categ field, but since I can only see the first part, it is unclear if this will work.

0 Kudos
JaredPilbeam2
MVP Regular Contributor

Xander Bakker‌,

Thanks for the reply. The whole table has 600+ records. Here's a screenshot of both fields.

To reiterate, is there a pythonic way to group these features like so, for example? 

0 Kudos
WillHouston
New Contributor III

Are you doing that manually in the USER_Categ field, or is that something else? That field looks like it has many fewer unique values.

0 Kudos
JaredPilbeam2
MVP Regular Contributor

Well, that's just a graphic I did in Inkscape meant to visualize how I'd ultimately like the features to be grouped. I'm looking for a way to group both the USER_Keywo and USER_Categ fields that have similar unique values using Python. Because right now when I turn on labels I have to (1) choose one of the two fields and (2) there are way too many labels. Make sense?

0 Kudos
WillHouston
New Contributor III

I'm still not sure it makes sense to me. The USER_Categ field, to me, looks like what you want to use as the field for your "Unique Values" primary symbology. In your second graphic above where you have the four symbols (Appliances, Asbestos, Airplane, Building Material), you are pointing them one-to-one to values in the USER_Categ field. In which case, I would say to use USER_Categ as the field for your unique symbology, then manually change the label associated with each value to the simplified label ("Appliances" instead of "Appliance Recycling").

If you have too many values to do that manually, Arcade would be the quicker way. I will respond to your post there.

0 Kudos
XanderBakker
Esri Esteemed Contributor

Hi Jared Pilbeam ,

Is it possible to share the data? This will make it a lot easier to create the expression that matches the entire data.

0 Kudos
JaredPilbeam2
MVP Regular Contributor

Xander Bakker‌,

Here's a fgdb: https://webapp.willcountyillinois.com/Requests/Recycle.zip (look for the USER_Keywo and USER_Categ fields) Thanks for taking a look at it. And so we're on the same page the thread I have going related to this, but with Arcade, is here: Arcade If Statement for Multiple Unique Values 

0 Kudos
XanderBakker
Esri Esteemed Contributor

Hi Jared Pilbeam ,

I just had a look at the data you shared and I think there is still some more information required to create the expression or Python script to group the values together. At this moment I am looking at 65 unique categories, 265 unique keywords (when I split the values in the field on the ";" character) and there 192 unique combinations between the two fields.

So the main question is; what are the groups that you want to obtain and what defines if an element belongs to a group?  

If I look at the example you shared before:

The first group "Appliances" contains this word in both the keywords as in the categories fields, for the second group "Asbestos" the same applies, for the thirds group "Airplane" it only occurs in the keywords and the fourth group "Building Material" has "Building" somewhere in the keywords and not in the categories.

Below the entire list of categories in your data:

Appliance Recycling
Asbestos Info / Removal Service
Automobile and Boat Reuse Recycling
Awards and Recognition
Batteries
Bicycle Reuse-Donation
Book Recycling
Cellular Telephone Reuse & Recycling
Clean-Up Services
Clothing, Linens & Shoe Reuse & Recycling-Drop-Off Locations
Commercial Hazardous Waste
Composting
Composting Facilities/Landscape Waste Transfer Stations
Computer Reuse & Recycling
Construction & Demolition Recycling
Cooking Oil/Grease Recycling & Trap Service
Crayon Recycling
Curbside Recycling
Document Destruction/Shredding
Donation
Drum & Cylinder Disposal or Recycling
Dry Cleaners (Green)
Electronics
Eye Glass Reuse
Fire Extinguisher Refilling or Recycling
Fluorescent Lights
Food Donation
Geothermal Energy
Gift Cards
Glass Recycling Processors
Green Energy Supplier
Green Roofing Firms
Grocery Bag
Hazardous Waste Disposal or Recycling
Heating Oil Tank Removal
Household Hazardous Waste (HHW)
Landfill & Transfer Stations
Lead-based Paint Concerns
Manure
Mattress Recycling
Medical Waste - Sharps or Needles
Metal Recycling
Motor Oil - Bulk
Motor Oil Recycling (DIY)
Packaging Materials
Pallet Reuse or Recycling
Paper Recycling
Plastic Carrier Strap Recycling
Plastic Container Recycling
Plastic Recycling
Pop-Tab Recycling
Printer/Toner Cartridge & Ribbon Recycling
Propane Tank Disposal or Recycling
Rechargeable Battery
Recycling Equipment
Recycling Processing Facilities
Recycling and Waste Collection Companies
Repair Services
Reuse and Resale Shops
Smoke Detectors
Solar Energy
Styrofoam or Polystyrene Recycling
Tire Recycling
Water Conservation
Wind Energy

... and the list of keywords (where I notice many similar keywords and some seem to be cut off):

Adding Machines
Aerosol Cans (empty) Curbside
Aerosol Cans (empty) Drop-Off
Aerosol Products
Air Conditioner
Airplane
Alu
Aluminum / Aluminum Can Scrap
Aluminum Can Curbside
Aluminum Can Recycling Drop-Off
Aluminum Foil
Aluminum Foil Curbside
Aluminum Foil Rec
Aluminum Foil Recycli
Aluminum Foil Recycling Drop-Off
Answering Machines
Antifreeze
Appliance
Asbestos
Asphalt
Automobile
Automotive Batteries
Automotive Fluid
Awards (Eco-Award)
Batteries - Rechargeable, BackUp, Lead-Acid
Batteries - Single Use AA, AAA, C, D, 9-Volt
Beverage Carrier Straps
Bicycles
Bo
Boat
Boat Batteries
Book Cases
Books - Hardcover
Books - Hardcover Recycling
Books - Softcover Recycling
Brass
Brick
Building Construction & Demolition Debris
Business Electronic Devices
Button Batteries
CB's/Two -way radios
CD Players / Laser Disc Players
CD ROM Drives
CDs, DVDs
CFLs
Calculators
Cameras
Car
Cardboard
Cardboard & Chipboard Curbside
Cardboard Boxes
Carpet
Carpet Padding
Cassettes
Cell Phone - Electronic Drop-Off
Cell Phone - Retail Drop-Off
Chemical Business Waste
Chipboard
Clean-Up of Properties
Cleaning Chemicals
Clothing
Co
Colored Paper
Compact Fluorescent
Compact Fluorescent Light Bulbs
Compact Fluorescent Light Bulbs - HHW Drop-Off
Compost Facilities/Landscape Waste Transfer Stations
Composting, Vermi (worms)
Compute
Computer Recycling
Computer Recycling Drop-Off
Computer Reuse/Donation
Concrete
Construction Material Reuse
Construction Paper
Cooking Oil
Copper
Copy Machine
Copy Machines
Cords & Cables
Cou
Couch
Cough Medicine
Crayons
Cubical Walls
Curbside Recycling
Cylinders
DVD Machine
Dehumidifier
Demolition Material Reuse / Recycling
Desks
Di
Dig
Digital Cloc
Digital Clocks
Dirt
Dishwasher
Drain Cleaners
Driveway Sealer
Drums
Dry Cleaning
Dryer
Drywall
El
Electronic Mice
Electronics
Energy Supplier
Eye Glasses
Fax Machines
Fertilizers
File Cabinets
Fire Extinguishers
Floppy Disks
Fluorescent Light Bulbs
Fluorescent Lights
Food Donation
Furniture & Office Equipment
Garbage Collection Company
Gasoline-Oil Mix
Geothermal
Gift Cards (merchant plastic money card)
Glass Bottles and Jars Curbside
Glass Bottles and Jars Drop-Off
Glass Recycling
Gravel
Grease Recycling
Grease Trap Services
Grocery Bags
Gym Shoes / Sneakers / HighTops
HHW
Ha
Hand Held Games
Hard Drives
Hazard
Hazardous Household Materials
Hazardous Waste Services
Heating Oil Tank Removal
Household Chemicals
Inhalers
Inkjets (Ribbons)
Iron
Joysticks/Game controls
Juice Boxes / Drink Pouches Drop-Off
Juice Boxes / Drink Pouches Curbside
Junk Mail
Keyboards
Landfills & Transfer Stations
Landscape Material Recycling
Latex Paint
Lead Paint
Lead-Acid Batteries
Linens / Blankets / Tableclothes / Towels
Lithium Batteries
Magazines
Manure
Mattress and Box Springs
Medical Waste
Medicated Shampoo
Medication (old or unwanted)
Microwaves
Modems
Monitors
Motor Oil
Motor Oil - DIY
Motor Oil in Bulk Drums
Needles
Newspaper
Office Chairs
Office Paper
Oil-based Paint / Stain / Varnish
Ointment / Medicated Ointment
Oven
Over-the-Counter Medication
Packing Peanut
Padding
Pagers
Palle
Pallets
Palm Organizers
Paper (magazines, phone bks, etc
Paper (magazines, phone bks, etc) Curbside
Paper Recycling
Paper shr
Pills, Pharmaceuticals, Medication
Plasti
Plastic #1 (PET)
Plastic #1 (PET) & #2 (HDPE) Containers
Plastic #1, PET
Plastic #2, HDPE
Plastic #3, PVC
Plastic #4, LDPE
Plastic #5, PP
Plastic #6, PS, Polystyrene
Plastic #7, Other Cont
Plastic #7, Other Container
Plastics
Pop-Tab
Prescription Medication
Printer/Toner Cartridges
Printers
Propane Tanks
Purse or Handbag
RV
Rain Barrels
Rain Garden
Rechargeable
Rechargeable Batteries
Recreation Vehicle
Recycling Collection Company
Recycling Equipment
Recycling Processing Facilities
Refrigerator
Reuse/Resale
Roofing Materials
Roofing- Green
S
Scanner Machines
Scrap Metal
Sharpening Services
Sharps
Shingles
Shoes / Boots
Shredding
Shredding Services
Six-Pack Rings
Small Wind
Smoke Detectors
Sofa
Soil
Solar Energy
Speakers/Stereo Systems
Steel Scrap
Stone
Stove
String Lights / Holiday String Lights
Styrofoam
Styrofoam Peanuts
Sump Pump
Sunscreen
Tape Drives
Telephone
Televisions
Textiles / Fabrics / Drapes / Curtains
Thermometers (digital)
Thermometers (glass)
Tin / Steel Ca
Tin / Steel Can
Tin / Steel Cans (soup, coffee, etc) Drop-Off
Tire
Toys
Truck
Typewriters/Word Processors
Used Oil
VCR Machine
VHS Tapes
Video Game Players
Vinyl Siding
Vitamins
Washer
Waste Collection Company
Water Heater
Windows
Wood-Untreated

So, the main question remains, what are the final groups and what rules do you want to apply to assign a value to a group? If that can be resolved, we can create a script or expression.

JaredPilbeam2
MVP Regular Contributor

Xander Bakker‌,

So, the main question remains, what are the final groups and what rules do you want to apply to assign a value to a group?

The final groups are in the zip folder: https://webapp.willcountyillinois.com/Requests/Recycle.zip

I cleaned the two fields up from the feature layer and saved them in the new CSV (finalgroups.csv in zip folder). As for a rule, can the groups be created using the Keywords?

After all that, I might as well create a whole new feature layer (this data was given to me-- the reason why it wasn't too organized for GIS). But, I would still like to know how to group these programmatically.

0 Kudos