Removing duplicates from a dictionary

JamesCrandall · ‎08-26-2016

Hi all!

I'm in need of assistance in removing duplicate values within a single dictionary as I'm not having any success with the ideas/examples I've found.

I need to identify and remove any duplicate "asset" within this dictionary:

{
"Features": [
  [
   {
    "asset": {
     "isCoastal": 0,
     "name": "G344E",
     "WMRegion": "SOUTHERN_REGION",
     "wcuname": "WCA3A",
     "type": "structure",
     "isActive": 1
    }
   },
   {
    "asset": {
     "isCoastal": 0,
     "name": "G344E",
     "WMRegion": "SOUTHERN_REGION",
     "wcuname": "WCA3A",
     "type": "structure",
     "isActive": 1
    }
   },
   {
    "asset": {
     "isCoastal": 0,
     "name": "G344E",
     "WMRegion": "SOUTHERN_REGION",
     "wcuname": "WCA3A",
     "type": "structure",
     "isActive": 1
    }
   },
   {
    "asset": {
     "isCoastal": 0,
     "name": "G344E",
     "WMRegion": "SOUTHERN_REGION",
     "wcuname": "WCA3A",
     "type": "structure",
     "isActive": 1
    }
   },
   {
    "asset": {
     "isCoastal": 0,
     "name": "G344F",
     "WMRegion": "SOUTHERN_REGION",
     "wcuname": "WCA3A",
     "type": "structure",
     "isActive": 1
    }
   },
   {
    "asset": {
     "isCoastal": 0,
     "name": "G344F",
     "WMRegion": "SOUTHERN_REGION",
     "wcuname": "WCA3A",
     "type": "structure",
     "isActive": 1
    }
   },
   {
    "asset": {
     "isCoastal": 0,
     "name": "S145",
     "WMRegion": "SOUTHERN_REGION",
     "wcuname": "WCA2B",
     "type": "structure",
     "isActive": 1
    }
   }
  ]
]
}

JamesCrandall · ‎08-26-2016

I think this does it. Would like to get validation though!

Thanks for looking.

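    unique_data = []
    for d in JSONlist2:
        # Compare this entry's asset against every asset already kept
        data_exists = False
        for ud in unique_data:
            if ud['asset'] == d['asset']:
                data_exists = True
                break
        # Keep only the first occurrence of each asset
        if not data_exists:
            unique_data.append(d)

    data2['Features'] = unique_data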
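
For anyone wanting to validate the loop above, here is a minimal, self-contained sketch of the same technique. It assumes JSONlist2 refers to the inner list data2['Features'][0]; the function name dedupe_assets and the trimmed sample data are made up for illustration. Unlike the snippet above, the sketch writes the result back to data2['Features'][0] so the extra list level is kept; drop the [0] if the flattened structure is what you want.

```python
def dedupe_assets(features):
    """Return a new list without repeated 'asset' dicts, keeping first occurrences."""
    unique = []
    for feature in features:
        # Dict equality compares keys and values, regardless of key order
        if not any(kept['asset'] == feature['asset'] for kept in unique):
            unique.append(feature)
    return unique


# Hypothetical, trimmed-down version of the data in the question
data2 = {"Features": [[
    {"asset": {"name": "G344E", "wcuname": "WCA3A", "isActive": 1}},
    {"asset": {"name": "G344E", "wcuname": "WCA3A", "isActive": 1}},
    {"asset": {"name": "G344F", "wcuname": "WCA3A", "isActive": 1}},
    {"asset": {"name": "S145", "wcuname": "WCA2B", "isActive": 1}},
]]}

data2["Features"][0] = dedupe_assets(data2["Features"][0])
print([f["asset"]["name"] for f in data2["Features"][0]])
# ['G344E', 'G344F', 'S145']
```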

DarrenWiens2 · ‎08-26-2016

That's prettier than what I came up with (basically, muscle dictionary into a set, then back to dictionary):

my_set = set()
for asset in dct['Features'][0]:
    for k1, v1 in asset.iteritems():
        new_list = []
        for k2, v2 in v1.iteritems():
            new_list.append(str(k2) + ':' + str(v2))  # add key/value to a list
        new_string = ','.join(new_list)  # convert list to string
        my_set.add(new_string)  # add string to set
new_dict = {'Features': [[]]}  # build a fresh dictionary rather than aliasing the input
for asset in my_set:  # convert set back to dictionary
    new_dict['Features'][0].append({'asset': {i.split(':')[0]: i.split(':')[1] for i in asset.split(',')}})
print new_dict
{'Features': [[{'asset': {'isCoastal': '0', 'name': 'G344E', 'WMRegion': 'SOUTHERN_REGION', 'wcuname': 'WCA3A', 'type': 'structure', 'isActive': '1'}}, {'asset': {'isCoastal': '0', 'name': 'G344F', 'WMRegion': 'SOUTHERN_REGION', 'wcuname': 'WCA3A', 'type': 'structure', 'isActive': '1'}}, {'asset': {'isCoastal': '0', 'name': 'S145', 'WMRegion': 'SOUTHERN_REGION', 'wcuname': 'WCA2B', 'type': 'structure', 'isActive': '1'}}]]}
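
Note that in the output above the numeric values come back as strings ('0', '1'), because every value passes through str(). A possible variation on the same set idea, sketched here under the assumption that each asset is a flat dictionary with hashable values, uses a sorted tuple of the items as the set key so the original dictionaries never need to be rebuilt. The name dedupe_by_key and the sample data are illustrative only.

```python
def dedupe_by_key(features):
    """Drop repeated assets using a hashable key; keeps first occurrences and value types."""
    seen = set()
    result = []
    for feature in features:
        # Sorting the (key, value) pairs makes the key independent of key order
        key = tuple(sorted(feature['asset'].items()))
        if key not in seen:
            seen.add(key)
            result.append(feature)
    return result


# Hypothetical sample data
features = [
    {"asset": {"name": "G344E", "isCoastal": 0, "isActive": 1}},
    {"asset": {"isActive": 1, "name": "G344E", "isCoastal": 0}},
    {"asset": {"name": "S145", "isCoastal": 0, "isActive": 1}},
]

result = dedupe_by_key(features)
print([f["asset"]["name"] for f in result])  # ['G344E', 'S145']
print(type(result[0]["asset"]["isActive"]).__name__)  # int
```

The set lookup avoids comparing each entry against every kept entry, which may matter for long lists; it would not work for assets that contain lists or nested dictionaries.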

JamesCrandall · ‎08-26-2016

I was down that road but found the implementation I posted above. I'm pretty sure it's what I need, just need to validate when I get a chance.

Thanks!

XanderBakker · ‎08-27-2016

Perhaps a one-liner would do (although readability will be gone...)

dct['Features'][0] = [eval(s) for s in set(str(d) for d in dct['Features'][0])]

The idea is to use a "set" to get the unique items, but since dictionaries are not hashable, they are first converted to strings. Afterwards, "eval" (which one should never use) turns the strings back into dictionaries. Note that a set does not preserve the original order of the list.
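
If the string round trip is wanted without eval, ast.literal_eval from the standard library is one option, since it only parses Python literals and does not execute arbitrary code. The following is a sketch, not a recommendation over the loop posted earlier: the function name and sample data are invented, and it assumes duplicate entries produce identical repr() strings (same keys in the same order).

```python
import ast


def dedupe_via_strings(features):
    """Deduplicate by converting each dict to a string and back; order is not preserved."""
    unique_strings = set(repr(d) for d in features)
    # literal_eval only accepts literals (dicts, lists, strings, numbers, ...)
    return [ast.literal_eval(s) for s in unique_strings]


# Hypothetical sample data
features = [
    {"asset": {"name": "G344E", "isActive": 1}},
    {"asset": {"name": "G344E", "isActive": 1}},
    {"asset": {"name": "S145", "isActive": 1}},
]

deduped = dedupe_via_strings(features)
print(sorted(d["asset"]["name"] for d in deduped))  # ['G344E', 'S145']
```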

DanPatterson_Retired · ‎08-27-2016

Can that be swung into a dictionary comprehension? See section 5.5 in "5. Data Structures — Python 3.5.2 documentation".

XanderBakker · ‎08-27-2016

The outer dictionary has only one key, which holds a list with one element (in this example). That element is the list of dictionaries with the actual data (each is a dictionary with one key, "asset", that holds a dictionary of properties). The evaluation is therefore not really done at the dictionary level but on the list of dictionaries. At least, I would not know how to put this in a dictionary comprehension. I do like the fact that you can delete a dictionary key, but that would not apply here, since list elements need to be removed.

JoshuaBixby · ‎08-28-2016

Adapting from "Remove duplicate dict in list in Python", the same general logic/approach of your code can be implemented through a list comprehension:

dct['Features'][0] = [d for n, d
                      in enumerate(dct['Features'][0])
                      if d not in dct['Features'][0][n+1:]]

Between switching to a list comprehension and using "in" with a slice of the original list, the adapted code runs roughly twice as fast.
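
One difference worth knowing about: because each element is checked against the rest of the list after it, the comprehension keeps the last occurrence of a duplicate, whereas the explicit loop keeps the first. The set of results is the same, but the order can differ. A small sketch with made-up data and hypothetical function names:

```python
def keep_first(features):
    """Loop-style dedupe: an entry is kept the first time it is seen."""
    unique = []
    for d in features:
        if d not in unique:
            unique.append(d)
    return unique


def keep_last(features):
    """Comprehension-style dedupe: an entry is kept only if it does not reappear later."""
    return [d for n, d in enumerate(features) if d not in features[n + 1:]]


a = {"asset": {"name": "G344E"}}
b = {"asset": {"name": "S145"}}
features = [a, b, dict(a)]  # the third entry duplicates the first

first_names = [d["asset"]["name"] for d in keep_first(features)]
last_names = [d["asset"]["name"] for d in keep_last(features)]
print(first_names)  # ['G344E', 'S145']
print(last_names)   # ['S145', 'G344E']
```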

XanderBakker · ‎08-29-2016

That sounds a lot better! Thanks for sharing, bixb0012!

JoshuaBixby · ‎08-29-2016

Xander Bakker, thanks. My initial thought was to use a set, somehow, but that sent me down the rabbit hole of Python not having frozen dictionaries as a built-in data type. Implementing a frozen dict is quite simple, but the reasons "PEP 416 -- Add a frozendict builtin type" was rejected kept coming up, mainly performance. The idea of trying MappingProxyType was intriguing, but that would only work in ArcGIS Pro since it requires Python 3.3+. After running timeit on several different solutions, it became clear that the general approach James Crandall settled on was going to perform the best.
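
The timings themselves are not reproduced in this thread. A rough harness along the following lines could be used to repeat the comparison; the generated data, the helper names, and the repeat count are all assumptions, and results will vary with the Python version and with how many duplicates the list contains.

```python
import timeit


def make_features(copies=50):
    """Build hypothetical test data: three distinct assets, each repeated `copies` times."""
    names = ["G344E", "G344F", "S145"]
    return [{"asset": {"name": n, "isActive": 1}} for n in names for _ in range(copies)]


def loop_dedupe(features):
    # Explicit loop: compare each entry against the entries already kept
    unique = []
    for d in features:
        if d not in unique:
            unique.append(d)
    return unique


def comprehension_dedupe(features):
    # List comprehension: compare each entry against the remainder of the list
    return [d for n, d in enumerate(features) if d not in features[n + 1:]]


features = make_features()
for func in (loop_dedupe, comprehension_dedupe):
    seconds = timeit.timeit(lambda: func(features), number=200)
    print("{:<24}{:.4f} s".format(func.__name__, seconds))
```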