Removing duplicates from a dictionary

08-26-2016 01:55 PM
JamesCrandall
MVP Frequent Contributor

Hi all!


I'm in need of assistance in removing duplicate values within a single dictionary as I'm not having any success with the ideas/examples I've found.

I need to identify and remove any duplicate "asset" within this dictionary:

{
 "Features": [
  [
   {
    "asset": {
     "isCoastal": 0,
     "name": "G344E",
     "WMRegion": "SOUTHERN_REGION",
     "wcuname": "WCA3A",
     "type": "structure",
     "isActive": 1
    }
   },
   {
    "asset": {
     "isCoastal": 0,
     "name": "G344E",
     "WMRegion": "SOUTHERN_REGION",
     "wcuname": "WCA3A",
     "type": "structure",
     "isActive": 1
    }
   },
   {
    "asset": {
     "isCoastal": 0,
     "name": "G344E",
     "WMRegion": "SOUTHERN_REGION",
     "wcuname": "WCA3A",
     "type": "structure",
     "isActive": 1
    }
   },
   {
    "asset": {
     "isCoastal": 0,
     "name": "G344E",
     "WMRegion": "SOUTHERN_REGION",
     "wcuname": "WCA3A",
     "type": "structure",
     "isActive": 1
    }
   },
   {
    "asset": {
     "isCoastal": 0,
     "name": "G344F",
     "WMRegion": "SOUTHERN_REGION",
     "wcuname": "WCA3A",
     "type": "structure",
     "isActive": 1
    }
   },
   {
    "asset": {
     "isCoastal": 0,
     "name": "G344F",
     "WMRegion": "SOUTHERN_REGION",
     "wcuname": "WCA3A",
     "type": "structure",
     "isActive": 1
    }
   },
   {
    "asset": {
     "isCoastal": 0,
     "name": "S145",
     "WMRegion": "SOUTHERN_REGION",
     "wcuname": "WCA2B",
     "type": "structure",
     "isActive": 1
    }
   }
  ]
 ]
}


Accepted Solutions
JamesCrandall
MVP Frequent Contributor

I think this does it.  Would like to get validation though!

Thanks for looking.

    unique_data = []
    for d in JSONlist2:
        data_exists = False
        for ud in unique_data:
            # dictionaries compare equal when all keys and values match
            if ud['asset'] == d['asset']:
                data_exists = True
                break
        if not data_exists:
            unique_data.append(d)

    data2['Features'] = unique_data
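
One way to validate the loop is to run it on a trimmed copy of the sample data from the question. This is a minimal sketch, not the original script: "data", "remove_duplicate_assets" and the shortened asset dictionaries are made up for the check, and the result is written back to data["Features"][0] on the assumption that the nested list should be kept.

```python
def remove_duplicate_assets(features):
    """Return the dicts in `features`, keeping the first of each distinct 'asset'."""
    unique_data = []
    for d in features:
        # dicts compare equal when all keys and values match
        if not any(ud["asset"] == d["asset"] for ud in unique_data):
            unique_data.append(d)
    return unique_data


# Trimmed stand-in for the data in the question (hypothetical, fewer keys)
data = {
    "Features": [[
        {"asset": {"name": "G344E", "wcuname": "WCA3A"}},
        {"asset": {"name": "G344E", "wcuname": "WCA3A"}},
        {"asset": {"name": "G344F", "wcuname": "WCA3A"}},
        {"asset": {"name": "G344F", "wcuname": "WCA3A"}},
        {"asset": {"name": "S145", "wcuname": "WCA2B"}},
    ]]
}

data["Features"][0] = remove_duplicate_assets(data["Features"][0])
names = [d["asset"]["name"] for d in data["Features"][0]]
print(names)  # ['G344E', 'G344F', 'S145']
```

If the three names come back once each and in their original order, the loop is doing what was asked.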


12 Replies
DarrenWiens2
MVP Honored Contributor

That's prettier than what I came up with (basically, muscle dictionary into a set, then back to dictionary):

>>> my_set = set()
>>> for asset in dct['Features'][0]:
...     for k1, v1 in asset.iteritems():
...         new_list = []
...         for k2, v2 in v1.iteritems():
...             new_list.append(str(k2) + ':' + str(v2))  # add key/value to a list
...         new_string = ','.join(new_list)  # convert list to string
...         my_set.add(new_string)  # add string to set
...
>>> new_dict = {'Features': [[]]}  # fresh dictionary, so the original is not modified
>>> for asset in my_set:  # convert set back to dictionary
...     new_dict['Features'][0].append({'asset': {i.split(':')[0]: i.split(':')[1] for i in asset.split(',')}})
...
>>> print new_dict
{'Features': [[{'asset': {'isCoastal': '0', 'name': 'G344E', 'WMRegion': 'SOUTHERN_REGION', 'wcuname': 'WCA3A', 'type': 'structure', 'isActive': '1'}}, {'asset': {'isCoastal': '0', 'name': 'G344F', 'WMRegion': 'SOUTHERN_REGION', 'wcuname': 'WCA3A', 'type': 'structure', 'isActive': '1'}}, {'asset': {'isCoastal': '0', 'name': 'S145', 'WMRegion': 'SOUTHERN_REGION', 'wcuname': 'WCA2B', 'type': 'structure', 'isActive': '1'}}]]}
JamesCrandall
MVP Frequent Contributor

I was down that road but found the implementation I posted above.  I'm pretty sure it's what I need, just need to validate when I get a chance.

Thanks!

XanderBakker
Esri Esteemed Contributor

Perhaps a one-liner would do (although readability will be gone...)

dct['Features'][0] = list([eval(s) for s in set([str(d) for d in dct['Features'][0]])])

The idea is to use "set" to get the unique items, but since dictionaries are not hashable, they are first converted to strings. Afterwards "eval" (which one should never use) turns the strings back into dictionaries.
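
If the data came from JSON in the first place, the same set-of-strings idea can be sketched without "eval" by using the standard json module: json.dumps with sort_keys=True gives a string that does not depend on key order, and json.loads turns it back into a dictionary. The sample "dct" below is a made-up, shortened version of the data in the question, and the order of the result is not preserved because a set is unordered.

```python
import json

# Hypothetical, shortened version of the structure from the question
dct = {"Features": [[
    {"asset": {"name": "G344E", "isActive": 1}},
    {"asset": {"isActive": 1, "name": "G344E"}},  # same content, different key order
    {"asset": {"name": "S145", "isActive": 1}},
]]}

# Serialize each dict to a canonical string, let the set drop repeats,
# then parse the surviving strings back into dicts.
unique_strings = set(json.dumps(d, sort_keys=True) for d in dct["Features"][0])
dct["Features"][0] = [json.loads(s) for s in unique_strings]

print(sorted(d["asset"]["name"] for d in dct["Features"][0]))  # ['G344E', 'S145']
```

Unlike the str/split approach, the round trip through json keeps numbers as numbers.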

DanPatterson_Retired
MVP Emeritus

Can that be swung into a dictionary comprehension? See section 5.5 of "5. Data Structures" in the Python 3.5.2 documentation.

XanderBakker
Esri Esteemed Contributor

The outer dictionary has only one key, whose value is a list with one element (in this example): the list of dictionaries with the actual data (each a dictionary with one key, "asset", that holds a dictionary of properties). The evaluation is therefore not really done at the dictionary level but on the list of dictionaries. At least, I would not know how to put this into a dictionary comprehension. I do like the fact that you can delete a key from a dictionary, but that does not apply here, since list elements need to be removed.
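
For what it's worth, one way a dictionary comprehension could come into play is to key each list element on a hashable snapshot of its "asset" and let the dictionary discard repeated keys. This is only a sketch under assumptions: "dct" is a shortened, made-up sample, the asset values are all hashable (strings and numbers), and the order of the result relies on dictionaries preserving insertion order, which is guaranteed only from Python 3.7.

```python
# Hypothetical, shortened sample
dct = {"Features": [[
    {"asset": {"name": "G344E", "type": "structure"}},
    {"asset": {"name": "G344E", "type": "structure"}},
    {"asset": {"name": "G344F", "type": "structure"}},
]]}

# Key: the asset's items as a sorted tuple (hashable); value: the list element.
# Repeated keys overwrite each other, so one element per distinct asset remains.
by_asset = {tuple(sorted(d["asset"].items())): d for d in dct["Features"][0]}
dct["Features"][0] = list(by_asset.values())

print([d["asset"]["name"] for d in dct["Features"][0]])  # ['G344E', 'G344F']
```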

JoshuaBixby
MVP Esteemed Contributor

Adapting from "Remove duplicate dict in list in Python", the same general logic/approach of your code can be implemented through a list comprehension:

dct['Features'][0] = [d for n, d
                      in enumerate(dct['Features'][0])
                      if d not in dct['Features'][0][n+1:]]

Between switching to a list comprehension and using "in" with a slice of the original list, the adapted code runs right around twice as fast.
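
One behavioral difference worth knowing: because each element is checked against the rest of the list after it, the comprehension keeps the last copy of every duplicate, while a loop that checks against what has already been kept retains the first. The result has the same elements, but their order can differ when duplicates are not adjacent. A small sketch with made-up data ("items" is hypothetical):

```python
# Hypothetical data with non-adjacent duplicates
items = [{"asset": {"name": "A"}}, {"asset": {"name": "B"}}, {"asset": {"name": "A"}}]

# Slice-based comprehension: an element survives only if no equal
# element appears later in the list, so the last copy is kept.
keep_last = [d for n, d in enumerate(items) if d not in items[n + 1:]]

# Loop-style approach: an element survives only if no equal element
# has been kept already, so the first copy is kept.
keep_first = []
for d in items:
    if d not in keep_first:
        keep_first.append(d)

print([d["asset"]["name"] for d in keep_last])   # ['B', 'A']
print([d["asset"]["name"] for d in keep_first])  # ['A', 'B']
```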

XanderBakker
Esri Esteemed Contributor

That sounds a lot better! Thanks for sharing, bixb0012!

JoshuaBixby
MVP Esteemed Contributor

Xander Bakker, thanks.  My initial thoughts were to use set, somehow, but that sent me down the rabbit hole of Python not having frozen dictionaries as a built-in data type.  Implementing a frozen dict is quite simple, but the reasons PEP 416 (Add a frozendict builtin type) was rejected kept coming up, mainly performance.  The idea of trying MappingProxyType was intriguing, but that would only work in ArcGIS Pro since it requires Python 3.3+.  After running timeit on several different solutions, it became clear that the general approach James Crandall settled on was going to perform the best.
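
For completeness, the set-based idea can be sketched without a frozen dictionary type when the "asset" dictionaries are flat and hold only hashable values, as they do in the sample data: a frozenset of the asset's items is hashable and ignores key order. The function name "dedupe_assets" and the sample list are made up for illustration, and this says nothing about how it would fare in the timings mentioned above.

```python
def dedupe_assets(features):
    """Keep the first element for each distinct 'asset', preserving order.

    Assumes every value inside 'asset' is hashable (str, int, ...).
    """
    seen = set()
    unique = []
    for d in features:
        key = frozenset(d["asset"].items())  # hashable stand-in for the dict
        if key not in seen:
            seen.add(key)
            unique.append(d)
    return unique


# Hypothetical sample
features = [
    {"asset": {"name": "G344E", "isActive": 1}},
    {"asset": {"isActive": 1, "name": "G344E"}},  # same content, different key order
    {"asset": {"name": "S145", "isActive": 1}},
]
result = dedupe_assets(features)
print([d["asset"]["name"] for d in result])  # ['G344E', 'S145']
```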