Find specific key-value tuples within a dictionary?

JohnLay · 01-08-2015

I am trying to build a process that checks a dictionary for an existing key-value combination, where the key is a variable and the value is always one of three values (16, 17, or 18). I quickly ran into the problem of searching a dictionary with a dictionary and found someone's solution of turning the search into a frozenset. That removed the TypeError: unhashable type: 'dict' error message, but the code still doesn't do what I want it to do.

I am trying to verify that a table with tens of thousands of Building IDs contains exactly 3 occurrences of each Building ID, and that each occurrence has a value of 16, 17, or 18 in the next column. So far I've only been playing around with snippets of code to see if I can make it work.

WIND = 'L_DAMAGE_RESULTS_WIND'
readList = ["BLDG_ID", "HAZARD_ID"]
PassFail = "PASS"
WINDDict = {r[0]:(r[1:]) for r in arcpy.da.SearchCursor(WIND, readList)}
with arcpy.da.SearchCursor(WIND, "BLDG_ID") as cursor:
    for row in cursor:
        lookup = {row[0]:(18)}
        key = frozenset(lookup.items())
        if key not in WINDDict:
           PassFail = "fail"
           
print PassFail
fail
print lookup
{u'370139999': (18,)}

I thought the above would result in a "PASS" because u'370139999': (18,) does exist in WINDDict, but it didn't.
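The result above can be reproduced without arcpy. The in operator on a dictionary tests only its keys. The frozenset built from lookup.items() is a set holding one (key, value) pair, and that frozenset is never itself a key of WINDDict, so "key not in WINDDict" is always true and PassFail becomes "fail". A minimal sketch, assuming WINDDict maps each BLDG_ID to a one-element tuple as the printed output suggests; the sample data and the helper has_pair are hypothetical:

```python
# Hypothetical sample shaped like the question's WINDDict:
# BLDG_ID -> tuple of the remaining cursor fields (just HAZARD_ID here).
wind_dict = {
    u'370139999': (18,),
    u'370131': (16,),
}

bldg_id = u'370139999'

# What the original snippet does: a one-pair dict turned into a frozenset.
lookup = {bldg_id: (18,)}
key = frozenset(lookup.items())

# 'in' on a dict checks keys only; this frozenset is not one of them.
print(key in wind_dict)


def has_pair(d, k, v):
    """Return True if key k is in d and maps to value v."""
    return k in d and d[k] == v


print(has_pair(wind_dict, bldg_id, (18,)))
print(has_pair(wind_dict, bldg_id, (17,)))

# Equivalent check through the items() view (set-like in Python 3).
print((bldg_id, (18,)) in wind_dict.items())
```

In Python 2 (the ArcMap environment), items() returns a list, so the items() test scans every entry; has_pair is a direct key lookup in either version.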

I'm still wrapping my head around dictionaries, so I would appreciate any help that is offered.

UPDATE:

So the problem appears to be with the unicode within the dictionary.

WINDDict = {29184: (u'3701310027', 17), 1: (u'370131', 16), 2: (u'3701310', 16), 3: (u'37013100', 16), 4: (u'370131000', 16), 5: (u'3701310000', 16), 6: (u'3701310001', 16), ...}

if (u'3701310027', 17) in WINDDict:
    print "yes"
else:    
    print "no"
...     
no

if 17 in WINDDict:
    print "yes"
else:    
    print "no"
...     
yes
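These two results are consistent with how dictionary membership works rather than with a Unicode problem (in Python 2, u'3701310027' == '3701310027' is True for ASCII text). The in operator checks keys only: 17 in WINDDict is most likely True because 17 is one of the row-number keys, while the (building, hazard) tuple is stored as a value. A minimal sketch with hypothetical sample rows (row 17 below is invented for illustration):

```python
# Hypothetical sample shaped like the UPDATE's WINDDict:
# row number -> (BLDG_ID, HAZARD_ID)
wind_dict = {
    29184: (u'3701310027', 17),
    1: (u'370131', 16),
    17: (u'3701310011', 18),   # hypothetical row 17
}

# Membership on a dict checks keys: 17 is a row number here.
print(17 in wind_dict)
# The (building, hazard) tuple is a value, not a key.
print((u'3701310027', 17) in wind_dict)
# Searching the values works, but scans every row on each lookup.
print((u'3701310027', 17) in wind_dict.values())

# For many lookups, build a set of the value tuples once; each check is then a hash lookup.
pairs = set(wind_dict.values())
print((u'3701310027', 17) in pairs)
```

Building the set once and reusing it avoids rescanning every value for each of tens of thousands of lookups.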

I have no idea how to handle this.

Message was edited by: John Lay to add more explanation.

JoshuaBixby · 01-08-2015

If the data is already in a table, it might be more straightforward to run Summary Statistics. Generating a count, minimum, and maximum of NextColumn against BuildingID should get you what you are after. The count will let you know if there are only three records, and the minimum and maximum will show whether any of the values are less than 16 or more than 18.

Something like:

arcpy.Statistics_analysis(in_table,
                          out_table,
                          [ ["NextCol", "COUNT"], ["NextCol", "MIN"], ["NextCol", "MAX"] ],
                          ["BuildIDCol"])
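For anyone without arcpy at hand, the per-building COUNT, MIN and MAX from Statistics_analysis can be emulated in plain Python to see what the check catches. One caveat: count = 3, min = 16, max = 18 also passes a building whose hazards are 16, 16, 18, so a duplicate can slip through; comparing the sorted hazards with [16, 17, 18] closes that gap. A sketch; the rows and the function names group_hazards and summary_stats are hypothetical:

```python
from collections import defaultdict

# Hypothetical (BuildingID, HazardID) rows, as a SearchCursor might yield them.
rows = [
    (u'37013', 16), (u'37013', 17), (u'37013', 18),  # complete
    (u'37014', 16), (u'37014', 16), (u'37014', 18),  # duplicate 16, no 17
    (u'37015', 16), (u'37015', 17),                  # only two rows
]


def group_hazards(rows):
    """Collect every HazardID under its BuildingID."""
    groups = defaultdict(list)
    for bldg, haz in rows:
        groups[bldg].append(haz)
    return groups


def summary_stats(groups):
    """Emulate COUNT, MIN and MAX of HazardID per BuildingID."""
    return {bldg: (len(h), min(h), max(h)) for bldg, h in groups.items()}


groups = group_hazards(rows)
stats = summary_stats(groups)
for bldg in sorted(stats):
    count, lo, hi = stats[bldg]
    stats_ok = count == 3 and lo >= 16 and hi <= 18    # what the summary table can show
    exact_ok = sorted(groups[bldg]) == [16, 17, 18]    # one row per hazard, no duplicates
    print(bldg, stats[bldg], "stats check:", stats_ok, "exact check:", exact_ok)
```

With these rows, 37014 passes the count/min/max test but fails the exact test.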

JohnLay · 01-08-2015

Yes, but that just makes it too logical and simple.

JamesCrandall · 01-08-2015

NumPy and pandas are really good at selecting and grouping data. Something like this could work (it still needs to be tested):

import arcpy
import pandas as pd

# create a NumPy structured array from the table
nparr = arcpy.da.TableToNumPyArray(WIND, ["BLDG_ID", "HAZARD_ID"])

# turn it into a pandas DataFrame
df = pd.DataFrame(nparr, columns=["BLDG_ID", "HAZARD_ID"])

# new DataFrame with the selection criteria
df2 = df[df['HAZARD_ID'].isin([16, 17, 18])]

# do something meaningful with the filtered DataFrame, like writing it to a .csv file
df2.to_csv(r'C:\output.csv')

JohnLay · 01-08-2015

Thanks James, I haven't had the opportunity to play with NumPy yet. I will definitely give this a look, but for the sake of expediency I'm going to go with Joshua Bixby's solution.

JamesCrandall · 01-08-2015

No problem. I was working through a grouping issue this morning too and had some success with those libraries. Plain for loops just can't match the performance of NumPy/pandas: processing 4.5 million rows in a couple of minutes is no problem.

ChrisSnyder · 01-08-2015

This is very easy to do using the .items() dictionary method. Consider:

dict = {1:16, 2:17, 3:19, ...}

key = 2  # this is your variable
valueList = (16, 17, 18)  # values you are looking for under that key. There's only one value per key, right?
dictItems = dict.items()

for value in valueList:
    if (key, value) in dictItems:
        print str((key, value)) + " is in the dictionary!"

JohnLay · 01-08-2015

Actually, the dictionary would look more like this:

dict = {1:(u'37013',16), 2: (u'37013', 17), 3: (u'37013', 18), 4: (u'37014', 16), 5: (u'37014', 17), 6: (u'37014', 18),...} # corresponding with Building ID: Hazard ID.

I need to be able to verify that Building 37013 has a row for Hazard ID 16, a row for Hazard ID 17, and one for Hazard ID 18. If the table does not meet these criteria, it fails the check and I need to log the Building ID.
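The check described above can be sketched in plain Python starting from a dictionary in that row-number shape: count each building's hazards and compare the counts with exactly one row each for 16, 17 and 18. Any building that differs is reported with what is missing and what is extra. This is a hedged sketch rather than a tested solution for the real table; the sample data and the names REQUIRED and failing_buildings are hypothetical, and print stands in for whatever logging is needed.

```python
from collections import Counter, defaultdict

# Exactly one row for each of the three hazards.
REQUIRED = Counter({16: 1, 17: 1, 18: 1})

# Hypothetical dict in the shape above: row number -> (Building ID, Hazard ID)
rows_by_oid = {
    1: (u'37013', 16), 2: (u'37013', 17), 3: (u'37013', 18),
    4: (u'37014', 16), 5: (u'37014', 17), 6: (u'37014', 19),
}


def failing_buildings(rows_by_oid):
    """Return {building_id: reason} for every building that fails the check."""
    hazards = defaultdict(Counter)
    for bldg, haz in rows_by_oid.values():
        hazards[bldg][haz] += 1
    failures = {}
    for bldg, counts in hazards.items():
        if counts != REQUIRED:
            # Counter subtraction keeps only positive counts.
            missing = sorted((REQUIRED - counts).elements())
            extra = sorted((counts - REQUIRED).elements())
            failures[bldg] = "missing %s, extra %s" % (missing, extra)
    return failures


for bldg, reason in sorted(failing_buildings(rows_by_oid).items()):
    print(bldg, reason)
```

Counting (rather than only collecting a set of hazards) also flags a building that has a duplicate row, such as two rows for Hazard ID 16.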

JoshuaBixby · 01-08-2015

Are your keys currently row numbers from a table? If so, you are basically treating a dictionary like a list, which is more awkward to work with than a plain list would be. The dictionary key should be the BuildingID, not the row number.

Assuming you want each building to have the three hazards mentioned above, and only those three hazards, the following (untested) code might work:

# import functions from modules that are available but not commonly imported  
from collections import defaultdict  
from numpy import fromiter, dtype

# group hazards by building
haz_group = defaultdict(list)
with arcpy.da.SearchCursor(in_table, ['BuildingID', 'HazardID']) as cur:
    for k, v in cur:
        haz_group[k].append(v)

# create iterable and populate with flagged buildings
build_iter = (k for (k, v) in haz_group.iteritems() if sorted(v) != [16, 17, 18])  # sorted() returns a new list; v.sort() returns None
tmp1 = fromiter(((k,) for k in build_iter), dtype([('BuildingID', 'S10')]))  # one-field record per ID

# dump numpy array to table
arcpy.da.NumPyArrayToTable(tmp1, out_table)

# or just print BuildingIDs to console
for build in build_iter:
    print build

Note that if you only want to print the buildings, drop lines 13 and 16: line 13 exhausts the generator, so lines 19 and 20 would print nothing. Instead of a generator expression, you could use a list comprehension, and the list would persist. I tend to work with generator expressions because lists can consume large amounts of memory with very large data sets.
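The generator behavior described above is easy to see in isolation. A minimal sketch with a hypothetical haz_group, written for Python 3 (items() in place of iteritems()):

```python
# Hypothetical grouped hazards: building -> list of hazard IDs
haz_group = {u'37013': [18, 16, 17], u'37014': [16, 17]}

# A generator expression yields its items once, then is exhausted.
build_iter = (k for (k, v) in haz_group.items() if sorted(v) != [16, 17, 18])
first_pass = list(build_iter)    # consumes the generator
second_pass = list(build_iter)   # empty: nothing left to yield

# A list comprehension stores its results, so it can be reused.
build_list = [k for (k, v) in haz_group.items() if sorted(v) != [16, 17, 18]]

print(first_pass, second_pass, build_list)
```

The trade-off is the one mentioned above: the list keeps every flagged ID in memory, while the generator holds only one at a time.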

NOTE: Updated Line 12 to address the HazardID values not necessarily being sorted in the original table.

JohnLay · 01-09-2015

I tried building the dictionary with just the Building ID as the key, but the result was one key per Building ID with only a single Hazard ID each. I probably built it wrong, but like I said originally, I'm still wrapping my head around dictionaries.

e.g.:

dict = {1:(u'37013',16), 2: (u'37013', 17), 3: (u'37013', 18), 4: (u'37014', 16), 5: (u'37014', 17), 6: (u'37014', 18),...}

became

dict = {u'37013':18, u'37014':18,...}

so even if the original code worked, it would fail because there was no Hazard ID 16 or 17.
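That result is standard dictionary behavior: each key holds exactly one value, so assigning to the same Building ID three times keeps only the last assignment, which here was 18. To keep all three, the value has to be a container, such as a list that each row is appended to. A sketch with hypothetical rows in cursor order:

```python
# Hypothetical (Building ID, Hazard ID) rows in the order a cursor returns them
rows = [(u'37013', 16), (u'37013', 17), (u'37013', 18),
        (u'37014', 16), (u'37014', 17), (u'37014', 18)]

# One value per key: later rows overwrite earlier ones, leaving only 18.
overwritten = {bldg: haz for bldg, haz in rows}

# One list per key: every hazard is kept.
grouped = {}
for bldg, haz in rows:
    grouped.setdefault(bldg, []).append(haz)

print(overwritten)
print(grouped)
```

collections.defaultdict(list) does the same job as setdefault with slightly less typing.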

I will take a look at this later this afternoon, and at @Chris Snyder's suggestion as well.