Find specific key-value tuples within a dictionary?

01-08-2015 07:31 AM
JohnLay
Occasional Contributor

I am trying to build a process that checks a dictionary for an existing key-value combination, where the key is a variable and the value is always one of three values (16, 17, or 18). I quickly ran into the problem of searching a dictionary with a dictionary and found someone's solution of turning the search into a frozenset. That removed the TypeError: unhashable type: 'dict' error message, but the code still does not do what I want it to do.

I am trying to verify that a table with tens of thousands of building IDs contains exactly 3 occurrences of each building ID, and that each occurrence is accompanied by a 16, 17, or 18 value in the next column. So far I've only been playing around with snippets of code to see if I could make it work.

WIND = 'L_DAMAGE_RESULTS_WIND'
readList = ["BLDG_ID", "HAZARD_ID"]
PassFail = "PASS"
WINDDict = {r[0]:(r[1:]) for r in arcpy.da.SearchCursor(WIND, readList)}
with arcpy.da.SearchCursor(WIND, "BLDG_ID") as cursor:
    for row in cursor:
        lookup = {row[0]: (18,)}
        key = frozenset(lookup.items())
        if key not in WINDDict:
            PassFail = "fail"

print PassFail
fail
print lookup
{u'370139999': (18,)}

I thought the above would result in a "PASS" because u'370139999': (18,) does exist in WINDDict, but it didn't.

I'm still wrapping my head around dictionaries, so I would appreciate any help that is offered.

UPDATE:

So the problem appears to be with the unicode within the dictionary.

WINDDict = {29184: (u'3701310027', 17), 1: (u'370131', 16), 2: (u'3701310', 16), 3: (u'37013100', 16), 4: (u'370131000', 16), 5: (u'3701310000', 16), 6: (u'3701310001', 16), ...}

if (u'3701310027', 17) in WINDDict:
    print "yes"
else:    
    print "no"
...     
no

if 17 in WINDDict:
    print "yes"
else:    
    print "no"
...     
yes

I have no idea how to handle this.
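
For readers following along: the two results above are consistent with how "in" works on a dictionary. It tests the keys only, never the values, so unicode may not be the cause here. A minimal sketch, assuming a hypothetical three-row stand-in for WINDDict (the row IDs and values below are made up for illustration, and 17 is included as a key on the assumption that the real table has a row with that ID):

```python
# Hypothetical stand-in for WINDDict: row ID -> (BLDG_ID, HAZARD_ID)
wind_dict = {
    1: (u'370131', 16),
    17: (u'3701310', 16),
    29184: (u'3701310027', 17),
}

pair = (u'3701310027', 17)

key_hit = pair in wind_dict             # False: the tuple is a value, not a key
value_hit = pair in wind_dict.values()  # True: search the values explicitly
int_hit = 17 in wind_dict               # True only because 17 happens to be a key

print(key_hit)
print(value_hit)
print(int_hit)
```

Searching the values this way scans the whole dictionary each time, so for tens of thousands of rows it is usually better to key the dictionary on the thing being looked up.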

Message was edited by: John Lay to add more explanation.

13 Replies
ChrisSnyder
Regular Contributor III

I agree with Mr. Bixby, no need to store row ids. This code using set() objects also does the job:

magicNumberSet = set([16,17,18]) #the building must have all of these codes to be in the yesList
buildingDict = {} #maps each BLDG_ID to the set of HAZARD_IDs seen for it
searchRows = arcpy.da.SearchCursor('L_DAMAGE_RESULTS_WIND', ["BLDG_ID", "HAZARD_ID"])
for searchRow in searchRows:
    buildingId, hazardId = searchRow
    if buildingId in buildingDict:
        buildingDict[buildingId].add(hazardId)
    else:
        buildingDict[buildingId] = set([hazardId])
yesList = [buildingId for buildingId in buildingDict if magicNumberSet.issubset(buildingDict[buildingId])]
noList = list(set(buildingDict.keys()).difference(yesList))
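
To see what the loop builds without needing arcpy, here is a rough sketch that swaps the cursor for a hypothetical list of (BLDG_ID, HAZARD_ID) tuples. The sample rows and the snake_case names are assumptions for illustration, and defaultdict(set) is used in place of the if/else; it is not the only way to write it.

```python
from collections import defaultdict

# Hypothetical rows standing in for the SearchCursor output
rows = [
    (u'37013', 16), (u'37013', 17), (u'37013', 18),
    (u'37014', 16), (u'37014', 17),
]

magic_number_set = {16, 17, 18}

# defaultdict(set) creates an empty set the first time a building ID is seen
building_dict = defaultdict(set)
for building_id, hazard_id in rows:
    building_dict[building_id].add(hazard_id)

# Buildings whose hazard set contains all of 16, 17 and 18
yes_list = sorted(b for b in building_dict
                  if magic_number_set.issubset(building_dict[b]))
# Everything else
no_list = sorted(set(building_dict).difference(yes_list))

print(yes_list)
print(no_list)
```

With these sample rows, 37013 lands in yes_list and 37014 (missing hazard 18) lands in no_list.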
JohnLay
Occasional Contributor

It's not quite that the building must have all three; it is more that there must be 3 buildings with the same ID that each have one of the three hazards. The table would look like this:

BLDG_ID     HAZARD_ID     OTHER FIELDS
37013       16            other info unique to HAZ_ID 16
37013       17            other info unique to HAZ_ID 17
37013       18            other info unique to HAZ_ID 18
37014       16            other info unique to HAZ_ID 16
...

Like I mentioned to Joshua Bixby above, I will play with this some later this afternoon.

Thank you both for your suggestions.

JohnLay
Occasional Contributor

OK, this almost does what I was looking for. (Joshua Bixby and James Crandall, I just haven't gotten to your examples yet. James, I had some trouble installing pandas, but am square now.)

I'm a little lost with it, though. Please walk me through the bits I'm missing so that I can apply the information instead of just copying it.

magicNumberSet = set([16,17,18])
buildingDict = {}
searchRows = arcpy.da.SearchCursor('L_DAMAGE_RESULTS_WIND', ["BLDG_ID", "HAZARD_ID"])
for searchRow in searchRows:
    buildingId, hazardId = searchRow
    if buildingId in buildingDict:
        buildingDict[buildingId].add(hazardId)
    else:
        buildingDict[buildingId] = set([hazardId])
yesList = [buildingId for buildingId in buildingDict if magicNumberSet.issubset(buildingDict[buildingId])]
noList = list(set(buildingDict.keys()).difference(yesList))

Chris, I get a little lost around line 9. If buildingDict[buildingId].add(hazardId) is adding to the value set, I assume buildingDict[buildingId] = set([hazardId]) is creating the first instance. I'm not really sure what "=" is supposed to mean here. My brain automatically reads it as one being equal to the other, which can't be the case.

In line 10, for each key in the dictionary, if ([16,17,18]) is a subset of the value set, the key is added to the list. This would mean that the sets ([17,18]) and ([17,18,19]) would be excluded from the list, but ([16,17,18,19]) would be added. Using magicNumberSet.issuperset(buildingDict[buildingId]) would mean the reverse is true. By replacing it with magicNumberSet==set(buildingDict[buildingId]) I would essentially be saying that the sets must be equal before the key is added to the list. Correct?

I need to check that there are always 3 buildings, and only 3 buildings, with a hazard ID of 16, 17, and 18. If there are only 2 buildings or 4 buildings, whatever the hazard value, the table fails. Does the order in which the values appear in the set matter?
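
Some of the questions above can be checked directly. In Python a single "=" is assignment (it stores the new set under that key), while "==" is the equality test. Sets ignore order, and they also silently drop duplicates, so a set alone cannot tell 3 rows from 4 rows with a repeated hazard. One possible way around that, sketched below, is to count occurrences with collections.Counter. The function name failing_buildings and the sample rows are hypothetical, made up for illustration.

```python
from collections import Counter, defaultdict

REQUIRED = {16, 17, 18}

def failing_buildings(rows):
    """Return the sorted building IDs that do not have exactly one row
    each for hazards 16, 17 and 18 (and nothing else)."""
    counts = defaultdict(Counter)      # building ID -> Counter of hazard IDs
    for building_id, hazard_id in rows:
        counts[building_id][hazard_id] += 1
    expected = Counter(REQUIRED)       # each required hazard exactly once
    return sorted(b for b, c in counts.items() if c != expected)

# Set behaviour behind the questions above
order_irrelevant = {16, 17, 18} == {18, 16, 17}          # order does not matter
superset_passes = REQUIRED.issubset({16, 17, 18, 19})    # extra values still pass issubset
duplicates_vanish = set([16, 16, 17, 18]) == REQUIRED    # a repeated 16 is lost in a set

# Hypothetical sample rows
rows = [
    ('A', 16), ('A', 17), ('A', 18),             # exactly 16, 17, 18
    ('B', 16), ('B', 17),                        # only two rows
    ('C', 16), ('C', 16), ('C', 17), ('C', 18),  # four rows, duplicate 16
    ('D', 16), ('D', 17), ('D', 19),             # three rows, wrong hazard
]
failures = failing_buildings(rows)
print(failures)
```

Under these assumptions the table as a whole would "PASS" only when the returned list is empty.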

JamesCrandall
MVP Frequent Contributor

I know there's no interest in pandas, but it really simplifies things and supercharges performance. Combined with the arcpy.da.TableToNumPyArray method, it is super easy to integrate.

Test data (I just created a .csv file but it could be a gdb table or just about anything else):

BLDG_ID,HAZARD_ID,OTHER FIELDS
37013,11,other info unique to HAZ_ID 11
37013,46,other info unique to HAZ_ID 46
37013,9,other info unique to HAZ_ID 9
37013,16,other info unique to HAZ_ID 16
37013,17,other info unique to HAZ_ID 17
37013,18,other info unique to HAZ_ID 18
37014,8,other info unique to HAZ_ID 8
37014,6,other info unique to HAZ_ID 6
37014,33,other info unique to HAZ_ID 33
37014,16,other info unique to HAZ_ID 16
37014,17,other info unique to HAZ_ID 17
37014,18,other info unique to HAZ_ID 18

This does exactly what you want OP:

import pandas as pd

dat = r"H:\pandas_testdat.csv"
df = pd.read_csv(dat)
# keep only the rows whose HAZARD_ID is 16, 17, or 18
df2 = df[df['HAZARD_ID'].isin([16,17,18])]
print df2.values

Yeah. That's all that is necessary.

Result:

BLDG_ID,HAZARD_ID,OTHER FIELDS
37013L 16L 'other info unique to HAZ_ID 16'
37013L 17L 'other info unique to HAZ_ID 17'
37013L 18L 'other info unique to HAZ_ID 18'
37014L 16L 'other info unique to HAZ_ID 16'
37014L 17L 'other info unique to HAZ_ID 17'
37014L 18L 'other info unique to HAZ_ID 18'
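
The isin filter selects the 16/17/18 rows, but on its own it does not confirm that each building has exactly three rows. If that check is also wanted, one possible extension is a groupby, sketched here under assumptions: the DataFrame is built inline from hypothetical sample values instead of the CSV, and the names required, passes and failed_ids are made up for illustration.

```python
import pandas as pd

# Hypothetical sample standing in for the CSV / geodatabase table
df = pd.DataFrame({
    "BLDG_ID":   [37013, 37013, 37013, 37014, 37014, 37014, 37014],
    "HAZARD_ID": [16,    17,    18,    16,    17,    18,    33],
})

required = {16, 17, 18}

# One group of HAZARD_ID values per building
grouped = df.groupby("BLDG_ID")["HAZARD_ID"]

# A building passes when it has exactly three rows and they are exactly 16, 17 and 18
passes = grouped.apply(lambda s: len(s) == 3 and set(s) == required)

# Collect the building IDs that fail the check
failed_ids = [int(b) for b, ok in passes.items() if not ok]
print(failed_ids)
```

Here 37014 fails because it has a fourth row (hazard 33); an empty failed_ids would mean every building has exactly the three expected rows.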
