Hi,
I have been trying for the past couple of days to create a unique ID for all shared IDs. I have something close to it, but I keep running into issues in regards to either getting all of the shared values or end up with a unique ID for a single value.
Each unique ID should have at least two values. I am not sure if I structured my script accordingly or if there is something that I missed. In any case, I would greatly appreciate if anyone could take a look and let me know if there is a more efficient what to script this.
## 1 [2, 23]
## 2 [1, 3]
## 3 [2, 37]
## 15 [18]
## 18 [15, 19]
## 19 [18, 22]
## 22 [19, 124]
## 23 [1, 115247]
## 37 [3, 115241]
## 43 [106]
## 46 [48, 49]
## 47 [48]
etc...
import random
sample_dict = {1: [2, 23], 2: [1, 3], 3: [2, 37], 15: [18], 18: [15, 19], 19: [18, 22], 22: [19, 124], 23: [1, 115247], 37: [3, 115241], 43: [106], 46: [48, 49], 47: [48], 48: [46, 47, 49], 49: [46, 48, 4918], 51: [52, 115685], 52: [51, 115691], 56: [57], 57: [56, 60], 60: [57, 64], 64: [60], 66: [68, 88530], 68: [66, 73], 71: [72, 75], 72: [71, 73], 73: [68, 72], 75: [71, 76], 76: [75], 79: [80, 102], 80: [79, 82], 82: [80, 83], 83: [82, 84], 84: [83], 89: [98, 115270], 97: [99], 98: [89, 99], 99: [97, 98], 100: [115660, 115662], 101: [98285], 102: [79, 103], 103: [102], 104: [105], 105: [104], 106: [43], 108: [116], 111: [100838], 112: [113], 113: [112, 114], 114: [113, 121], 115: [121, 100842], 116: [108, 117], 117: [116, 100829], 120: [100841, 115265], 121: [114, 115], 124: [22, 125], 125: [124, 131], 131: [125], 135: [136, 286], 136: [135, 137], 137: [136, 138], 138: [137, 188], 142: [143, 188], 143: [142, 144], 144: [143, 115200], 148: [149, 108526, 115191], 149: [148, 110917], 150: [110917, 115182], 153: [189, 115209], 158: [110901, 115214], 161: [115212], 162: [163, 115224], 163: [162, 164], 164: [163, 115228], 167: [170, 115229], 170: [167, 171], 171: [170, 177], 174: [177, 115238], 177: [171, 174], 178: [187], 179: [186, 187], 186: [179, 115248], 187: [178, 179], 188: [138, 142], 189: [153, 115185], 190: [218], 191: [223, 350], 194: [195], 195: [194, 196, 199], 196: [195, 199, 111145], 199: [195, 196], 206: [209, 374], 209: [206, 210], 210: [209, 213], 213: [210, 215], 214: [215, 111145], 215: [213, 214], 218: [190, 226], 219: [222, 226], 222: [219, 115254], 223: [191, 115253], 226: [218, 219], 227: [272, 96846], 240: [241, 96852], 241: [240, 247], 247: [241, 115167], 256: [257], 257: [256, 762], 258: [259, 92884], 259: [258, 260], 260: [259, 263], 262: [96828], 263: [260], 264: [285, 115178], 272: [227, 288], 273: [275, 288], 275: [273, 278], 278: [275, 115172], 279: [282, 115173], 282: [279, 81472], 283: [289, 81471], 285: [264, 286], 286: [135, 285], 288: [272, 273], 289: [283, 115174], 290: [291, 115133], 291: [290, 311], 293: [296], 296: [293, 299], 299: [296, 115151], 303: [304], 304: [303], 311: [291, 115131], 313: [115147], 318: [324, 328], 321: [323, 331], 322: [332], 323: [321, 324], 324: [318, 323], 325: [326], 326: [325, 327], 327: [326, 328], 328: [318, 327], 331: [321, 333], 332: [322, 333], 333: [331, 332], 338: [341, 353], 341: [338, 342], 342: [341, 343], 343: [342, 346], 344: [345, 352], 345: [344, 346], 346: [343, 345], 349: [350], 350: [191, 349], 351: [352, 115142], 352: [344, 351], 353: [338, 115146], 354: [355], 355: [354, 92883], 361: [392, 115646], 362: [115637], 365: [366, 380], 366: [365, 383], 367: [383, 104125], 368: [370, 100846], 370: [368, 401], 373: [400, 115257], 374: [206], 376: [115638, 115640], 380: [365], 383: [366, 367], 391: [114410, 115578], 392: [361, 115649], 396: [115652], 400: [373, 403], 401: [370], 402: [104126, 114395], 403: [400, 104041], 406: [407, 6425], 407: [406, 6425, 82190], 409: [485, 98269], 410: [115276, 115278], 414: [415], 415: [414, 115678], 416: [417], 417: [416, 115274], 422: [423], 423: [422, 478], 424: [477], 431: [434], 434: [431, 450], 435: [13870, 115670], 440: [443], 443: [440], 446: [447, 481], 447: [446, 483], 450: [434, 451], 451: [450, 94393], 453: [486], 454: [484], 457: [458, 476], 458: [457, 461], 461: [458], 465: [468, 93441], 468: [465, 100879], 469: [93445, 115277], 474: [115281], 475: [476, 115282], 476: [457, 475], 477: [424, 478], 478: [423, 477], 481: [446, 100871], 483: [447], 484: [454, 485], 485: [409, 484], 486: [453, 115671], 489: [490], 490: [489, 491], 491: [490, 115290], 498: [115292, 115295], 499: [500, 115296], 500: [499, 503], 503: [500, 504], 504: [503, 507], 507: [504, 508], 508: [507, 511], 511: [508, 115298], 512: [115298, 115299], 517: [524, 115285], 520: [526], 524: [517], 526: [520, 529], 529: [526, 530], 530: [529, 531], 531: [530, 532], 532: [531], 539: [545], 542: [543, 549], 543: [542], 545: [539], 548: [76885, 93433], 549: [542, 552], 552: [549, 553], 553: [552], 554: [580, 598], 556: [595, 615], 564: [624, 115326], 567: [7675], 568: [609, 90917], 571: [574, 577, 114453], 574: [571], 577: [571, 578, 114453], 578: [577, 600], 579: [115312, 115313], 580: [554, 115315], 581: [113797, 115312], 584: [585, 587], 585: [584], 587: [584], 590: [115322], 591: [615], 595: [556, 648], 598: [554, 601], 600: [578], 601: [598, 115320], 609: [568], 613: [614, 664], 614: [613, 115334], 615: [556, 591], 617: [7560, 91525], 620: [634], 623: [699, 115511], 624: [564], 633: [634, 635], 634: [620, 633], 635: [633], 636: [637, 115512], 637: [636, 115517], 646: [647], 647: [646, 115516], 648: [595, 653], 651: [652], 652: [651, 653], 653: [648, 652], 656: [657, 719], 657: [656, 722], 660: [678], 663: [664, 677], 664: [613, 663], 665: [666, 682], 666: [665, 667], 667: [666, 668], 668: [667, 680], 669: [670, 679, 92570], 670: [669, 671, 92570], 671: [670, 672], 672: [671, 673], 673: [672, 674], 674: [673, 676], 675: [676, 687], 676: [674, 675], 677: [663, 678], 678: [660, 677], 679: [669, 680], 680: [668, 679], 681: [682, 93009], 682: [665, 681], 683: [93010], 684: [115337], 685: [115337], 686: [687, 1718], 687: [675, 686], 688: [700], 693: [698, 724], 698: [693], 699: [623, 700], 700: [688, 699], 705: [708], 708: [705, 711], 711: [708, 712], 712: [711, 715], 715: [712, 718], 718: [715, 719], 719: [656, 718], 722: [657, 723], 723: [722, 725], 724: [693, 725], 725: [723, 724], 728: [1793, 115497], 731: [734, 115501], 734: [731, 735], 735: [734, 738], 738: [735, 741], 741: [738, 115506], 743: [1315, 114885], 750: [84687], 762: [257, 765], 765: [762], 768: [769], 769: [768, 115164], 776: [780], 780: [776, 115163], 782: [783, 115165], 783: [782], 798: [799, 850], 799: [798, 845], 800: [806, 115022], 803: [804, 37270, 37279], 804: [803, 805, 37279], 805: [804, 115096], 806: [800, 115018], 841: [854, 115081], 844: [115075, 115079], 845: [799, 115071], 850: [798, 115070], 851: [866, 115078], 854: [841, 865], 855: [78696, 115089], 856: [115066, 115069], 859: [115067], 865: [854, 866], 866: [851, 865], 872: [874, 877], 873: [874, 895], 874: [872, 873], 875: [876], 876: [875, 877], 877: [872, 876], 880: [881, 1038], 881: [880, 884], 884: [881, 80306], 889: [891, 892], 890: [891, 918], 891: [889, 890, 892], 892: [889, 891, 922], 895: [873, 896], 896: [895, 899], 899: [896, 900], 900: [899], 905: [906, 103228], 906: [905, 927, 103228], 909: [925], 917: [926], 918: [890], 922: [892], 925: [909, 941], 926: [917], 927: [906], 936: [939, 104317], 939: [936, 940], 940: [939, 941], 941: [925, 940], 946: [953, 1045], 952: [953], 953: [946, 952, 1045], 957: [958, 115034], 958: [957], 961: [995, 1044], 966: [972, 103248], 969: [972], 972: [966, 969], 976: [977], 977: [976], 981: [1522], 985: [986, 1043], 986: [985, 987], 987: [986, 993], 993: [987, 994], 994: [993], 995: [961], 1001: [1002], 1002: [1001, 1005], 1005: [1002, 1006], 1006: [1005, 115038], 1007: [1013, 115038], 1013: [1007, 1014], 1014: [1013], 1019: [1023], 1023: [1019], 1025: [1028, 115042], 1028: [1025, 1031], 1031: [1028, 1035], 1035: [1031, 1036], 1036: [1035], 1037: [115047], 1038: [880, 115046], 1043: [985], 1044: [961, 8597, 103247], 1045: [946, 953, 1046], 1046: [1045], 1047: [1076, 107909], 1048: [1227, 1653], 1053: [1055], 1055: [1053], 1061: [88214], 1066: [79859], 1067: [1069, 1145], 1069: [1067, 1072], 1072: [1069, 1073], 1073: [1072, 1076], 1076: [1047, 1073], 1079: [1144], 1084: [1085, 1119], 1085: [1084], 1092: [1146], 1100: [1101], 1101: [1100, 1102], 1102: [1101, 1105], 1105: [1102], 1108: [1109], 1109: [1108, 1112], 1112: [1109, 1113], 1113: [1112, 1114], 1114: [1113], 1117: [1118], 1118: [1117, 1121], 1119: [1084, 1120], 1120: [1119, 1121], 1121: [1118, 1120], 1125: [1145], 1130: [1131, 1548], 1131: [1130, 1136], 1136: [1131, 1137], 1137: [1136, 1143], 1142: [90684, 115404], 1143: [1137, 1144], 1144: [1079, 1143], 1145: [1067, 1125], 1146: [1092], 1159: [1172, 97689], 1164: [97690], 1172: [1159], 1173: [115524], 1176: [114845], 1177: [1191, 114870], 1185: [1186], 1186: [1185, 1187], 1187: [1186, 96752], 1189: [1190, 1600], 1190: [1189], 1191: [1177, 1192], 1192: [1191, 1193], 1193: [1192, 1196], 1196: [1193, 1197], 1197: [1196, 114867], 1202: [114866], 1207: [1208], 1208: [1207, 1209], 1209: [1208, 114863], 1212: [114856, 114859], 1220: [1627, 1673], 1222: [1699], 1224: [1225, 1226], 1225: [1224], 1226: [1224, 1631], 1227: [1048, 1604], 1228: [1229, 1230], 1229: [1228, 114944], 1230: [1228, 1388, 114907], 1231: [1291], 1232: [1233], 1233: [1232, 1234], 1234: [1233, 1285], 1235: [1296, 1316], 1241: [1242, 114892], 1242: [1241, 1245], 1245: [1242, 1248], 1248: [1245, 1249], 1249: [1248, 1252], 1252: [1249, 1253], 1253: [1252, 1259], 1259: [1253, 1262], 1262: [1259], 1267: [1268], 1268: [1267, 1269], 1269: [1268, 1272], 1272: [1269, 1275], 1275: [1272, 1280], 1280: [1275], 1285: [1234, 1286], 1286: [1285, 114888], 1291: [1231, 1292], 1292: [1291, 1295], 1295: [1292, 1297], 1296: [1235, 1297], 1297: [1295, 1296], 1306: [1311, 1317], 1311: [1306, 1312], 1312: [1311, 1315], 1315: [743, 1312], 1316: [1235, 114887], 1317: [1306, 1318], 1318: [1317], 1319: [114954, 114959], 1320: [114934, 114936], 1321: [114933], 1325: [1328], 1328: [1325, 1329], 1329: [1328, 114931], 1332: [114928, 114930], 1335: [1337, 1339], 1337: [1335, 1338], 1338: [1337], 1339: [1335, 1340], 1340: [1339, 1343], 1343: [1340, 114940], 1346: [114941], 1349: [114939], 1353: [1356], 1356: [1353, 114936], 1361: [86809], 1362: [1363, 86810], 1363: [1362, 1364], 1364: [1363, 1365], 1365: [1364, 1366], 1366: [1365, 1367], 1367: [1366], 1371: [1372, 88556], 1372: [1371, 1373], 1373: [1372, 1376], 1374: [1375, 114928], 1375: [1374, 113162], 1376: [1373, 1377], 1377: [1376, 84160], 1378: [115739], 1379: [1383, 115738], 1380: [1381, 90369], 1381: [1380, 1382], 1382: [1381], 1383: [1379], 1385: [1388, 107390, 114911], 1388: [1230, 1385, 107390, 114907], 1391: [1401], 1394: [114906, 114923], 1401: [1391, 1404], 1404: [1401, 1407], 1407: [1404, 1408], 1408: [1407, 114814], 1409: [1410, 114821], 1410: [1409, 1413], 1413: [1410], 1419: [1420], 1420: [1419, 1421], 1421: [1420], 1425: [1533], 1426: [1525, 1551], 1427: [94100], 1432: [1434], 1434: [1432], 1440: [1525, 1526], 1443: [1444], 1444: [1443, 114882], 1452: [114881], 1455: [1456], 1456: [1455, 1457], 1457: [1456, 1460], 1460: [1457, 1531], 1463: [1464, 1531], 1464: [1463, 1465], 1465: [1464, 1468], 1468: [1465, 1472], 1472: [1468, 1473], 1473: [1472, 1476], 1476: [1473], 1483: [1484, 1487], 1484: [1483], 1487: [1483, 1490], 1490: [1487, 1491], 1491: [1490, 1492], 1492: [1491, 1495], 1495: [1492, 1496], 1496: [1495, 1499], 1499: [1496, 1500], 1500: [1499, 1529], 1505: [1528], 1510: [1515], 1515: [1510, 1518], 1518: [1515, 1519], 1519: [1518], 1522: [981], 1525: [1426, 1440], 1526: [1440, 47030], 1528: [1505, 1530], 1529: [1500, 1530], 1530: [1528, 1529], 1531: [1460, 1463], 1532: [86518], 1533: [1425, 1534], 1534: [1533, 1537], 1537: [1534, 1549], 1544: [1547], 1547: [1544, 1548], 1548: [1130, 1547], 1549: [1537], 1551: [1426, 1552], 1552: [1551, 1553], 1553: [1552, 1561], 1557: [1559, 1560], 1559: [1557, 80702], 1560: [1557, 98188], 1561: [1553, 114880], 1566: [1569, 114879], 1569: [1566, 102772], 1575: [102749, 114873], 1580: [1581], 1581: [1580, 1586], 1586: [1581, 114877]}
UpdateIsolatedIDs = {}
UIDs = {}
for A, B in sample_dict.items():
IsolationID = random.randint(10000000, 99999999)
#print (A, B)
try:
if A not in UpdateIsolatedIDs:
if len(B) > 1:
for b in B:
if b in sample_dict and b not in UpdateIsolatedIDs:
sample = sample_dict[b]
if A in sample:
if len(sample) > 1:
UpdateIsolatedIDs[A] = IsolationID
for s in sample:
if s != A:
UpdateIsolatedIDs[s] = UpdateIsolatedIDs[A]
UpdateIsolatedIDs[b] = UpdateIsolatedIDs[A]
except BLength:
if len(B) == 1:
sample = sample_dict[B[0]]
if len(sample) > 1:
if A in sample:
UpdateIsolatedIDs[A] = IsolationID
for s in sample:
if s != A and s not in UpdateIsolatedIDs:
UpdateIsolatedIDs[s] = UpdateIsolatedIDs[A]
UpdateIsolatedIDs[B[0]] = UpdateIsolatedIDs[A]
except b_sample_dict:
if b not in sample_dict and b not in UpdateIsolatedIDs:
if A in UpdateIsolatedIDs:
UpdateIsolatedIDs[b] = UpdateIsolatedIDs[A]
except sample_len:
if len(sample) == 1:
if A == sample:
if A not in UpdateIsolatedIDs:
UpdateIsolatedIDs[A] = IsolationID
except A_UpdateIsolatedIDs:
if A in UpdateIsolatedIDs:
for b in B:
if b not in UpdateIsolatedIDs:
UpdateIsolatedIDs[b] = UpdateIsolatedIDs[A]
for A, B in UpdateIsolatedIDs.items():
IsoID = []
for C, D in UpdateIsolatedIDs.items():
if B == D:
IsoID.append(C)
UIDs[B] = IsoID
for A, B in UIDs.items():
print (A, B)
Solved! Go to Solution.
So you have some sort of chainage going on, where entities are connected to each other, but they don't form one big network but multiple small ones. You want to assign IDs to these small networks.
Does this come close to what you want?
# output dict
groups = dict()
group_id = 100000000
# keys that already are in a group
visited_keys = set()
for key in sample_dict:
if key not in visited_keys: # to prevent processing the same group multiple times
# create the group this key belongs to
group = set()
# use a heap/stack to process all keys belonging to that group
heap = {key}
while heap: # while there are keys to be processed
k = heap.pop() # take the last key from the heap
group.add(k) # add it to the group
visited_keys.add(k) # mark it as visited
# add all the keys connected to k to the heap (empty list if there are no other keys)
heap.update(sample_dict.get(k, []))
# add all the keys that k is connected to to the heap
heap.update([a for a, b in sample_dict.items() if k in b])
# remove all visited keys from the heap to prevent infinite loop
heap -= visited_keys
# assign ID to the group and add it to output dict
# PLEASE DON'T ASSIGN YOUR ID WITH random.randint!
# it's ugly, it's not guaranteed to be unique, you will hate yourself
# when you have to add a group manually.
# Either use a Universally Unique ID [python: uuid.uuid4()]
# or increment a counter like I do here
groups[group_id] = group
group_id += 1
for group_id, group in groups.items():
print(group_id, group)
print(list(UIDs.values())[0])
#[1, 3, 2, 115247, 23]
print(list(groups.values())[0])
#{1, 2, 3, 37, 115241, 115247, 23}
I think you can trim the commented pairs down to a few since you have them in the dictionary and helps keep the post shorter to scroll through.
Not really sure what you are attempting- stepping through it with the debugger I get something like this for the first item in the dictionary.
{1: 10495634, 3: 10495634, 2: 10495634, 115247: 10495634, 23: 10495634}
Since the next key '2' is in this dictionary, it skips it, and again for key '3'. Once that is all done, what is your intention for except BLength, b_sample_dict, etc? I take it you are throwing custom exceptions set up for these that wasn't included.
Is the unique id you are trying to create meant to be the key or value here?
Since I picked a sample size that was from a much larger size, I set up the script to make an exception where if the value is not the key in the dictionary, then automatically locate it and determine the unique ID. The other exception is meant to identify whether the length of values is greater than one, and if those values as keys also have values with a length greater than one.
For instance:
If you look at the key for 1, there are two other IDs that are shared. If you look at the values in the list, there is a corresponding key. That key also has values that exist as keys.
Basically, if the keys and their values are in common, then assign it a unique ID for all of those values.
So you have some sort of chainage going on, where entities are connected to each other, but they don't form one big network but multiple small ones. You want to assign IDs to these small networks.
Does this come close to what you want?
# output dict
groups = dict()
group_id = 100000000
# keys that already are in a group
visited_keys = set()
for key in sample_dict:
if key not in visited_keys: # to prevent processing the same group multiple times
# create the group this key belongs to
group = set()
# use a heap/stack to process all keys belonging to that group
heap = {key}
while heap: # while there are keys to be processed
k = heap.pop() # take the last key from the heap
group.add(k) # add it to the group
visited_keys.add(k) # mark it as visited
# add all the keys connected to k to the heap (empty list if there are no other keys)
heap.update(sample_dict.get(k, []))
# add all the keys that k is connected to to the heap
heap.update([a for a, b in sample_dict.items() if k in b])
# remove all visited keys from the heap to prevent infinite loop
heap -= visited_keys
# assign ID to the group and add it to output dict
# PLEASE DON'T ASSIGN YOUR ID WITH random.randint!
# it's ugly, it's not guaranteed to be unique, you will hate yourself
# when you have to add a group manually.
# Either use a Universally Unique ID [python: uuid.uuid4()]
# or increment a counter like I do here
groups[group_id] = group
group_id += 1
for group_id, group in groups.items():
print(group_id, group)
print(list(UIDs.values())[0])
#[1, 3, 2, 115247, 23]
print(list(groups.values())[0])
#{1, 2, 3, 37, 115241, 115247, 23}
Hi @JohannesLindner,
This is exactly what I was looking for. I could not figure out a more efficient way of doing this since the sample I posted is magnitudes smaller than the actual dictionary. I have been trying to figure out the best way of doing this but I just couldn't, despite trying every thinkable option.
Also, I didn't think of using a counter for some reason. I simply thought of just using some random integer. I figured picking a random number from a large range of numbers would make it near impossible for a duplicate to occur. That was my reasoning for that even though a counter, as you suggested, would have been the better option.