
Memory issues with a large dictionary

Question asked by GSCUser85 Champion on Mar 16, 2016
Latest reply on Oct 24, 2018 by bixb0012

I thought of asking this question over at SE, but those pro Pythonistas would probably give me an answer I won't understand.

 

I am trying to load (via arcpy.da.SearchCursor) a bunch of tables. They are all related to each other via a series of attribute ids. The project is actually about a telecomms problem: tracing a fibre route from end to end, along with all the bits of kit that sit along the route.

Unfortunately, there are rather a lot of records to load.

My loading code is below:

import sys
import time
import arcpy

tblInfoDict = {
           "route" : ["routeid", "name"],
           "routedetail" : ["routedetailid", "routeid", "x_table", "x_id", "num"],
           "port" : ["portid", "x_table", "x_id", "num", "grp"],
           "fibermngr" : ["fibermngrid", "name", "x_table", "x_id", "fibermngrtypeid"],
           "building" : ["buildingid", "name", "gpslatitude", "gpslongitude"],
           "strand" : ["strandid", "x_table", "x_id", "num", "bundle", "color"],
           "span" : ["spanid", "spantypeid", "length", "locateid"],
           "cable" : ["cableid", "spanid", "spantypeid"],
           "enclosure" : ["enclosureid", "name", "x_table", "x_id", "enclosuretypeid"],
           "access_point" : ["access_pointid", "name", "typ", "gpslatitude", "gpslongitude"],
           "ductbank" : ["ductbankid", "name"],
           "spantype" : ["spantypeid", "name"],
           "innerduct" : ["innerductid", "ductbankid", "superductid"],
           "superduct" : ["superductid", "ductbankid"]
           }
# Load each table into its own dictionary keyed by the table's id field:
# data_dict[table][id] = tuple of the remaining field values.
data_dict = {}
for tbl, flds in tblInfoDict.iteritems():
    print "Reading {}".format(tbl)
    t1 = time.time()
    data_dict[tbl] = {}
    # First field in flds is the key; the rest become the value tuple.
    with arcpy.da.SearchCursor(tbl, flds) as cursor:
        temp_dict = {r[0]: r[1:] for r in cursor}
    print "Size {}".format(sys.getsizeof(temp_dict))
    data_dict[tbl].update(temp_dict)
    del temp_dict
    print "Total Size {}".format(sys.getsizeof(data_dict))
    t2 = time.time()
    print "Read took {:.2f} secs".format(t2 - t1)

 

I inserted the sys.getsizeof() calls to try to get a handle on how much memory I was consuming.

 

The run window:

>>> 
Reading ductbank
Size 1573004
Total Size 140
Read took 1.33 secs
Reading access_point
Size 1573004
Total Size 140
Read took 0.62 secs
Reading spantype
Size 524
Total Size 140
Read took 0.10 secs
Reading superduct
Size 3145868
Total Size 140
Read took 1.24 secs
Reading enclosure
Size 393356
Total Size 140
Read took 0.27 secs
Reading strand
Size 50331788
Total Size 524
Read took 22.63 secs
Reading routedetail
Size 25165964
Total Size 524
Read took 7.39 secs
Reading building
Size 393356
Total Size 524
Read took 0.25 secs
Reading fibermngr
Size 393356
Total Size 524
Read took 0.22 secs
Reading span
Size 1573004
Total Size 524
Read took 0.43 secs
Reading cable
Size 1573004
Total Size 524
Read took 0.59 secs
Reading route
Size 1573004
Total Size 524
Read took 0.60 secs
Reading port
Size 25165964
Traceback (most recent call last):
  File "C:\Data\ESRI-SA\DarkFibreAfrica\Vodacom\Python\ProcessDBTables.py", line 48, in <module>
    data_dict[tbl].update(temp_dict)
MemoryError
>>> sys.version
'2.7.8 (default, Jun 30 2014, 16:03:49) [MSC v.1500 32 bit (Intel)]'
>>> 

 

So the output of sys.getsizeof(), when pointed at my intermediate temp_dict, reports something real.

But the output for data_dict, which is accumulating all the data, makes no sense whatsoever.

If anyone has insight into this, please explain.
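My current suspicion (unconfirmed) is that sys.getsizeof() is shallow: it measures only the outer dict's hash table, not the nested dicts it points at, which would explain the tiny 140/524 numbers. A minimal sketch of a recursive sizing helper, where deep_getsizeof is my own hypothetical name rather than anything standard:

def deep_getsizeof(obj, seen=None):
    # Recursively total sys.getsizeof over nested containers.
    if seen is None:
        seen = set()
    if id(obj) in seen:  # don't double-count shared objects
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(deep_getsizeof(k, seen) + deep_getsizeof(v, seen)
                    for k, v in obj.iteritems())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(deep_getsizeof(x, seen) for x in obj)
    return size

print deep_getsizeof(data_dict)  # should report something closer to reality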

 

So, to get to my question.

Could I run this script using the 64-bit version of Python? And how?

Would I be able to get the entire large dict into memory using 64-bit Python?
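For reference, this is how I'm checking which interpreter is actually running (the pointer size tells you the bitness); if the 64-bit Background Geoprocessing package is installed, I believe its interpreter typically lives under C:\Python27\ArcGISx6410.3:

import sys, struct
print sys.version                # the build string says "32 bit" or "64 bit"
print struct.calcsize("P") * 8   # pointer width in bits: 32 or 64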

 

Running this on ArcGIS 10.3.1 with Python 2.7.8 (32-bit).
