Is in_memory Worth Using?

10-23-2013 09:31 AM
GeoffOlson
Occasional Contributor
I have a script that uses selection sets and cursors to populate a field, but because there are so many records (about 130,000) I was wondering if it could be sped up by using in_memory, which I just learned about.  I understand that in_memory is good until it's all used up and then it's super slow.  I also understand that data written to RAM is only temporary and has to be moved from RAM back to the HDD.  Are there certain instances where it's useful, like when a new feature layer or table is being created in RAM, as opposed to something like an update cursor?  I'll post my code below.

import arcpy, os, time
from datetime import datetime
from datetime import timedelta

#set map doc and the layers to be used
mxd = arcpy.mapping.MapDocument("Current")
mapLyr1 = arcpy.mapping.ListLayers(mxd, "NEW_BiState_Grid400_IowaSP")[0]
mapLyr2 = arcpy.mapping.ListLayers(mxd, "NEW_BiState_Grid100_IowaSP")[0]

#alpha holds the 16 letters appended to each big tile name as rows2 is updated
place = 0
alpha = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P']
searchrow = 0
#checkpoints (ck) and elapsed times (cl) used to average the last 5 iterations;
#searchrow is incremented before the checkpoints are tested, so start ck1 at 1
ck1 = searchrow + 1
ck2 = ck1 + 1
ck3 = ck2 + 1
ck4 = ck3 + 1
ck5 = ck4 + 1
cl1 = timedelta()
cl2 = timedelta()
cl3 = timedelta()
cl4 = timedelta()
cl5 = timedelta()
#loop through the big grid sorted by Name; rowcount is used for progress reporting
rows1 = arcpy.SearchCursor(mapLyr1, "", "", "Name")
rowcount = int(arcpy.GetCount_management(mapLyr1).getOutput(0))
allrows = (rowcount + 0.0)
for row in rows1:
    clock1 = datetime.now()
    bigtile = str()
    arcpy.SelectLayerByAttribute_management(mapLyr1, "NEW_SELECTION", '"FID" = %s' %searchrow)
    bigtile = row.getValue("Name")
    print bigtile
    searchrow = searchrow + 1
    prgrow = (searchrow + 0.0)
    arcpy.SelectLayerByLocation_management(mapLyr2, "HAVE_THEIR_CENTER_IN", mapLyr1, 0, "ADD_TO_SELECTION")
    rows2 = arcpy.UpdateCursor(mapLyr2, "", "", "", "FID")
    for row2 in rows2:
        row2.tile = bigtile + alpha[place]
        rows2.updateRow(row2)
        place = place + 1
        if place == 16:
            place = 0
    arcpy.SelectLayerByAttribute_management(mapLyr1, "CLEAR_SELECTION")
    arcpy.SelectLayerByAttribute_management(mapLyr2, "CLEAR_SELECTION")
    prgrss = ((prgrow / allrows)*100.0)   #percent complete
    rowsleft = rowcount - searchrow
    clock2 = datetime.now()
    clock3 = (clock2 - clock1)   #time taken by this iteration
    if ck1 == searchrow:
        cl1 = clock3
        ck1 = ck1 + 5
    elif searchrow == ck2:
        cl2 = clock3
        ck2 = ck2 + 5
    elif searchrow == ck3:
        cl3 = clock3
        ck3 = ck3 + 5
    elif searchrow == ck4:
        cl4 = clock3
        ck4 = ck4 + 5
    elif searchrow == ck5:
        cl5 = clock3
        ck5 = ck5 + 5
    
    if searchrow < 5:
        pass
    elif searchrow > 4:
        clock4 = ((cl1 + cl2 + cl3 + cl4 + cl5)/5)
        clock5 = (clock4 * rowsleft)
        clock6 = str(clock5)
        arcpy.AddMessage("The last 5 iterations averaged %s" %clock4)
        arcpy.AddMessage("%s estimated time remaining" %clock6)
    arcpy.AddMessage("%d%% completed - row %d out of %d rows" %(prgrss, searchrow, rowcount))
    arcpy.AddMessage("______________________________")
del mxd, row, rows1, row2, rows2, searchrow, place, bigtile, rowcount, prgrow, allrows, prgrss, rowsleft, clock1, clock2, clock3
11 Replies
MathewCoyle
Frequent Contributor
I have gotten some massive performance gains using in_memory over HDD workspaces, especially if you don't use RAID/SSD drives. I did some benchmarking a while back but can't find the results. Depending on the task it could be anywhere from twice as fast to 20 times faster. Cursors on in_memory tables showed the biggest gains, IIRC.

And yes, the caveat: you also have to keep an eye on the size of the data you are writing to memory. I've crashed machines by not being diligent about what is in memory and clearing unneeded data. 64-bit geoprocessing won't help you when trying to commit 64GB of data to memory.
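
To give a feel for the pattern (the paths and names below are made up for the example):

# copy the source data into RAM once, work against the copy, then clean up
arcpy.CopyFeatures_management(r"C:\data\parcels.shp", "in_memory/parcels")
arcpy.Buffer_analysis("in_memory/parcels", "in_memory/parcels_buf", "100 Feet")
arcpy.CopyFeatures_management("in_memory/parcels_buf", r"C:\data\parcels_buf.shp")
# free the RAM as soon as the intermediates are no longer needed
arcpy.Delete_management("in_memory/parcels")
arcpy.Delete_management("in_memory/parcels_buf")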
JamesCrandall
MVP Frequent Contributor
Mathew Coyle wrote:

I have gotten some massive performance gains using in_memory over HDD workspaces, especially if you don't use RAID/SSD drives. I did some benchmarking a while back but can't find the results. Depending on the task it could be anywhere from twice as fast to 20 times faster. Cursors on in_memory tables showed the biggest gains, IIRC.

And yes, the caveat: you also have to keep an eye on the size of the data you are writing to memory. I've crashed machines by not being diligent about what is in memory and clearing unneeded data. 64-bit geoprocessing won't help you when trying to commit 64GB of data to memory.


x2!

Here's a def I use in most of my Python tools/implementations that use in_memory.  I insert a call to this before executing the rest of the code.



def clearINMEM():
    """ clear out the IN_MEMORY workspace of any featureclasses, rasters and tables """
    try:
        arcpy.env.workspace = "IN_MEMORY"

        fcs = arcpy.ListFeatureClasses()
        tabs = arcpy.ListTables()
        rasters = arcpy.ListRasters()

        ### for each feature class in the workspace, delete it
        for f in fcs:
            arcpy.Delete_management(f)
            arcpy.AddMessage("deleted: " + f)

        ### for each table in the workspace, delete it
        for t in tabs:
            arcpy.Delete_management(t)
            arcpy.AddMessage("deleted: " + t)

        ### for each raster in the workspace, delete it
        for r in rasters:
            arcpy.Delete_management(r)
            arcpy.AddMessage("deleted: " + str(r))

    except:
        arcpy.AddMessage("The following error(s) occurred attempting to clear the workspace: " + arcpy.GetMessages(2))
        return
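
I just call it at the top of the tool, something like this (the CopyFeatures line is only a placeholder for whatever the tool actually writes to in_memory; inputFC isn't a real variable from my tools):

clearINMEM()   # start with a clean in_memory workspace
arcpy.CopyFeatures_management(inputFC, "in_memory/workingCopy")   # inputFC is a placeholder
# ...the rest of the tool then works against in_memory/workingCopy...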

FatihDur
New Contributor II

I know it is an old post but worth mentioning that arcpy.Delete_mamangement("in_memory") does all for you.

RebeccaStrauch__GISP
MVP Emeritus

Thanks for the tip Fatih, I'm going to use this today.

Just FYI, you may want to edit your post for a typo

Fatih Dur wrote:

I know it is an old post but worth mentioning that arcpy.Delete_mamangement("in_memory") does all for you.

Should be

arcpy.Delete_management("in_memory")

Edit: for what it's worth, at least in 10.3.1 James Crandall's script seems to be working more reliably for me... but then again, I still don't know that I'm using in_memory correctly.

StacyRendall1
Occasional Contributor III
Typically a computer will read data from disk into RAM, select from RAM and pass it to the CPU for calculation; the result is passed back to RAM and then written to disk. The computer does pretty much this for any calculation (note that database management systems, including ArcGIS, are heavily optimized to do this as fast as possible, and will not load the whole dataset into memory at once unnecessarily). The advantage of in_memory is that operations in RAM are very fast compared to reading/writing from a hard disk, and that you can skip some disk writes/reads if you are doing sequential calculations on a set of data.

The more calculation steps you have between reading and writing, the greater the potential benefit of using in_memory. However, if you are only doing one calculation, using in_memory may be no faster (and possibly slower, given the additional steps) than calculating directly.
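
As a rough arcpy sketch of that idea (every path and field name below is invented for the example):

# one read from disk, several calculation steps in RAM, one write back to disk
arcpy.CopyFeatures_management(r"C:\data\grid.shp", "in_memory/grid")
arcpy.AddField_management("in_memory/grid", "TILE", "TEXT", "", "", 10)
arcpy.CalculateField_management("in_memory/grid", "TILE", "'A'", "PYTHON")
arcpy.Dissolve_management("in_memory/grid", "in_memory/grid_diss", "TILE")
arcpy.CopyFeatures_management("in_memory/grid_diss", r"C:\data\grid_diss.shp")
# a single step (one read, one calculation, one write) gains little from the extra copy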

Geoff Olson wrote:

I understand that in_memory is good until it's all used up and then it's super slow.


Correct. If your computer has 2GB of RAM installed (physical memory), about 1.5GB might be available for processes to use. Sometimes this amount won't be enough, so modern operating systems take some disk space to be used as extra memory in emergencies, called virtual memory, which is stored in a page file/swap file. The idea is that it is better for your computer to slow down than to crash if you put too much data into memory. Virtual memory is incredibly slow, because the page file isn't optimized the way the original file on disk was. Given the large number of reads/writes required in any calculation, it will usually completely choke up your computer.

Now, Solid State Drives (SSDs) are a lot faster to read/write than HDDs, so the advantages of in_memory may completely disappear if you have a fast SSD...

The number of features you are dealing with (130,000) is not particularly high, but it depends on the type and number of attributes you have associated with each feature. Obviously the amount of RAM you have will make a difference, as will whether or not your setup is 64-bit - that requires a 64-bit operating system and the ArcGIS 64-bit background geoprocessing install.
GeoffOlson
Occasional Contributor
Also, I was working with arcpy.SelectLayerByAttribute_management; does that function have to be performed within ArcMap?  It seemed like once I changed my expression from the shapefile location to the layer name it worked.  Otherwise I received an error saying it can't select a feature class.  That would make sense though, because it is Select LAYER, not Select Featureclass.
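
In other words, something like this (the path is just an example of what I had; the MakeFeatureLayer step is only one way to get a layer when one isn't already in the map):

# passing the shapefile path straight to the Select tool gave the "can't select a feature class" type error
# making a layer first (or using a layer already in the TOC) worked
arcpy.MakeFeatureLayer_management(r"C:\data\grid400.shp", "grid400_lyr")
arcpy.SelectLayerByAttribute_management("grid400_lyr", "NEW_SELECTION", '"FID" = 0')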
GeoffOlson
Occasional Contributor
I tried adding

arcpy.env.workspace = "in_memory"


near the top of my script, hoping it would put everything in memory, but it doesn't seem any faster.  It looks like I have to change all of my cursor and selection references, too.
GeoffOlson
Occasional Contributor
Holy Cow!  I copied the two layers I'm working with to new feature classes in "in_memory", then changed all layer references from the real layers to the memory copies, and instead of 10 hours, it's looking more like 45 minutes, assuming the RAM isn't exceeded.
GeoffOlson
Occasional Contributor
So it can run much, much faster, but when I try to have my script create the layer copies in RAM, I get an error because the script doesn't add the layers to the TOC; when the lines setting the RAM layers as variables try to run, they fail and report that the layer index is out of range.  Is there a special way to carry out a "CopyFeatures_management" and have the result usable in the script?  Here's my layer setup:

mxd = arcpy.mapping.MapDocument("Current")
arcpy.env.workspace = "in_memory"
mapLyr1 = arcpy.mapping.ListLayers(mxd, "NEW_BiState_Grid400_IowaSP") [0]
mapLyr2 = arcpy.mapping.ListLayers(mxd, "NEW_BiState_Grid100_IowaSP") [0]
arcpy.CopyFeatures_management(mapLyr1, "in_memory\pyLyr1")
arcpy.CopyFeatures_management(mapLyr2, "in_memory\pyLyr2")
pyLyr1 = arcpy.mapping.ListLayers(mxd, "pyLyr1") [0]
pyLyr2 = arcpy.mapping.ListLayers(mxd, "pyLyr2") [0]


I tried refreshing the active view, but that didn't work.  This works fine if I copy and paste everything into the Python window instead of running it as a tool, although it's much slower.
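
In case it helps, this is the direction I'm experimenting with (untested sketch) - making feature layers from the in_memory copies instead of pulling them back out of the TOC:

mxd = arcpy.mapping.MapDocument("Current")
mapLyr1 = arcpy.mapping.ListLayers(mxd, "NEW_BiState_Grid400_IowaSP")[0]
mapLyr2 = arcpy.mapping.ListLayers(mxd, "NEW_BiState_Grid100_IowaSP")[0]
# copy the data into RAM, then build layer objects from the copies
arcpy.CopyFeatures_management(mapLyr1, "in_memory/pyLyr1")
arcpy.CopyFeatures_management(mapLyr2, "in_memory/pyLyr2")
pyLyr1 = arcpy.MakeFeatureLayer_management("in_memory/pyLyr1", "pyLyr1_lyr")
pyLyr2 = arcpy.MakeFeatureLayer_management("in_memory/pyLyr2", "pyLyr2_lyr")
# the layer results can be passed to the Select tools and cursors without being in the TOC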