Clearing rasters from in_memory doesn't work

05-28-2020 12:17 PM
AbelPerez
Occasional Contributor

ArcMap/ArcObjects 10.8 with VB2019 compiled against .NET Framework 4.6

I need to process a very large number of polylines against a raster using the ExtractByMask GP tool. Saving the resulting sub-raster to memory is up to 5x faster than saving it to disk, so I opted for the in_memory workspace.

However, RAM usage keeps building because the GP functions do not clear these in_memory objects. My pseudocode looks something like this:

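    Imports ESRI.ArcGIS.Geoprocessor
    Imports ESRI.ArcGIS.SpatialAnalystTools
    Imports ESRI.ArcGIS.DataManagementTools

    ' polylineFeats, inRastName and subRasterName are defined elsewhere
    Dim gp As New Geoprocessor()
    Dim count As Integer = 0

    For Each polylineFeat In polylineFeats

        ' extract a sub-raster as the intersection of the input raster and the polyline feature
        Dim subRasterPath As String = "in_memory\" & subRasterName
        Dim extBymsk As New ExtractByMask()
        extBymsk.in_raster = inRastName
        extBymsk.in_mask_data = polylineFeat
        extBymsk.out_raster = subRasterPath
        gp.Execute(extBymsk, Nothing)

        ' get properties of the sub-raster
        Dim rastProps As New GetRasterProperties()
        rastProps.in_raster = subRasterPath
        rastProps.property_type = "MAXIMUM"
        gp.Execute(rastProps, Nothing)

        ' done with the sub-raster, so delete it
        Dim delData As New Delete()
        delData.in_data = subRasterPath
        gp.Execute(delData, Nothing)

        ' every 100 iterations, delete the whole in_memory workspace
        count += 1
        If count Mod 100 = 0 Then
            Dim delWksp As New Delete()
            delWksp.in_data = "in_memory"
            gp.Execute(delWksp, Nothing)

            ' garbage collect and wait for pending finalizers
            GC.Collect()
            GC.WaitForPendingFinalizers()
        End If

        ' test whether the sub-raster is truly gone from memory:
        ' this call should throw if the Delete above worked
        Dim rastPropsB As New GetRasterProperties()
        rastPropsB.in_raster = subRasterPath
        rastPropsB.property_type = "MAXIMUM"
        gp.Execute(rastPropsB, Nothing)

    Next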

Delete_management for in_memory rasters just does not work. How do I know? I put a test GP function, GetRasterProperties, AFTER the Delete operation, and I can still get a result. It should throw an exception if the raster no longer existed.

So I have about 7000 polylines to process, and this routine throws an exception after about 2800 of them. The reason is that the RAM available to this thread has been exhausted.

Has anyone else seen this? What have you done as a workaround? I contacted tech support, and they say they will file a bug. But I need to process data today, and the last bug I filed took 18 months to fix. I can't believe we are at 10.8 and the in_memory workspace still has defects.

Any help appreciated.

~Abel

9 Replies
DuncanHornby
MVP Frequent Contributor

There is a lot of discussion in this thread about the IN_MEMORY workspace accessed via arcpy. Dan's suggestion of simply overwriting seemed like an easy choice. Is that an option for you, to simply overwrite the same raster name?

I know IGPUtilities has a ReleaseInternals method, but in true ESRI API help style it has next to no explanation of what it actually does... if anything!

Personally when it comes to rasters I tend to read/write to TIF format and accept any loss of performance over stability.

Setting processing extents can often reduce data volume?

AbelPerez
Occasional Contributor

Let me preface my response by saying that ESRI tech support and I have been testing something like a hundred different strategies over the last 4 weeks. Nothing worked, as the common denominator was always the in_memory workspace. They ended up filing a bug, and hopefully that will get fixed in, what, a year or two.

So thanks for the link. I recognize that thread, as I went through almost every post on in_memory. As suggested, I tried the GP overwrite, and the result is an exception. To me that means the raster is being locked in the in_memory workspace, so neither deleting nor overwriting it works.

I'll take a look at IGPUtilities. At this point I'm willing to try anything.

The ExtractByMask is just taking the intersection of a polyline and a raster. Not much I can do with extent or format. I do know that writing to disk works, but you lose performance: in my case it turns a 15-minute process (in_memory) into many hours (to disk), so that is not an option.

The other issue is that the in_memory workspace lives for the whole ArcMap session, so no matter what you do, RAM usage just keeps building. When you close ArcMap, that memory is released, so there must be a way to do that manually.
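One way to act on the observation that closing ArcMap releases the memory, offered as an untested assumption rather than a known fix, is to run the extractions in short-lived worker processes so the operating system reclaims whatever a worker accumulated when it exits. The Python sketch below uses the standard `multiprocessing` module with `maxtasksperchild=1`; `extract_batch` is a hypothetical stand-in for the real ExtractByMask/GetRasterProperties calls, and whether the in_memory workspace is actually freed this way is not confirmed anywhere in this thread.

```python
from multiprocessing import Pool


def chunk(items, size):
    """Split a list of polyline IDs into consecutive batches of at most `size`."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def extract_batch(polyline_ids):
    """Hypothetical worker. In a real script this would run ExtractByMask and
    GetRasterProperties for each ID inside this process; here it returns
    placeholder results so the sketch runs on its own."""
    return [(pid, f"processed {pid}") for pid in polyline_ids]


def run_in_fresh_processes(polyline_ids, batch_size=100, workers=2):
    """Process IDs batch by batch, each batch in a worker that exits afterwards."""
    results = []
    # maxtasksperchild=1 with chunksize=1: each worker handles one batch, then is
    # replaced by a new process, so the OS reclaims the memory it had accumulated.
    with Pool(processes=workers, maxtasksperchild=1) as pool:
        for batch_result in pool.map(extract_batch, chunk(polyline_ids, batch_size), chunksize=1):
            results.extend(batch_result)
    return results


if __name__ == "__main__":
    out = run_in_fresh_processes(list(range(250)), batch_size=100)
    print(len(out))  # 250
```

The batch size trades process start-up cost against how much memory a single worker is allowed to build up before it is recycled.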

DuncanHornby
MVP Frequent Contributor

You say that when it's writing to the hard drive it can take hours. So that I and others can understand the inputs, can you describe the raster? Is it some monster global dataset? How many rows/columns, and what cell size? Are these lines like a road network or animal tracks weaving all over this "global" raster? Could you add a few pictures? For Extract By Mask to take hours implies a very fine-resolution raster composed of millions or billions of cells.

So if in_memory is basically broken, it sounds like another approach is needed: divide and conquer. Assuming your lines represent, say, roads, I would code them into an appropriate group such as district/county. Then loop over that group ID, select all the roads for that group ID, get the extent of the selection, use that as your processing extent, and run the extract by mask. You then continue with your workflow at the group level and summarise at the end?
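The grouping step can be sketched in plain Python, assuming each polyline is available as a list of (x, y) vertices tagged with a group ID such as a district or county code. `group_extents` and `extent_string` are hypothetical helpers; the space-separated "XMin YMin XMax YMax" string is a form `arcpy.env.extent` accepts, but setting that environment and running the extract are not shown here.

```python
def group_extents(polylines):
    """polylines: iterable of (group_id, [(x, y), ...]) pairs.
    Returns {group_id: (xmin, ymin, xmax, ymax)} enclosing every line in the group."""
    extents = {}
    for group_id, vertices in polylines:
        xs = [x for x, _ in vertices]
        ys = [y for _, y in vertices]
        box = (min(xs), min(ys), max(xs), max(ys))
        if group_id in extents:
            # Grow the group's existing box so it also covers this line.
            old = extents[group_id]
            box = (min(old[0], box[0]), min(old[1], box[1]),
                   max(old[2], box[2]), max(old[3], box[3]))
        extents[group_id] = box
    return extents


def extent_string(box):
    """Format (xmin, ymin, xmax, ymax) as a space-separated 'XMin YMin XMax YMax' string."""
    return " ".join(str(v) for v in box)


if __name__ == "__main__":
    lines = [("A", [(0, 0), (5, 2)]), ("A", [(3, -1), (4, 6)]), ("B", [(10, 10), (12, 11)])]
    for gid, box in group_extents(lines).items():
        print(gid, extent_string(box))
    # A 0 -1 5 6
    # B 10 10 12 11
```

One extract per group instead of one per line reduces the number of in_memory rasters created, at the cost of working with larger rasters.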

There is the Dice tool, worth looking at?

AbelPerez
Occasional Contributor

The raster is not the issue here. The issue is the number of extracts that I need to do which could be 7000 to 12000 polylines. The requirement is NOT to group the polylines and then get the extract for the entire intersection. The requirement is to get the INDIVIDUAL extracts from each polyline. Take a look at my pseudo code.

To do this in anywhere near an acceptable time frame, you must save to in_memory. But if Delete_management doesn't clear the in_memory workspace, RAM usage just keeps building and eventually an exception is thrown.

DuncanHornby
MVP Frequent Contributor

Have you tried setting the processing extent to the polyline's extent before you call the extract tool? Have you visually confirmed that after an extract happens it really is a tiny raster the size of the polyline extent?

Three other workflows you could consider?

  • Create a RAMDISK. I've done that in the past, but with the introduction of in_memory I stopped using that approach; here is one such article.
  • I know this code is VB, but if it is just a matter of processing data for a client rather than tool development, I have successfully used the multiprocessing module in Python in an arcpy script to parallelize processing. Yes, I'm creating rasters on a hard disk (which slows reads/writes), but then I have 8 cores processing simultaneously! The machine does sound like it's going to blow up, and I have had to keep an eye on the processing because deadlocks keep happening, so you need to build a start-from-last-position approach into your code.
  • Wait another decade for ESRI to fix it...
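The "start from last position" approach can be sketched with a small JSON checkpoint file that records the index of the next polyline to process. This is a minimal Python sketch, assuming the polylines are always processed in the same order; `process_one` is a hypothetical callback standing in for the per-polyline extract, and the checkpoint file name is illustrative.

```python
import json
import os
import tempfile


def load_checkpoint(path):
    """Return the index of the next polyline to process (0 if no checkpoint exists yet)."""
    if not os.path.exists(path):
        return 0
    with open(path) as f:
        return json.load(f)["next_index"]


def save_checkpoint(path, next_index):
    """Record progress; write to a temp file and rename so a crash can't leave a half-written file."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"next_index": next_index}, f)
    os.replace(tmp, path)


def process_all(polyline_ids, checkpoint_path, process_one):
    """Process polyline_ids in order, skipping those a previous run already completed.
    process_one is a hypothetical per-polyline callback (e.g. the extract step)."""
    start = load_checkpoint(checkpoint_path)
    results = []
    for i in range(start, len(polyline_ids)):
        results.append(process_one(polyline_ids[i]))
        save_checkpoint(checkpoint_path, i + 1)
    return results


if __name__ == "__main__":
    checkpoint = os.path.join(tempfile.mkdtemp(), "extract_progress.json")
    print(process_all(["line_a", "line_b"], checkpoint, str.upper))  # ['LINE_A', 'LINE_B']
    print(process_all(["line_a", "line_b"], checkpoint, str.upper))  # [] (already done)
```

Only the current run's results are returned, so in practice the callback should write its own output (for example a row per polyline) before the checkpoint advances.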
AbelPerez
Occasional Contributor

Yes, I have confirmed that the extracted subraster is the INTERSECTION of the polyline and the main raster. I guess you could say that the subraster spans the extent of the polyline, with most of its cells being Null, since only the cells that intersect the polyline are actually extracted.

I'm not sure what setting the environment to the extent of the polyline will do, as the end result is still the same: I have an extracted subraster in memory. That subraster is what is not cleared from memory.

I thought about a RAMDISK, but that involves each of my users setting one up, and me setting it as the temp output workspace in code. It could work, but it poses some challenges for end users.
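If a RAM disk were used, the output workspace could be chosen in code so that users without one still fall back to ordinary disk. A minimal Python sketch, assuming the RAM disk location is published through an environment variable; the name `RAMDISK_PATH` is hypothetical.

```python
import os
import tempfile


def choose_output_workspace(environ=None, env_var="RAMDISK_PATH"):
    """Return the RAM disk folder named by env_var if it exists, else the system temp folder."""
    if environ is None:
        environ = os.environ
    candidate = environ.get(env_var)
    if candidate and os.path.isdir(candidate):
        return candidate
    # No RAM disk configured (or the path is missing): fall back to ordinary disk.
    return tempfile.gettempdir()


if __name__ == "__main__":
    print(choose_output_workspace())
```

Passing the environment mapping in explicitly keeps the helper easy to test; in normal use it reads `os.environ`.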

I also gave some thought to saving to disk with a multi-threaded approach. That could also work, but it has some challenges, namely me. I have done MT in the past but wanted to keep my code as simple as possible. Definitely something to look at if ESRI drags their feet.

AbelPerez
Occasional Contributor

Another tidbit of information: after some more testing, I noticed that the process works much better WITHOUT the Delete_management.


My testing process is as follows:

1. ExtractByMask
2. GetRasterProperties
3. Delete_management

For my test I use the same output raster name "in_memory\rasOutput" for each iteration and GP.OverwriteOutput = True.

When it gets to the second iteration I get an exception. If I omit step #3 Delete_management then I do NOT get the exception.


My theory is that Delete_management is broken as well. It seems to attempt to delete the subraster but can't, so it just keeps a lock on it.

Here is the debug info when I INCLUDE the Delete_management. The exception happens at GP.Execute of ExtractByMask. If I omit this Delete_management step, I do not get an exception and the code runs to completion. Oddly, it finishes but still uses about 1.2 GB of RAM, even though it reuses the same output raster location and name.

System.Runtime.InteropServices.COMException (0x80004005): Error HRESULT E_FAIL has been returned from a call to a COM component.
at ESRI.ArcGIS.Geoprocessing.GeoProcessorClass.Execute(String Name, IVariantArray ipValues, ITrackCancel pTrackCancel)
at ESRI.ArcGIS.Geoprocessor.Geoprocessor.ExecuteInner(IGPProcess process, ITrackCancel trackCancel, IGeoProcessor igp, IVariantArray iva)
at ESRI.ArcGIS.Geoprocessor.Geoprocessor.Execute(IGPProcess process, ITrackCancel trackCancel)

DuncanHornby
MVP Frequent Contributor

So by omitting the Delete step and allowing overwrite, the code completes and runs fast in_memory? Nice one!

AbelPerez
Occasional Contributor

Yes, all my testing is aimed at getting in_memory to work. This approach works; the only downside is that RAM is still not released with each successive overwrite, although no exception occurs. I think it could be limited by the user's RAM, so we will see what happens on a bigger dataset.
