Classify pixels using deep learning with resnet50, GPU memory issues?

02-10-2021 11:39 PM
TimGrenside
New Contributor III

I'm running ArcGIS Pro 2.7.1 and having issues with Classify Pixels Using Deep Learning. It runs fine through the ArcGIS Pro geoprocessing tool. But if I go to the History entry for the job I just ran and click “Send to Python Window”, the Python command checkerboards the output raster (image below). I think I have tracked it down to a possible GPU memory issue. The geoprocessing tool and Python must be using different code: in Task Manager the GPU memory signature is completely different, and the Python run doesn’t release the dedicated GPU memory when it finishes (see below).

Additional information:

- The model is resnet50, unet with fastai (resnet34 doesn’t seem to have this issue and works fine in Python)

- The GPU is a GeForce RTX 2080 Ti (so 11 GB)

- Changing the batch size to 1 doesn’t make a difference

- I think the Python code might be crashing, but not reporting any issues to ArcGIS Pro.

Has anyone managed to get something similar working with resnet50 in Python?

Does anybody have any ideas on what I could try to get it working?

Many Thanks

Tim

 

GPU usage in Task Manager when running through the ArcGIS Pro geoprocessing tool

TimGrenside_1-1613028449866.png

GPU usage in Task Manager when running through the Python window in ArcGIS Pro

TimGrenside_2-1613028460715.png

Checkerboard pattern

TimGrenside_0-1613028418875.png
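
To compare the two GPU memory signatures outside Task Manager, a minimal sketch along these lines could be run before and after each job. It assumes the `nvidia-smi` utility that ships with the NVIDIA driver is on the PATH; the helper names `parse_memory_csv` and `query_gpu_memory` are made up for illustration and are not part of ArcGIS.

```python
import shutil
import subprocess


def parse_memory_csv(text):
    """Parse 'used, total' lines (MiB, one line per GPU) into (used, total) int tuples."""
    rows = []
    for line in text.strip().splitlines():
        used, total = (int(part.strip()) for part in line.split(","))
        rows.append((used, total))
    return rows


def query_gpu_memory():
    """Return [(used_mib, total_mib), ...], or None if nvidia-smi cannot be used."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return None  # assumption not met: nvidia-smi is not on the PATH
    result = subprocess.run(
        [exe, "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=False,
    )
    if result.returncode != 0:
        return None
    try:
        return parse_memory_csv(result.stdout)
    except ValueError:
        return None  # some GPUs report "[N/A]" instead of a number


if __name__ == "__main__":
    print(query_gpu_memory())
```

If the used figure stays high after the Python run has finished but drops after the geoprocessing tool run, that would be consistent with the memory not being released.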

 

 

0 Kudos
8 Replies
DanPatterson
MVP Notable Contributor
0 Kudos
TimGrenside
New Contributor III

Wait, what? Yes, I am using that. I thought that was the latest, as it was updated with ArcGIS Pro 2.7. What should I be using instead?

Many thanks

0 Kudos
SandeepKumar1
Esri Contributor

Hi Tim,

Do you see the same issue if you run the tool again from History? (Not the Python command, but the tool itself.)

 

Thanks,

Sandeep

0 Kudos
TimGrenside
New Contributor III

Hi Sandeep

Yes. Once the output checkerboards after a Python run, running from History (or the original geoprocessing tool) will not work until I close and restart ArcGIS Pro. So once it checkerboards, the only way to get it working again is to restart ArcGIS Pro.

Many thanks

Tim

 

0 Kudos
SandeepKumar1
Esri Contributor

Hi Tim,

Based on your comments, I feel that it is a GPU memory issue. Can you try reducing your batch_size while inferencing?

Thanks,

Sandeep

0 Kudos
TimGrenside
New Contributor III

Hi Sandeep

Thanks for your response. If I change the batch size to 1 and run the Python code, it still checkerboards. It still feels like it is running out of memory. Is there a way to see any Python/ArcGIS Pro logs?

Regards

Tim
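
On the question of logs: one way to surface more detail from the Python window is to wrap the tool call and print the geoprocessing messages whether or not it raises. The sketch below is only that, a sketch. `run_and_report` is a hypothetical helper, and the raster and model paths in the commented usage are placeholders; `arcpy.GetMessages()` is the standard call for retrieving the messages of the last tool run.

```python
def run_and_report(run_tool, get_messages):
    """Run a tool callable and always return (result, messages, error).

    run_tool     -- zero-argument callable that executes the geoprocessing tool
    get_messages -- zero-argument callable returning the tool messages as text
    """
    try:
        result = run_tool()
        error = None
    except Exception as exc:  # keep the exception so it can be inspected
        result, error = None, exc
    return result, get_messages(), error


# Hypothetical usage inside the ArcGIS Pro Python window (paths are placeholders):
#
# import arcpy
# result, messages, error = run_and_report(
#     lambda: arcpy.ia.ClassifyPixelsUsingDeepLearning(
#         r"C:\data\input.tif", r"C:\data\model.emd", "padding 56;batch_size 1"),
#     arcpy.GetMessages,
# )
# print(messages)
# print(repr(error))
```

If the tool fails silently, the messages may still contain warnings that never show up in the Python window by themselves.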

0 Kudos
AngusHooper1
Occasional Contributor II

Might be a stretch, but this could be resolved with a fix similar to https://community.esri.com/t5/arcgis-spatial-analyst-blog/are-you-getting-gpu-error-while-executing-...

The default value of 2 seconds for the Windows Timeout Detection and Recovery (TDR) delay can cause the OS to reset the GPU, which will crash whatever processes are using it.
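
To check what the delay is currently set to, a small read-only sketch like the following may help. It assumes the value lives in the `TdrDelay` entry under `HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers`, which is where Microsoft documents the TDR settings; `read_tdr_delay` and `describe` are illustrative names. It does not change the registry.

```python
import sys

TDR_KEY = r"SYSTEM\CurrentControlSet\Control\GraphicsDrivers"


def read_tdr_delay():
    """Return the TdrDelay registry value in seconds, or None if it is not readable."""
    if not sys.platform.startswith("win"):
        return None  # the registry only exists on Windows
    import winreg  # standard library, Windows only
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, TDR_KEY) as key:
            value, _value_type = winreg.QueryValueEx(key, "TdrDelay")
            return int(value)
    except OSError:
        return None  # key or value missing, or access denied


def describe(delay):
    """Format the result of read_tdr_delay() for a Windows machine."""
    if delay is None:
        return "TdrDelay not set; Windows default of 2 seconds applies"
    return f"TdrDelay is set to {delay} seconds"


if __name__ == "__main__":
    print(describe(read_tdr_delay()))
```

Changes to this value generally need a reboot before they take effect.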

0 Kudos
TimGrenside
New Contributor III

Hi Angus,
Thanks for your response. I added the registry setting and environment variable CUDA_VISIBLE_DEVICES, but it doesn't seem to make a difference and the issue is still happening.
Regards
Tim
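
One thing that may be worth confirming is whether the ArcGIS Pro process actually sees the new environment variable, since a process only picks up environment changes made before it was started. A minimal check, with `visible_devices` as an illustrative helper name, could look like this when run from the Python window:

```python
import os


def visible_devices(environ=None):
    """Return the device IDs listed in CUDA_VISIBLE_DEVICES.

    Returns None when the variable is unset (CUDA then sees every GPU)
    and an empty list when it is set but empty (CUDA then sees no GPU).
    """
    env = os.environ if environ is None else environ
    raw = env.get("CUDA_VISIBLE_DEVICES")
    if raw is None:
        return None
    return [device.strip() for device in raw.split(",") if device.strip()]


if __name__ == "__main__":
    print(visible_devices())
```

If this prints `None` after the variable has been set system-wide, restarting ArcGIS Pro should make it visible to the process.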

0 Kudos