Hi,
We have installed Image Server 11.1 & Raster Analytics on an NVIDIA DGX machine with 4 GPU cards.
How can we leverage all 4 GPUs while training the model in Deep Learning Studio?
Thanks & regards,
Saranya
Hi Saranya,
At 11.1, when it comes to training a model using a Raster Analytics (RA) site in an enterprise environment, each image server node within the RA site can still use one and only one GPU. Deep Learning Studio carries the same limitation, since it leverages the RA site for processing. We are looking to remove this limitation in a future release.
-Jay
Hi Saranya,
We are working with the Python API team on this, starting with support for multiple GPUs on a single image server node for some of the model types. Unfortunately, we don't yet have a clear timeline for when this will be available for most model types. We do understand that this is essential for some customers, especially for model training. Once it becomes available, Deep Learning Studio will naturally support it too. Thank you for your patience.
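For context, below is a minimal sketch of what single-node, multi-GPU training typically looks like in plain PyTorch using nn.DataParallel. This is not the arcgis.learn or Deep Learning Studio API; the model, data, and device handling here are generic placeholders to illustrate the underlying technique that multi-GPU support would build on.

```python
# Illustrative only: single-node, multi-GPU training with PyTorch nn.DataParallel.
# The arcgis.learn / Deep Learning Studio integration may expose this differently
# once multi-GPU support lands; everything below is a generic placeholder.
import torch
import torch.nn as nn

class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 2))

    def forward(self, x):
        return self.layers(x)

model = TinyNet()
if torch.cuda.device_count() > 1:
    # Replicates the model on every visible GPU and splits each batch across them.
    model = nn.DataParallel(model)
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Dummy batch standing in for exported training chips.
x = torch.randn(32, 64, device=device)
y = torch.randint(0, 2, (32,), device=device)

optimizer.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()
optimizer.step()
```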
-Jay
Thanks for the update Jay!
Hi,
I have a follow-up question.
On this page, it says that if I have multiple GPUs, I can run multiple instances. Does this mean that if I have 16 GPUs, I can run inference on 16 images in parallel, one image per GPU?
Using multiple GPUs on a raster analytics node within your raster analytics site is not yet fully supported for all models. `Utilization of multiple GPUs per server node is applicable to some deep learning model configurations predefined in ArcGIS. They include Tensorflow (ObjectDetectionAPI and DeepLab), Keras (MaskRCNN), and PyTorch.`
For supported models, yes, the job can be distributed across multiple GPUs on your RA node, provided multiple GPUs are available and Max instances per machine for the RA services is set to more than 1.
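As a rough mental model, spreading N concurrent worker processes across N GPUs usually means pinning each worker to a single device before any CUDA context is created. The RA site handles this assignment internally; the sketch below is only an illustration of that general pattern, and the instance ID and GPU count are hypothetical values.

```python
# Illustrative only: pin each worker process to one GPU so that concurrent
# service instances do not contend for the same device. The Raster Analytics
# site performs this kind of assignment internally; instance_id and gpu_count
# here are hypothetical.
import os

def pin_worker_to_gpu(instance_id: int, gpu_count: int) -> None:
    # Must run before any CUDA context is created in this process.
    os.environ["CUDA_VISIBLE_DEVICES"] = str(instance_id % gpu_count)

pin_worker_to_gpu(instance_id=3, gpu_count=16)

import torch
# Inside this process, the selected physical GPU now appears as cuda:0.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(device)
```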
Hi JayChen,
Thanks for your reply. Does the same apply to inferencing as well?
Yes, this applies to inferencing as well.