Prioritizing CUDA core count vs. VRAM capacity in GPU selection for Deep Learning

RyanWalter
New Contributor III

Hi all,

I'm looking into adding an additional GPU to my home desktop for the sole purpose of classifying point clouds, objects, and other items using Deep Learning. I run a bit of an unconventional setup: my home desktop is my personal computer that I also use for work. I have an AMD 7900 XT as my main GPU, but it's no help for Deep Learning because it isn't CUDA-compatible. As a side note and separate question, has anyone gotten Deep Learning working with ZLUDA? I would love to leverage my 7900 XT for Deep Learning.
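If anyone has gotten ZLUDA working, I'd be curious whether a basic device check like this passes — a minimal sketch assuming a PyTorch-based stack, since ZLUDA is supposed to surface the AMD card through the regular CUDA API:

```python
import torch

# List every CUDA-visible device; under ZLUDA the 7900 XT would (in theory) show up here.
if torch.cuda.is_available():
    for i in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(i)
        print(f"GPU {i}: {props.name}, {props.total_memory / 1024**3:.1f} GB, "
              f"compute capability {props.major}.{props.minor}")
else:
    print("No CUDA-capable device visible to PyTorch.")
```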

I also have, in the same rig, an Nvidia Tesla P40 I was able to grab for about $175. This is a Pascal-generation card running on CUDA 12.4 drivers, connected to the PCIe x8 slot below my 7900 XT. The Tesla P40 has 24 GB of GDDR5 memory and 3,840 CUDA cores. It has been a little annoying to cool, but has otherwise served well as an affordable, high-VRAM, CUDA-compatible card for DL detection/classification. I'm doing a lot more Deep Learning work these days, currently focused on classifying objects in LiDAR point clouds, and it's yielding the results I want. It does, however, take a long time for classification to complete. My employer agrees this work is worth exploring further and is willing to help fund a faster GPU (within reason; let's say $1,200 as a ceiling).
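For context on the VRAM side, here's how I keep an eye on memory headroom during a run — a minimal sketch, again assuming a PyTorch-based stack; nvidia-smi from the command line tells the same story:

```python
import torch

def vram_report(device=0):
    # Free vs. total device memory in bytes, straight from the CUDA driver.
    free, total = torch.cuda.mem_get_info(device)
    used_gb = (total - free) / 1024**3
    print(f"VRAM in use: {used_gb:.1f} / {total / 1024**3:.1f} GB")
```

When that number sits near 24/24 GB mid-run, I take it to mean the batch size is VRAM-bound, not core-bound.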

I like to think I'm not the world's biggest idiot when it comes to computers, but on this I'm totally lost. When shopping for a GPU specifically for Deep Learning, what should my priorities be? I can see in my modeling that I'm using most of my VRAM, so that's obviously important, but how should I think about CUDA core counts? Does "the more the better" hold here? Or should I focus on cards with newer architectures? What about clock speeds? Are there general rules of thumb for the GPU specifications Deep Learning can leverage? As an example: the RTX 4000 Ada has 6144 CUDA cores and 20GB of VRAM for $1500, while the 4070 Ti Super has 8448 CUDA cores and 16GB of VRAM for around half that at $800; how would these two compare for Deep Learning? Is there anything Deep Learning leverages in either of these cards, besides the CUDA core and VRAM counts, that I'm not accounting for?
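For what it's worth, I've read that the Ada-generation cards also carry tensor cores and fast half-precision math that the Pascal-era P40 simply doesn't have, and I don't know how to weigh that against raw CUDA core counts. If it were useful, I could run a crude throughput test like this on each card — a rough sketch, assuming PyTorch; the fp16 number should show whatever the tensor cores add:

```python
import time
import torch

def matmul_tflops(dtype, n=8192, iters=20):
    # Time repeated n x n matmuls and return effective TFLOPS at the given precision.
    a = torch.randn(n, n, device="cuda", dtype=dtype)
    b = torch.randn(n, n, device="cuda", dtype=dtype)
    for _ in range(3):          # warm-up so timing excludes one-time kernel setup
        a @ b
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        a @ b
    torch.cuda.synchronize()
    elapsed = time.perf_counter() - start
    return (2 * n**3 * iters) / elapsed / 1e12  # ~2*n^3 FLOPs per matmul

print(f"fp32: {matmul_tflops(torch.float32):.1f} TFLOPS")
print(f"fp16: {matmul_tflops(torch.float16):.1f} TFLOPS")
```

My naive guess is the fp32 gap between cards roughly tracks CUDA core count and clocks, while the fp16 gap is where the newer architectures pull away, but I'd love to hear from people who actually know.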

I know there are a lot of questions in there. I appreciate the help. Thanks!
