Hacker News

Basic Facts about GPUs

341 points by ibobev

by b0a04gl

5 subcomments

by elashri

2 subcomments

Good article summarizing a good chunk of information that people should have some idea about. I just want to comment that the title is a little misleading, because the article describes the specific choices NVIDIA makes in developing its GPU architectures, which is not always what others do.
For example, the arithmetic intensity break-even point (ridge point) is very different once you leave NVIDIA-land. Take the AMD Instinct MI300: up to 160 TFLOPS of FP32 paired with ~6 TB/s of HBM3/3E bandwidth gives a ridge point near 27 FLOPs/byte, about double the A100’s 13 FLOPs/byte. The larger on-package HBM (128 to 256 GB) also shifts the practical trade-offs between tiling depth and occupancy. That said, it is very expensive and does not have CUDA (which can be good and bad at the same time).
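
To make the ridge-point arithmetic concrete, here is a minimal sketch. The MI300 figures and the A100's 19.5 TFLOPS are the ones quoted in this thread; the A100 bandwidth of 1.5 TB/s is an assumption chosen to match the 13 FLOPs/byte figure, and the function name is just illustrative.

```python
def ridge_point(peak_flops: float, mem_bandwidth: float) -> float:
    """Arithmetic intensity (FLOPs/byte) above which a kernel is compute-bound."""
    return peak_flops / mem_bandwidth

# MI300: 160 TFLOPS FP32 and ~6 TB/s, as quoted in the comment.
mi300 = ridge_point(160e12, 6e12)

# A100: 19.5 TFLOPS FP32 (quoted in the thread); 1.5 TB/s is an assumed bandwidth.
a100 = ridge_point(19.5e12, 1.5e12)

print(f"MI300 ridge point: {mi300:.1f} FLOPs/byte")
print(f"A100 ridge point:  {a100:.1f} FLOPs/byte")
print(f"Ratio: {mi300 / a100:.2f}x")
```

A kernel whose arithmetic intensity sits between the two ridge points would be compute-bound on the A100 but still memory-bound on the MI300 under these numbers.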

by eapriv

1 subcomment

Spoiler: it’s not about how GPUs work, it’s about how to use them for machine learning computations.

by LarsDu88

1 subcomment

Maybe this should be titled "Basic Facts about Nvidia GPUs", as the warp terminology is a feature of modern Nvidia GPUs.
Again, I emphasize "modern".
An NVIDIA GPU from circa 2003 is completely different and has baked-in circuitry specific to the rendering pipelines used for video games at that time.
So most of this post is not quite general to all "GPUs", which is a much broader category of devices that doesn't necessarily encompass the type of general-purpose computation we use modern Nvidia GPUs for.

by saagarjha

0 subcomment

> The “Peak Compute” roof of 19.5 TFLOPS is an ideal, achievable only with highly optimized instructions like Tensor Core matrix multiplications and high enough power limits.
As mentioned below, 19.5 TFLOPS is the FP32 compute roofline, and that FP32 path doesn't use Tensor Cores. If you want to use those, you need to use FP16, and you can get substantially improved performance.
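
A rough sketch of how the choice of roof changes attainable throughput in the roofline model. Only the 19.5 TFLOPS FP32 figure comes from the quoted text; the 312 TFLOPS FP16 Tensor Core peak is NVIDIA's published dense figure for the A100 and the 1.5 TB/s bandwidth is an assumed round number, so treat both as assumptions.

```python
def attainable_flops(intensity: float, peak_flops: float, mem_bandwidth: float) -> float:
    """Roofline model: throughput is capped by compute or by memory traffic."""
    return min(peak_flops, intensity * mem_bandwidth)

FP32_PEAK = 19.5e12          # from the quoted text
FP16_TENSOR_PEAK = 312e12    # assumption: published A100 dense FP16 Tensor Core peak
BANDWIDTH = 1.5e12           # assumption: approximate A100 HBM bandwidth in bytes/s

# Compare both roofs at a few arithmetic intensities (FLOPs/byte).
for intensity in (4, 13, 64, 256):
    fp32 = attainable_flops(intensity, FP32_PEAK, BANDWIDTH)
    fp16 = attainable_flops(intensity, FP16_TENSOR_PEAK, BANDWIDTH)
    print(f"{intensity:>4} FLOPs/byte: FP32 {fp32 / 1e12:6.1f} TFLOPS, "
          f"FP16 TC {fp16 / 1e12:6.1f} TFLOPS")
```

Below the FP32 ridge point both roofs give the same memory-bound number; the Tensor Core roof only pays off once the kernel's arithmetic intensity is high enough.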

by Agentlien

0 subcomment

I wasn't expecting the strong CUDA/ML focus. My own work is primarily in graphics and performance in video games; while this is all familiar and useful it feels like a very different view of the hardware than mine.

by SoftTalker

4 subcomments

by gdiamos

0 subcomment

Wow - the title is "basic facts" - but it should be "key insights"
You wouldn't believe how many PhDs I've met who have no idea what a roofline is.

by bjornsing

0 subcomment

So how are we doing with whole-program optimization at the compiler level? It feels kind of backwards that people are optimizing these LLM architectures one at a time.

by kittikitti

0 subcomment

This is a really good introduction and I appreciate it. When I was building my AI PC, the deep-dive research into GPUs took a few days, but this lays it all out in front of me. It's especially great because it touches on high-value applications like generative artificial intelligence. A notable diagram from the page that I wasn't able to find represented well elsewhere was the memory hierarchy of the A100 GPU. The diagrams were very helpful. Thank you for this!

by geoffbp

0 subcomment

by naganotonicbuy

0 subcomment

by neuroelectron

0 subcomment