What are the state-of-the-art frameworks in the ML programming area, similar to what React is for the web and Tailwind is for CSS?
Triton, ONNX, JAX, PyTorch, cuBLAS, ...
I know they serve different purposes, but some idea of what each one is for and when to use it would be helpful.
Everything from "Pipelining GEMM with TMA" onward is specific to NVIDIA. Which is fine, but the title of the guide itself is clearly misleading.
I spent months, months of late nights, watching commits to nvfuser and shit. I wrote a SASS decompiler and instrumented everything trying to learn Blackwell.
This is the first time I've seen something so clean, just a real work of scholarship on it.
My hat is off to the authors and the contribution it represents.
If I would caution a reader about anything, it's that the 2CTA (sm_100, sm_110) patterns here are different on 1CTA in important ways. It's not a better/worse thing; they are good for different workloads.
Really outstanding work. I proved a lot of this in Lean 4 and published it, but I got lazy and stopped short of really doing the pedagogical work.
This is what you should be starting with if you want to max out 2CTA gear. It's immaculate.