Hacker News
Matrix Core Programming on AMD CDNA Architecture
61 points by salykova
by phkahler
3 subcomments
So from CDNA3 to CDNA4 they doubled fp16 and fp8 performance but cut fp32 and fp64 in half?
I wonder why there's a regression on non-AI workloads.
by saagarjha
1 subcomment
If AMD were serious they would show a fully worked-out GEMM, not just "here is our theoretical performance, and this is the instruction to use".