Hacker News

Matrix Core Programming on AMD CDNA Architecture

61 points by salykova

by phkahler

3 subcomments

So from CDNA3 to CDNA4 they doubled fp16 and fp8 performance but cut fp32 and fp64 in half?
I wonder why there's a regression on non-AI workloads.

by saagarjha

1 subcomment

If AMD were serious, they would show a fully worked-out GEMM, not just "here is our theoretical performance, and this is the instruction to use".