Still, it's a good article, and it's nice to see the old AnandTech crew together. The random grammatical errors are still there, but these days they're a reassuring sign that the article was written by hand.
From what I understand, in a typical GPU die you put the logic and connectors on one side and inert silicon on the other. So unless you drill through the silicon, you don't get shorter routing.
Why not put the GPU on one side and the HBM on the other side of the PCB? Wouldn't that fix the cooling problem?
Hyperscalers are dealing with a pretty complex Pareto envelope that includes total power, power density, available volume, token throughput, and token latency.
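To make that concrete, here's a minimal sketch of the Pareto filter that envelope implies. Everything in it (the axis choices, the Config fields, the candidate numbers) is an illustrative assumption, not vendor data: a design survives only if no other design beats it on every axis at once.

    # Minimal sketch (Python): Pareto filtering over the envelope above.
    # All names and numbers are illustrative assumptions, not vendor data.
    from dataclasses import dataclass, astuple

    @dataclass(frozen=True)
    class Config:
        total_power_kw: float       # minimize
        power_density_kw_m3: float  # minimize
        volume_m3: float            # minimize
        neg_tokens_per_s: float     # throughput, negated so every axis is "minimize"
        latency_ms: float           # minimize

    def dominates(a: Config, b: Config) -> bool:
        """a dominates b if it's no worse on every axis and strictly better on one."""
        av, bv = astuple(a), astuple(b)
        return all(x <= y for x, y in zip(av, bv)) and any(x < y for x, y in zip(av, bv))

    def pareto_front(configs: dict[str, Config]) -> list[str]:
        return [name for name, c in configs.items()
                if not any(dominates(o, c) for n, o in configs.items() if n != name)]

    # Hypothetical candidates: a dense underclocked build like the paper
    # imagines, a conventional air-cooled rack, and a strictly worse strawman.
    candidates = {
        "dense-underclocked": Config(40, 80, 0.5, -9_000, 30),
        "conventional-air":   Config(60, 20, 3.0, -10_000, 25),
        "strawman":           Config(70, 90, 3.5, -8_000, 40),
    }

    print(pareto_front(candidates))  # ['dense-underclocked', 'conventional-air']

Note that both real options survive (each wins on some axes), which is exactly why the answer below is "some of each" rather than a single winner.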
My guess is that there's going to be some heterogeneous compute deployed possibly forever, but likely for at least the next six to ten years, and the kind of exotic, fragile, underclocked, highly dense compute imagined in the paper is likely to be part of that. But probably not all of it.
Either way, as a society we'll get the benefits of at least a trillion dollars of R&D and production on silicon, which is great.