FRESH

Hacker News

Home

Why CUDA translation wont unlock AMD

87 points by JonChesterfield

by lvl155

1 subcomments

Let’s just say what it is: devs are too constrained to jump ship right now. It’s a massive land grab and you are not going to spend time tinkering with CUDA alternatives when even a six-month delay can basically kill your company/organization. Google and Apple are two companies with enough resources to do it. Google isn’t because they’re keeping it proprietary to their cloud. Apple still have their heads stuck in sand barely capable of fixing Siri.

by mandevil

1 subcomments

Yeah, ROCm focused code will always beat generic code compiled down. But this is a really difficult game to win.
For example, Deepseek R-1 released optimized for running on Nvidia HW, and needed some adaption to run as well on ROCm. This was for the exact same reasons that ROCm code will beat generic code compiled into ROCm, in the same way. Basically the Deepseek team, for their own purposes, created R-1 to fit Nvidia's way of doing things (because Nvidia is market-dominant) on their own. Once they released, someone like Elio or AMD would have to do the work of adapting the code to run best on ROCm.
For more established players who weren't out-of-left-field surprises like Deepseek, e.g. Meta's Llama series, mostly coordinate with AMD ahead of release day, but I suspect that AMD still has to pay for the engineering work themselves while Meta does the work to make it run on Nvidia themselves. This simple fact, that every researcher makes their stuff work on CUDA themselves, but AMD or someone like Elio has to do the work to move it over to get it to be as performant on ROCm, that is what keeps people in the CUDA universe.

by buggyworld

2 subcomments

This reminds me of the database wire protocol debates. PostgreSQL-compatible databases (like Aurora, Neon, Supabase) achieve compatibility by speaking the Postgres wire protocol, but the truly successful ones don't just translate—they rebuild core components to leverage their own architecture (Aurora's storage layer, Neon's branching, etc.).
The article frames this as "CUDA translation bad, AMD-native good" but misses the strategic value of compatibility layers: they lower switching costs and expand the addressable market. NVIDIA's moat isn't just technical—it's the ecosystem inertia. A translation layer that gets 80% of NVIDIA performance might be enough to get developers to try AMD, at which point AMD-native optimization becomes worth the investment.
The article is essentially a product pitch for Paiton disguised as technical analysis. The real question isn't "should AMD hardware pretend to be CUDA?" but rather "what's the minimum viable compatibility needed to overcome ecosystem lock-in?" PostgreSQL didn't win by being incompatible—it won by being good AND having a clear migration path from proprietary databases.

by jfalcon

0 subcomment

CUDA isn't all that and a bag of chips. It just is the Facebook/Twitter of the data science and from that LLM space. There are Tensor processors and other ASIC processing for specific compute functions that can give Nvidia a challenge but it's not unknown to every gamer that there has always been a performance difference between Nvidia and AMD/ATI.
Ok, point made Nvidia. Kudos.
ATI had their moment in the sun before ASIC ate their cryptocurrency lunch. So both still had/have relevance outside gaming. But, I see Intel is starting to take GPU space seriously and they shouldn't be ruled out.
And as mentioned elsewhere in the comments, there is Vulkan. There is also this idea of virtualized GPU as now the bottleneck isn't CPU... it's now GPU. As I mentioned there are Tensors, Moore's Law thresholds coming back again with 1 nanometer manufacturing, there is going to be a point where we hit a threshold again with current chips and we will have a change in technology - again.
So while Nvidia is living the life - unless they have a crystal ball of how tensors are going to go that they can move CUDA towards, there is going to be a "co-processor" future coming up and with that the next step towards NPUs will be taken. This is where Apple is aligning itself because, after all, they had the money and just said "Nope, we'll license this round out..."
AMD isn't out yet. They, along with Intel and others, just need to figure out where the next bottlenecks are and build those toll bridges.

by fulafel

1 subcomments

Vulkan Compute is catching up with HIP (or whatever the compatibility stuff is called now), which seems like a welcome break from CUDA - in this benchmark it beats CUDA in some benchmarks on AMD: https://www.phoronix.com/review/rocm-71-llama-cpp-vulkan

by manjose2018

3 subcomments

https://geohot.github.io//blog/jekyll/update/2025/03/08/AMD-...
https://tinygrad.org/ is the only viable alternative to CUDA that I have seen popup in the past few years.

by apitman

0 subcomment

Our open source library is currently hard locked into CUDA due to nvCOMP for gzip decompression (bioinformatics files). What I wouldn't give for an open source implementation, especially if it targeted WebGPU.

by martinald

0 subcomment

Perhaps I'm misunderstanding the market dynamics; but isn't AMDs real opp inference over research?
Training etc still happens on NVDA but inference is somewhat easy to do on vLLM et al with a true ROCm backend with little effort?

by kj4ips

0 subcomment

I agree pretty strongly. A translation layer like this is making an intentional trade: Giving up performance and HW alignment for less lead time and effort to make a proper port.

by outside1234

2 subcomments

Are the hyperscalers really using CUDA? This is what really matters. We know Google isn't. Are AWS and Azure for their hosting of OpenAI models et al?

by jmward01

0 subcomment

Right now we need diversity in the ecosystem. AMD is finally getting mature and hopefully that will lead to them truly getting a second, strong, opinion into ecosystem. The friction this article talks about is needed to push new ideas.

by doctorpangloss

0 subcomment

All they have to do is release air cooled 96GB GDDR7 PCIe5 boards with 4x Infinity Link, and charge $1,900 for it.

by latchkey

1 subcomments

A bit of background. This is directed towards Spectral Compute (Michael) and https://scale-lang.com/. I know both of these guys personally and consider them both good friends, so you have to understand a bit of the background in order to really dive into this.
My take on it is fairly well summed up at the bottom of Elio's post. In essence, Elio is taking the view of "we would never use scale-lang for llms because we have a product that is native AMD" and Michael is taking the view of "there is a ton of CUDA code out there that isn't just AI and we can help move those people over to AMD... oh and by the way, we actually do know what we are doing, and we think we have a good chance at making this perform."
At the end of the day, both companies (my friends) are trying to make AMD a viable solution in a world dominated by an ever growing monopoly. Stepping back a bit and looking at the larger picture, I feel this is fantastic and want to support both of them in their efforts.

by measurablefunc

8 subcomments

[flagged]

by pixelpoet

1 subcomments

Actual article title says "won't"; wont is a word meaning habit or proclivity.