- Just spent the last week or so porting TheRock to stagex, in an effort to get ROCm built with a native musl/mimalloc toolchain and to make the build deterministic for high-security/privacy workloads that cannot trust binaries built with only a single compiler.
It has been a bit of a nightmare: I had to package 30+ dependencies plus their heavily customized LLVM, but I finally got the runtime to build this morning.
Things are looking bright for high-security workloads on AMD hardware, since the work happens fully in the open, however much of a mess it may be.
by androiddrew
1 subcomment
- I have been trying since February to get someone at AMD to ship tuned Tensile kernels in rocm-libs for the gfx1201. They are used by Ollama, but no one on the Developer Discord knows who is responsible for that. It has been pretty frustrating, and it shows that AMD has an organizational problem to overcome on top of everything they want ROCm to do technically.
by 0xbadcafebee
5 subcomments
- AMD has years of catching up to do with ROCm just to get their devices to work well. They don't support all of their own graphics cards that can do AI, and when a card is supported, it's buggy. The AMDGPU graphics driver for Linux has had continued instability since kernel 6.6. I don't understand why they can't hire better software engineers.
by StillBored
0 subcomments
- I just wish they would make another pass at cleaning up the stack. It should be as easy as `git clone --recurse-submodules rocm` followed by a configure/make that both prints out missing dependencies and configures around them, with a clear choice between 'build the world' and just building the lower-level OpenCL/HIP/SPIR-V tooling without all the libraries on top (sketched below).
Right now the entire source base is literally "throw a bunch of crap under the ROCm brand and hope it builds together" rather than any overarching architecture. Presumably the entire spend is also tied to "whatever big Co's evaluation needs this week" when it comes to developing with it.
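Something like the following is presumably the wished-for flow. To be clear, this is imaginary: the repo URL is the real ROCm meta-repo, but the profile switch and the missing-dependency report do not exist today.
```sh
# Purely illustrative: -DROCM_PROFILE and the deps report are imaginary.
git clone --recurse-submodules https://github.com/ROCm/ROCm
cd ROCm

# One configure step that reports missing deps and configures without them,
# with a clear choice between 'world' and a minimal OpenCL/HIP/SPIR-V layer.
cmake -B build -DROCM_PROFILE=hip-minimal   # or -DROCM_PROFILE=world
cmake --build build
```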
by AshamedCaptain
1 subcomment
- > Last year, AMD ran a GitHub poll for ROCm complaints and received more than 1,000 responses. Many were around supporting older hardware, which is today supported either by AMD or by the community, and one year on, all 1,000 complaints have been addressed, Elangovan said.
Must have been by waiting for each of the 1000 complainers to die of old age, because I do not know what old hardware they have added support for.
by grokcodec
1 subcomment
- The day ROCm supports EVERY AMD card on release, just like CUDA does, is the day I will actually believe this marketing hype. They really dropped the ball here, especially when they abandoned cards that had only recently been released at the time, like the 400 series.
Hopefully management gets their heads out of their butts and invests more in the software stack.
- I'm team "take on CUDA with OpenVINO" (and SYCL*). Intel seems to have really upped their game on iGPU and dGPU lately, with sane prices and fairly good software support and APIs.
I'm not talking about gaming CUDA workloads, but CV and data science workloads seem to scale well on Arc and run well at the edge on Core Ultra 2/3.
by mellosouls
0 subcomments
- Related from Jan 2025:
ROCm Device Support Wishlist (205 points, 107 comments)
https://news.ycombinator.com/item?id=42772170
- ROCm is not supported on some very common consumer GPUs, e.g. the RX 580. Vulkan backends work just fine.
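A quick way to check what a ROCm install actually recognizes, assuming the stock `rocminfo` tool is present; the `HSA_OVERRIDE_GFX_VERSION` spoof is a widely used but unofficial workaround, and `./my_app` is a placeholder:
```sh
# List the gfx targets the ROCm runtime can see (rocminfo ships with ROCm)
rocminfo | grep -i gfx

# Common unofficial workaround: spoof a supported gfx version for a
# near-miss consumer card. Unsupported by AMD; may crash or miscompute.
HSA_OVERRIDE_GFX_VERSION=10.3.0 ./my_app
```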
- A little feedback for AMD executives about the current status of ROCm:
(1) Supporting only server-grade hardware and ignoring laptop/consumer-grade GPUs and APUs for ROCm was a terrible strategic mistake.
A lot of developers experiment first and foremost on their personal laptops and scale to expensive, professional-grade hardware later. In addition, some developers simply do not have the money to buy server-grade hardware.
By locking ROCm to server-grade GPUs, you restrict the potential pool of contributors to your open-source ROCm ecosystem to a few large AI users and a few HPC centers... meaning virtually nobody.
A much more sensible strategy would be to support ROCm on consumer GPUs, even with degraded performance, which is exactly what Nvidia does with CUDA.
This is changing, but you need to send a clear message here: EVERY newly released device should be properly supported by ROCm.
(2) Supporting only the last two generations of architectures is not what customers want to see.
https://rocm.docs.amd.com/projects/install-on-linux/en/docs-...
People with an existing GPU codebase invest a significant amount of effort to support ROCm.
Telling them two years later "Sorry, you are out of updates now!" while the ecosystem is still unstable is unacceptable.
CUDA excels at backward compatibility. The fact that you ignore it entirely plays against you.
(3) Focusing exclusively on Triton and making HIP a second-class citizen is nonsensical.
AI might get all the buzz and the money right now, we get it.
It might look sensible on the surface to focus on Python-based, AI-focused tools like Triton, and supporting them is definitely necessary.
But there is a tremendous amount of code that relies on C and C++ to run on GPUs (HPC, simulation, scientific computing, imaging, ...) and that will remain there for decades to come (see the HIP sketch below).
Ignoring that means losing customers to CUDA, again.
It is pretty ironic to see a move like that considering that AMD GPUs currently tend to be highly competitive at FP64, which is exactly what these kinds of applications need. You are throwing away one of your own competitive advantages...
(4) Last but not least: please focus a bit on the packaging of your software.
People have been complaining about this for the last 5 years and not much has changed.
Working with distribution packagers and integrating with them does not cost much... and it would give you a competitive advantage over Nvidia.
by bruce343434
3 subcomments
- In my experience fiddling with compute shaders a long time ago, CUDA and ROCm and OpenCL are way too much hassle to set up. It usually takes a few hours to get the toolkits and SDKs up and running, and that's if you CAN get them running at all. The dependencies are way too big as well: CUDA is 11 GB??? Either way, just use Vulkan. Vulkan "just works" and doesn't lock you into Nvidia/AMD.
- Someone from AMD posted this a few minutes ago, then deleted it:
"Anush's success is due to opting out of internal bureaucracy than anything else. most Claude use at AMD goes through internal infrastructure that can take hundreds of seconds per response due to throttling. Anush got us an exemption to use Anthropic directly. he is also exempt from normal policies on open source and so I can directly contribute to projects to add AMD support. He's an effective leader and has turned ROCm into a internal startup based in California. Definitely worth joining the team even if you've heard bad things about AMD as a whole."
This kind of bullshit is why I don't want to join AMD, even if this particular team is temporarily exempt from it.
by nullpoint420
0 subcomments
- I just don’t understand how they haven’t figured this out yet. I genuinely want to know the corporate structure and politics that have led to their inability to execute.
Is it leadership? Something else?
- I really want to get to the point where I'm shopping online for a GPU and Nvidia isn't a requirement. I think we are really close. Maybe we are already there and my level of trust just needs to catch up.
- Just in time for Vulkan tg (token generation) to be faster in almost all situations, and Vulkan pp (prompt processing) to be faster in many, with constant improvements on the way, making ROCm obsolete for inference.
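For reference, tg and pp are llama.cpp's shorthand for token generation and prompt processing. A sketch of how the two backends typically get compared, assuming a recent llama.cpp checkout (the GGML_VULKAN/GGML_HIP options come from its build docs) and `model.gguf` as a placeholder:
```sh
# Build llama.cpp once per backend
cmake -B build-vulkan -DGGML_VULKAN=ON && cmake --build build-vulkan
cmake -B build-rocm -DGGML_HIP=ON && cmake --build build-rocm

# llama-bench reports pp (prompt processing) and tg (token generation) rates
./build-vulkan/bin/llama-bench -m model.gguf -ngl 99
./build-rocm/bin/llama-bench -m model.gguf -ngl 99
```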
by taherchhabra
1 subcomment
- Genuine question: after Claude Code, Codex, etc., can't this be sped up?
- They need a lot of pieces: hardware support, IDE and graphical-debugging integrations, the polyglot ecosystem, a common bytecode shared by several compiler backends (CUDA is not only C++), the library portfolio.
- AMD hasn't signaled in behavior or words that they're going to actually support ROCm on $specificdevice for more than 4-5 years after release. Sometimes it's as little as the high 3.x years for shrinks like the consumer AMD RX 580. And often the ROCm support for consumer devices isn't out until a year after release, further cutting into that window.
Meanwhile, Nvidia only this year dropped CUDA/driver support for the 1xxx-series cards from its most recent drivers.
For me ROCm's mayfly lifetime is a dealbreaker.
- > Challenger AMD’s ability to take data center GPU share from market leader Nvidia will certainly depend on the success or failure of its AI software stack, ROCm.
I don't think this is true. CUDA is a huge advantage for Nvidia, but as far as I can tell it is more a set of R&D libraries than anything else, so all the Hot New Stuff keeps being Nvidia-first and (to start with) Nvidia-only, because the library ecosystem for the new hotness doesn't exist yet. Then eventually new libraries get created that are CUDA-independent, and AMD turns out to make pretty good graphics cards.
I wouldn't be surprised if ROCm withered on the vine and AMD still did fine.
by hurricanepootis
0 subcomments
- I've been using ROCm on my Radeon RX 6800 and my Ryzen AI 7 350 systems. I've only used it for GPU-accelerated rendering in Cycles, but I am glad that AMD has an option that isn't OpenCL now.
- ROCm is so annoying (buggy, fiddly dependencies, limited hardware support) that TinyGrad built its own compiler and toolchain that targets the hardware directly. And it has broader device support than ROCm, which primarily seems focused on their datacenter GPUs.
- For many LLM workloads, it seems ROCm is slower than Vulkan. What’s the point?
by blovescoffee
4 subcomments
- Naive question: could agents help speed up building out ROCm code for parity with CUDA? Outside of code, what are the bottlenecks to reaching parity?
- How long until we can use AI to simply translate all the CUDA stuff to another (more open) platform? I'm getting the feeling we're getting close.
AI won't be working to Nvidia's advantage this time.
by DeathArrow
1 subcomment
- Do we get better perf or tokens per second with AMD and its software stack than with Nvidia?
by formerly_proven
1 subcomment
- We’ve been talking about this for a good ten years at least and AMD is still essentially in the “concepts of a plan” phase. The AMD GPGPU software org has to be one of the most inconsequential ones at this rate.
by xyzsparetimexyz
0 subcomments
- Better title: One Dispatch After Another
- Side question, but why not advance something like Rust GPU instead as a general approach to GPU programming? https://github.com/Rust-GPU/rust-gpu/
Of all the existing examples, it looks like the most interesting one.
What I'm surprised about is the lack of backing for it from someone like AMD. It doesn't have to immediately replace ROCm, but AMD would benefit from it advancing and replacing the likes of CUDA.
by neuroelectron
0 subcomments
- Now that the AI bubble is starting to burst, it's a great time for AMD to reveal their AI ambitions. They've set the tone by hiring low-cost, outsourced labor.
Of course everybody knows what's really going on here. It's not an open discussion, however.
by techpulselab
0 subcomments
- Apple got it right with unified memory on a wide bus. That's why Mac Minis are flying off the shelves for local models. But they are 10x less powerful in AI TOPS, and you can't upgrade the memory.
I really wish the AMD and Intel boards would be replaced by competent people. They could do the same in very short order: both ship integrated GPUs that share main memory, and AMD and Intel have (or at least used to have) serious know-how in data buses and interconnects, respectively. But I don't see any of that happening.
ROCm? It can't even support attention decently. It lacks a lot of features, and NVIDIA is adding more each year. Soon they will reach escape velocity and nobody will catch them for a decade. smh
- Why is it called "ROCm" (with the strange capitalization) in the first place? This may sound silly, but in order to compete, every detail matters, including the name.