- Interesting choice from PyTorch to release yet another DSL. On the positive side it's one more point in the design space; on the other hand it makes it even harder to choose the right technology among Triton, Gluon, CuTe, ThunderKittens, and a few others.
- The developers also gave a talk about Helion on GPU Mode:
https://www.youtube.com/watch?v=1zKvCLuvUYc
- It's good to see more effort toward making things not device-specific, but I only see benchmarks for NVIDIA B200 and AMD MI350X. Also, what's the experience of using one of these Python DSLs like? Are the tools good enough to make code completion, jump to definition, setting breakpoints, watching variables, copying as expression, etc. nice?
- Asking as someone who is really out of the loop: how much of ML development these days touches these “lower level” parts of the stack? I’d expect that by now most of the work would be high level, and the infra would be mostly commoditized.
by dachworker
3 subcomments
- I'm super excited to give this one a spin. It seems like a neat idea: Triton, but simpler and with automatic autotuning (rough sketch at the end of this comment). My head is spinning with options right now. I love how everyone was hyping up CUDA this and CUDA that a couple of years ago, and now CUDA is all but irrelevant. There are now so many different and opinionated takes on how you should write high-performance accelerator-cluster code. I love it.
It's also kind of ironic that right now in 2025 we have all this diversity in tooling, but at the same time the ML architecture space has collapsed entirely and everyone is just using transformers.
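A minimal sketch of what that "Triton, but simpler" pitch looks like, going off the Helion announcement and README as I remember them; the decorator and hl.tile names are taken from there, but treat the exact signatures as assumptions rather than verified API:

    import torch
    import helion
    import helion.language as hl

    # The body reads like plain PyTorch indexing; block sizes, launch
    # geometry, etc. are left to Helion's autotuner rather than written
    # out by hand as they would be in Triton.
    @helion.kernel()
    def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        out = torch.empty_like(x)
        for tile in hl.tile(out.size()):
            out[tile] = x[tile] + y[tile]
        return out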
- I switched from PyTorch to JAX just before Triton appeared. Does anyone know how JAX compares to this autotuning machinery in PyTorch? I know JAX does JIT, but I don't have a good intuition for whether JIT is better than this type of autotuning.
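Not a full answer, but a minimal sketch of the distinction as I understand it: jax.jit hands the traced program to XLA, which chooses fusion and tiling by compiler heuristics at compile time, whereas Triton/Helion-style autotuning actually benchmarks candidate configs on the device. The function below is just illustrative:

    import jax
    import jax.numpy as jnp

    @jax.jit
    def scaled_sum(x, w):
        # XLA compiles this once per input shape/dtype; fusion and tiling
        # come from compiler heuristics, not from timing candidate block
        # sizes on the GPU the way an autotuner does.
        return jnp.sum(x * w, axis=-1)

    x = jnp.ones((4096, 4096), dtype=jnp.float32)
    w = jnp.ones((4096, 4096), dtype=jnp.float32)
    print(scaled_sum(x, w).shape)  # first call traces and compiles; later calls reuse the binary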
- Compiling a kernel after assemblage in low-level object-oriented languages either uses a stable kernel or the cargo-fuzzed raw_spinlock code. Helion abstracts syntax and design for calculating λ-functions, which converts the language into a kernel config.
by mshockwave
1 subcomments
- Is it normal to spend 10 minutes on tuning nowadays? Do we need to spend another 10 minutes every time the code changes?
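For what it's worth, my understanding from the docs is that the search is a one-time cost per kernel and hardware target, and the winning config can be hard-coded afterwards so it isn't rerun on every edit. A hypothetical sketch of that idea follows; the names helion.Config, block_sizes, and num_warps are assumptions from memory, not a verified signature:

    import torch
    import helion
    import helion.language as hl

    # Hypothetical: pin a previously autotuned config so the search is skipped.
    @helion.kernel(config=helion.Config(block_sizes=[1024], num_warps=4))
    def scale(x: torch.Tensor) -> torch.Tensor:
        out = torch.empty_like(x)
        for tile in hl.tile(out.size()):
            out[tile] = x[tile] * 2
        return out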
- I don't get the point of Helion compared to its alternatives like Gluon.
For best performance I would presume one needs low-level access to hardware knobs, and these kernel primitives are written once and reused. So what is the point of a DSL that dumbs things down as a wrapper around Triton?
by singularity2001
1 subcomments
- Anything, as long as I don't have to touch proprietary CUDA and mpx
- How does this compare against other DSLs?
- I posted this 5 days ago, how did this resurface?
- numba for gpu kernels... cool!
- Tangential question related to the example kernel: in GPU programming, is it idiomatic/standard to initialize the out array as zeros rather than empty? Are the performance savings negligible?
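The usual answer: if the kernel writes every element of the output, torch.empty is sufficient, and torch.zeros just adds an extra fill kernel over the whole buffer; whether that matters depends on the output size. A rough way to measure the difference yourself (assumes a CUDA device; numbers will vary):

    import torch

    # empty only allocates (often reusing cached memory); zeros also launches
    # a fill kernel over the whole buffer, which is pure overhead if the
    # compute kernel overwrites every element anyway.
    def alloc_cost(make, n=64 * 1024 * 1024, iters=100):
        torch.cuda.synchronize()
        start = torch.cuda.Event(enable_timing=True)
        end = torch.cuda.Event(enable_timing=True)
        start.record()
        for _ in range(iters):
            make(n, device="cuda")
        end.record()
        torch.cuda.synchronize()
        return start.elapsed_time(end) / iters  # ms per allocation

    if torch.cuda.is_available():
        print("empty:", alloc_cost(torch.empty), "ms")
        print("zeros:", alloc_cost(torch.zeros), "ms")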
by doctorpangloss
1 subcomments
- Is contributing to Triton so bad? It looks like the blocker is usually LLVM.