Hacker News

Mamba-3

257 points by matt_d

by nl

3 subcomments

I'm looking forward to comparing this to Inception 2 (the text diffusion model) which in my experience is very fast and reasonably high quality.

by roger_

0 subcomments

Can anyone explain why Mamba models start with a continuous-time SSM (and then discretize it) rather than starting in discrete time?
I know the step size isn't fixed, but I'm not sure why that matters. Is that the only reason? The continuous formulation also seems to have a parameterization advantage.
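
For readers following the question, here is a minimal sketch of what "discretize a continuous-time SSM" can look like in the scalar case. It uses the textbook zero-order-hold (ZOH) rule and is not taken from the Mamba-3 paper, which may use a different discretization. The names `zoh_discretize` and `step` and all numeric values are illustrative assumptions. It shows two things the question touches on: the step size `delta` can differ per step while the result stays consistent, and a negative continuous-time `a` gives a stable discrete `a_bar` for any positive `delta`.

```python
import math

def zoh_discretize(a, b, delta):
    """Textbook zero-order-hold discretization of x'(t) = a*x(t) + b*u(t).

    Assumes a scalar state and a != 0. Illustrative only; not the
    exact rule used by any particular Mamba version.
    """
    a_bar = math.exp(delta * a)        # discrete state transition
    b_bar = (a_bar - 1.0) / a * b      # discrete input weight
    return a_bar, b_bar

def step(x, u, a, b, delta):
    """Advance the state by one step of size delta with input u held constant."""
    a_bar, b_bar = zoh_discretize(a, b, delta)
    return a_bar * x + b_bar * u

# Hypothetical values chosen only for the demonstration.
a, b = -0.5, 1.0
x0, u = 1.0, 2.0

# One step of size 0.8 vs. two steps of size 0.4 with the same held input.
one_big = step(x0, u, a, b, 0.8)
two_small = step(step(x0, u, a, b, 0.4), u, a, b, 0.4)

# With a < 0, the discrete transition stays strictly between 0 and 1
# for every positive step size, so the recurrence cannot blow up.
a_bars = [zoh_discretize(a, b, d)[0] for d in (0.01, 0.1, 1.0, 10.0)]
```

Under these assumptions `one_big` and `two_small` agree up to floating-point error, which is one way to see why a learned, input-dependent step size is natural in the continuous view: the step size is an explicit knob, and stability comes from the sign of `a`, not from constraining the discrete weights directly.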

by Havoc

5 subcomments

Is there a reason we don't switch halfway through? I.e., start with a classic LLM and switch to something linear like Mamba as the context grows?

by jychang

5 subcomments

I'm not sure that I buy their conclusion that more compute during inference is good.
Yes, batch=1 inference is mostly memory bandwidth bound, not GPU compute bound. But no provider does batch=1 inference. Everyone groups all the requests into a batch, and the GPU computes them together.
With a fused kernel, that means the GPU streams the tensors from VRAM, and does a bunch of compute on different conversations in the batch, at the same time.
If they increase the amount of compute required per token, that just reduces the maximum batch size a GPU can handle. In practice, yes, this does mean each GPU can serve fewer users. Providers don't normally leave GPU cores idle during inference.
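
A rough back-of-the-envelope sketch of the argument above, using a toy roofline model. Everything here is an assumption for illustration: the function names, the hardware numbers, and the simplification that weights are streamed once per decode step and shared by the whole batch (per-sequence state traffic is ignored). It is not a measurement of any real GPU or model.

```python
def step_time(batch, weight_bytes, flops_per_token, bandwidth, peak_flops):
    """Toy roofline: one decode step takes as long as the slower of
    streaming the weights once or doing the batch's arithmetic."""
    memory_time = weight_bytes / bandwidth
    compute_time = batch * flops_per_token / peak_flops
    return max(memory_time, compute_time)

def crossover_batch(weight_bytes, flops_per_token, bandwidth, peak_flops):
    """Batch size at which compute time catches up with memory time."""
    return (weight_bytes / bandwidth) * peak_flops / flops_per_token

# Hypothetical numbers: a 1.5B-parameter model in 16-bit weights,
# about 2 FLOPs per parameter per token, on a made-up accelerator.
weight_bytes = 3e9
flops_per_token = 3e9
bandwidth = 1e12      # bytes per second (assumed)
peak_flops = 1e14     # FLOPs per second (assumed)

base = crossover_batch(weight_bytes, flops_per_token, bandwidth, peak_flops)
doubled = crossover_batch(weight_bytes, 2 * flops_per_token, bandwidth, peak_flops)

# Below the crossover the step time is flat (memory-bound);
# above it, every extra request in the batch costs real time.
t_small = step_time(1, weight_bytes, flops_per_token, bandwidth, peak_flops)
t_large = step_time(400, weight_bytes, flops_per_token, bandwidth, peak_flops)
```

In this toy model, doubling the FLOPs per token halves the batch size at which the GPU stops being memory-bound, which is the trade-off being described: extra per-token compute is free at batch=1 but eats into the headroom that batching relies on.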

by fudged71

1 subcomment

This is really promising. Are they now going to scale this up to hundreds of billions of parameters? Why stop at 1.5B if they found a potentially SOTA architecture?

by jeffhwang

0 subcomments

I'm glad I clicked through, because I thought the article was about Mamba, the package manager I associate with Python (similar to conda).
https://github.com/mamba-org/mamba

by anentropic

0 subcomments

More here https://news.ycombinator.com/item?id=47423208
https://arxiv.org/abs/2603.15569

by robofanatic

7 subcomments

> Mamba-3 is a new state space model (SSM) designed with inference efficiency as the primary goal — a departure from Mamba-2, which optimized for training speed. The key upgrades are a more expressive recurrence formula, complex-valued state tracking, and a MIMO (multi-input, multi-output) variant that boosts accuracy without slowing down decoding.
Why can't they simply say:
Mamba-3 focuses on being faster and more efficient when making predictions, rather than just being fast to train like Mamba-2.