- Are these models trained from scratch, or do they need distillation from bigger models to be competitive? Usually they're the small model in a family that also has a bigger one. If they are trained from scratch, does anybody know the economics of training this 30B-A3B model vs. training DeepSeek V4 Pro or Flash sized models (1.6T, 200-something B, fewer activated)?
by matt_daemon
2 subcomments
- > Hardware (minimum): 1× H100 @ FP8
Cool to see this, but it seems like it would be pretty expensive to run.
- Well, this is certainly not benchmaxxed, I'll give it that. And props for being honest about how far behind Qwen 3.6 MoE this model is.
But yeah, it's not the best look to have to stretch and say it's "competitive" with other models in its weight class when it offers not much else that's useful or novel.
- I was a fan of Cohere's general-purpose LLM (Command A, I think?) before they came out with their reasoning model.
More competition is better.
- I'm excited to see more OSS models
- Strange, I already submitted the same URL 6 days ago:
https://news.ycombinator.com/item?id=48475095
- Wasn't aware that Cohere was still around but this release doesn't exactly instill confidence.
by chattermate
0 subcomments
- [flagged]
by moralestapia
1 subcomment
- [flagged]
by cyanydeez
2 subcomments
- Looks like it's just Qwen 3.6 coder.