Hacker News

Cohere's First Model for Developers

138 points by hmokiguess

by amunozo

1 subcomment

Are these models trained from scratch, or do they need distillation from bigger models to be competitive? Usually a small model like this belongs to a family that also has a bigger model. If it is trained from scratch, does anybody know the economics of training this 30B-A3B model vs. training models the size of DeepSeek V4 Pro or Flash (1.6T, 200-something B, fewer activated parameters)?

by matt_daemon

2 subcomments

> Hardware (minimum): 1× H100 @ FP8
Cool to see this, but it seems like it would be pretty expensive to run.

by montroser

0 subcomments

Well, this is certainly not benchmaxxed, I'll give it that. And props for being honest about how far behind Qwen 3.6 MoE this model is.
But yeah, it's not the best look to have to stretch and say it's "competitive" with other models in its weight class when it offers not much else that's useful or novel.

by moojacob

1 subcomment

I was a fan of Cohere's general-purpose LLM. Command A, I think? Before they came out with their reasoning model.
More competition is better.

by tonyrice

1 subcomment

by AbuAssar

1 subcomment

Strange, I already submitted the same URL 6 days ago:
https://news.ycombinator.com/item?id=48475095

by zuzululu

3 subcomments

Wasn't aware that Cohere was still around but this release doesn't exactly instill confidence.

by chattermate

0 subcomments

by moralestapia

1 subcomment

by cyanydeez

2 subcomments