- They trained it in 33 days for ~$20M (which apparently covers not only the infrastructure but also salaries over a six-month period), and the model comes close to Qwen and DeepSeek. Pretty impressive.
- It's super exciting to see another American lab get in the ring. Even if they're not at SOTA on the first release, the fact that they're trying is incredible for open source AI.
by linolevan
1 subcomment
- I'm particularly excited to see a "true base" model to do research off of (https://huggingface.co/arcee-ai/Trinity-Large-TrueBase).
by Alifatisk
3 subcomments
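For anyone who wants to poke at the base model, here is a minimal sketch of loading it with Hugging Face transformers, assuming the repo linked above ships a standard transformers checkpoint; in practice a 400B-class model needs multi-GPU sharding or offloading.

```python
# Minimal sketch: loading the TrueBase checkpoint for research.
# Assumes the repo exposes a standard transformers checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "arcee-ai/Trinity-Large-TrueBase"  # repo name taken from the link above

tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(
    repo,
    torch_dtype="auto",   # keep the checkpoint's native dtype
    device_map="auto",    # shard across available GPUs / offload via accelerate
)

# A base model just continues text -- no chat template needed.
inputs = tokenizer("Photosynthesis converts", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```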
- What did they do to make the loss drop so much in phase 3?
Also, why are they comparing with Llama 4 Maverick? Wasn’t it a flop?
by mwcampbell
3 subcomments
- Given that it's a 400B-parameter model, but it's a sparse MoE model with 13B active parameters per token, would it run well on an NVIDIA DGX Spark with 128 GB of unified RAM, or do you practically need to hold the full model in RAM even with sparse MoE?
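As a rough back-of-the-envelope sketch: for MoE inference all expert weights normally have to stay resident, so the 400B total parameters determine storage while the 13B active parameters only govern per-token compute.

```python
# Rough weight-memory estimate; real GGUF sizes vary with the quant mix
# and you still need room for the KV cache on top of this.
TOTAL_PARAMS = 400e9  # all experts resident; 13B active only affects compute

for name, bytes_per_param in [("bf16", 2.0), ("fp8", 1.0), ("~4.5-bit (Q4_K-style)", 0.5625)]:
    print(f"{name:>22}: ~{TOTAL_PARAMS * bytes_per_param / 1e9:.0f} GB of weights")

# bf16:                  ~800 GB
# fp8:                   ~400 GB
# ~4.5-bit (Q4_K-style): ~225 GB  -> still well above 128 GB of unified RAM,
#                                    so expect heavy offloading or a lower-bit quant.
```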
- The only thing I question is the use of Maverick in their comparison charts. That's like comparing a pile of rocks to an LLM.
- Testing it now in HugstonOne. Running smoothly at 5.8 T/s with Trinity-Large-Preview-UD-Q4_K_XL-00001-of-00005.gguf loaded.
The T/s speed is acceptable, and the GPU temperature is stable at 60 degrees Celsius.
Accuracy and precision on math problems look good so far.
Results: https://www.reddit.com/r/Hugston/comments/1qq9d5i/testing_tr...
by frogperson
1 subcomment
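For anyone without HugstonOne, a comparable local test can be sketched with llama-cpp-python, assuming the same split GGUF mentioned above is on disk; pointing llama.cpp at the first shard is enough for split files.

```python
# Sketch of a similar local run with llama-cpp-python instead of HugstonOne.
from llama_cpp import Llama

llm = Llama(
    model_path="Trinity-Large-Preview-UD-Q4_K_XL-00001-of-00005.gguf",
    n_gpu_layers=-1,   # offload as many layers as fit on the GPU
    n_ctx=4096,
)

out = llm("Q: What is 17 * 23? A:", max_tokens=8)
print(out["choices"][0]["text"])
```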
- What exactly does "open" mean in this case? Is it weights and data or just weights?
- unsloth quants are up https://huggingface.co/unsloth/Trinity-Large-Preview-GGUF
by LoganDark
1 subcomment
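A minimal sketch of pulling just one quant variant from that repo with huggingface_hub instead of cloning everything; the "*Q4_K_XL*" filename pattern is an assumption, so check the repo's file list first.

```python
# Sketch: download only one quant variant from the unsloth repo.
from huggingface_hub import snapshot_download

path = snapshot_download(
    repo_id="unsloth/Trinity-Large-Preview-GGUF",
    allow_patterns=["*Q4_K_XL*"],   # assumed pattern; adjust to the actual filenames
    local_dir="trinity-large-q4",
)
print("Downloaded to", path)
```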
- According to the article, nearly 50% of the dataset is synthetic (8T out of 17T tokens). I don't know what constitutes "a breadth of state-of-the-art rephrasing approaches", but I have limited confidence in models trained on LLM output, so I hope the synthetic portion isn't just that.
- > We optimize for performance per parameter and release weights under Apache-2.0
How do they plan to monetize?
- There's a free preview on openrouter: https://openrouter.ai/arcee-ai/trinity-large-preview:free
by observationist
0 subcomments
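A minimal sketch of calling that free preview through OpenRouter's OpenAI-compatible endpoint; the model slug is taken from the URL above, and the API key is a placeholder.

```python
# Sketch: query the free preview via OpenRouter's OpenAI-compatible API.
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_API_KEY",   # placeholder
)

resp = client.chat.completions.create(
    model="arcee-ai/trinity-large-preview:free",  # slug from the URL above
    messages=[{"role": "user", "content": "In one sentence, what is Trinity Large?"}],
)
print(resp.choices[0].message.content)
```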
- This is a wonderful release.
- It's so refreshing to see open-source models like this come from the US. I would love a ~100B-parameter one that can compete with gpt-oss-120B and GLM-4.5 Air.
by 0xdeadbeefbabe
1 subcomment
- Is anyone excited to do ablative testing on it?