T5Gemma 2: The next generation of encoder-decoder models
148 points by milomg
by minimaxir
2 subcomments
> Note: we are not releasing any post-trained / IT checkpoints.
I get not trying to cannibalize Gemma, but that's weird. A 540M multimodal model that performs well on queries would be useful, and "just post-train it yourself" is not always an option.
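For those who can, a rough supervised fine-tuning sketch with Hugging Face transformers would look something like the below. The model id and the dataset file are placeholders I made up, and I'm assuming the checkpoint loads as a standard seq2seq model:

    # Minimal SFT sketch for a small encoder-decoder checkpoint.
    # Assumptions: loads via AutoModelForSeq2SeqLM; model id and dataset are placeholders.
    from transformers import (AutoTokenizer, AutoModelForSeq2SeqLM,
                              DataCollatorForSeq2Seq, Seq2SeqTrainer,
                              Seq2SeqTrainingArguments)
    from datasets import load_dataset

    model_id = "google/t5gemma-2-540m"  # hypothetical id
    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

    # jsonl with {"prompt": ..., "response": ...} records (placeholder file)
    ds = load_dataset("json", data_files="my_instructions.jsonl")["train"]

    def preprocess(ex):
        x = tok(ex["prompt"], truncation=True, max_length=1024)
        y = tok(text_target=ex["response"], truncation=True, max_length=512)
        x["labels"] = y["input_ids"]
        return x

    ds = ds.map(preprocess, remove_columns=ds.column_names)

    trainer = Seq2SeqTrainer(
        model=model,
        args=Seq2SeqTrainingArguments(output_dir="t5gemma2-sft",
                                      per_device_train_batch_size=8,
                                      num_train_epochs=3,
                                      learning_rate=2e-4),
        train_dataset=ds,
        data_collator=DataCollatorForSeq2Seq(tok, model=model),
    )
    trainer.train()

That's the easy part; curating a good instruction mix is the work Google already did for the IT Gemma checkpoints, which is exactly what isn't being released here.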
by killerstorm
2 subcomments
They are comparing 1B Gemma to 1+1B T5Gemma 2. Obviously a model with twice as many parameters will do better. That says absolutely nothing about the benefits of the architecture.
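If you want to sanity-check that, counting parameters is one line per model. A rough sketch, assuming both checkpoints load through transformers; the model ids below are placeholders:

    # Quick parameter-count comparison; model ids are hypothetical placeholders.
    from transformers import AutoModelForCausalLM, AutoModelForSeq2SeqLM

    dec_only = AutoModelForCausalLM.from_pretrained("google/gemma-3-1b-pt")
    enc_dec = AutoModelForSeq2SeqLM.from_pretrained("google/t5gemma-2-1b-1b")

    count = lambda m: sum(p.numel() for p in m.parameters())
    print(f"decoder-only:    {count(dec_only) / 1e9:.2f}B params")
    print(f"encoder-decoder: {count(enc_dec) / 1e9:.2f}B params")  # roughly 2x

A fair architecture comparison would match total parameters (or total FLOPs per token), not the decoder size.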
by potatoman22
2 subcomments
What's the use case of models like T5 compared to decoder-only models like Gemma? More traditional ML/NLP tasks?
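E.g. the classic text-to-text setup, sketched below with a plain public T5 checkpoint since I haven't checked what the T5Gemma 2 ids are:

    # Classic seq2seq use of an encoder-decoder model (translation here).
    # Uses the public t5-small checkpoint; swap in a T5Gemma 2 id if one fits.
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    tok = AutoTokenizer.from_pretrained("t5-small")
    model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

    inputs = tok("translate English to German: The house is small.",
                 return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=32)
    print(tok.decode(out[0], skip_special_tokens=True))  # e.g. "Das Haus ist klein."

Translation, summarization, and classification-style tasks where the whole input is known up front are the traditional sweet spot, since the encoder can read it bidirectionally in one pass.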
by davedx
4 subcomments
What is an encoder-decoder model? Is it some kind of LLM, or a subcomponent of an LLM?
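From skimming, my rough mental model is the skeleton below (torch.nn.Transformer, sizes arbitrary), but I'm not sure how it relates to a regular decoder-only LLM like Gemma:

    # Minimal encoder-decoder skeleton: the encoder reads the full input with
    # bidirectional attention; the decoder generates the output token by token
    # while cross-attending to the encoder states.
    import torch
    import torch.nn as nn

    d_model, vocab = 256, 1000
    embed = nn.Embedding(vocab, d_model)
    xformer = nn.Transformer(d_model=d_model, nhead=4,
                             num_encoder_layers=2, num_decoder_layers=2,
                             batch_first=True)
    lm_head = nn.Linear(d_model, vocab)

    src = torch.randint(0, vocab, (1, 12))  # full input sequence (e.g. a question)
    tgt = torch.randint(0, vocab, (1, 5))   # output tokens generated so far

    tgt_mask = nn.Transformer.generate_square_subsequent_mask(tgt.size(1))
    dec_out = xformer(embed(src), embed(tgt), tgt_mask=tgt_mask)
    logits = lm_head(dec_out)      # next-token distribution per target position
    print(logits.shape)            # torch.Size([1, 5, 1000])

A decoder-only model would be just the second half, with no separate encoder and no cross-attention.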
by DoctorOetker
0 subcomments
What is the "X" in the pentagonal performance comparison? Is it multilingual performance or something else?
by o1inventor
0 subcomments
> 128k context.
don't care. prove effective context length or gtfo.
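Meaning something like a basic needle-in-a-haystack probe anyone could run against the released checkpoints. A crude sketch, where the model id is a placeholder and I'm assuming it loads as a standard seq2seq model:

    # Crude needle-in-a-haystack probe for effective context length.
    # Model id is a placeholder; assumes a seq2seq checkpoint via transformers.
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    model_id = "google/t5gemma-2-1b-1b"  # placeholder
    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

    needle = "The secret code is 73214."
    filler = "The sky was grey and nothing happened. " * 4000  # tens of k tokens

    for frac in (0.1, 0.5, 0.9):  # needle near the start / middle / end
        cut = int(len(filler) * frac)
        doc = filler[:cut] + needle + filler[cut:]
        prompt = doc + "\n\nQuestion: What is the secret code?"
        ids = tok(prompt, return_tensors="pt", truncation=True, max_length=131072)
        out = model.generate(**ids, max_new_tokens=10)
        print(frac, tok.decode(out[0], skip_special_tokens=True))

Sweep the filler length and needle position and you get an actual effective-context curve instead of a headline number.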