Hacker News

Rio de Janeiro's city government model Rio3.5 beats Qwen3.7 in recent benchmarks

115 points by lucasfcosta

by VoidWhisperer

1 subcomments

https://github.com/nex-agi/Nex-N2/issues/4
It seems they didn't train a novel model; they merged two existing models and then instructed the result to say it was 'Rio, trained by Rio AI Labs'.

by mettamage

2 subcomments

https://xcancel.com/ZenMagnets/status/2065796012820848699
Correct me if I'm wrong, but reading through the comments of the thread, this seems to be post-training/fine-tuning.

by adrian_b

0 subcomment

> Post-trained from Qwen 3.5 397B
Model Card:
https://huggingface.co/prefeitura-rio/Rio-3.5-Open-397B

by Aurornis

2 subcomments

A city government funding a fine-tune of a model is interesting.
As for the benchmarks: if you spend any time playing with fine-tunes of published models, you know that benchmarks are gamed so much that they're a useless indicator of performance for models from small teams. It's too easy to fine-tune a model to perform well on the benchmarks, release it, put a line on your resume saying you released a model that beat the major labs on benchmarks, and then try to use that to jump into a new job. The temptation is high.
There are a lot of fringe models and fine-tunes that claim to have better performance on some benchmark. Then you try to use them and find they're often worse at general tasks than the base model.
I would wait and see if these results hold across other benchmarks. It's cool that the city is doing something with AI, but this is a case where extraordinary claims require extraordinary evidence. I doubt a small, previously unknown team has unlocked something secret that the team who made Qwen couldn't figure out. It's more likely it was fine-tuned for a specific outcome (possibly these benchmarks) and performance in other areas was reduced as a consequence.

by HeliumHydride

0 subcomment

https://www.reddit.com/r/LocalLLaMA/comments/1u4fzg1/new_mod...
https://x.com/SemiAnalysis_/status/2065894494935933191

by arjie

1 subcomments

Benchmaxxing is the new “have a crypto trading strategy”. No one is impressed by it except non-practitioners.

by cuzezzzbbfofai

2 subcomments

by hmokiguess

0 subcomment

by ramon156

2 subcomments

Every day I'm reminded why I don't spend time on Twitter. What use is there in claiming "X is better than Y in benchmark Z, disagreeing with that means disagreeing with me"?
Information is power, dick measurements are not.