This is very misleading because the generalisation ability of LLMs is extremely high. They don't just memorise problems; that's nonsense.
At high-school-level maths you genuinely can't get GPT-5 Thinking to make a single mistake. Not possible at all, unless you give it some convoluted, ambiguous prompt that no human could parse either. If you grant that, how could memorisation alone explain it?
In fact, even undergraduate-level mathematics is quite simple for GPT-5 Thinking.
IMO gold was won... by what? Memorising solutions?
I challenge people to find ONE example of high-school or undergrad-level maths that GPT-5 Thinking gets wrong. I couldn't manage it myself. You must allow it all tools, though.
Just like how GPUs were optimised to pass synthetic benchmarks.