Part of it tracks how many tokens you actually get from various subscriptions, over time.
Over the past week, multiple people have asked me about it — they'd been hitting Claude and Codex limits faster than expected.
Ran the tests yesterday. Reran today. Here's what came back:
▸ ChatGPT Plus / GPT-5.5: 95M → 37M tokens/week (−61%)
▸ Claude Max 20× / Sonnet 4.6: 388M → 214M (−45%)
▸ Claude Max 20× / Opus 4.7: 248M → 162M (−35%)
▸ Claude Pro / Sonnet 4.6: 19.6M → 11.4M (−42%)
▸ Claude Pro / Opus 4.7: 15.6M → 10.2M (−35%)
5 of 5 retested plans dropped 35-61% in five days. None went up.
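For anyone who wants to sanity-check the arithmetic, here's a minimal Python sketch recomputing the week-over-week drops from the figures above. The plan names and numbers are simply the reported values copied from this post, not output from any API:

```python
# Recompute percent change from the reported before/after figures.
# Values are tokens/week in millions, as posted above.
plans = {
    "ChatGPT Plus / GPT-5.5": (95, 37),
    "Claude Max 20x / Sonnet 4.6": (388, 214),
    "Claude Max 20x / Opus 4.7": (248, 162),
    "Claude Pro / Sonnet 4.6": (19.6, 11.4),
    "Claude Pro / Opus 4.7": (15.6, 10.2),
}

for plan, (before, after) in plans.items():
    drop = (after - before) / before * 100  # negative = fewer tokens/week
    print(f"{plan}: {before}M -> {after}M ({drop:.0f}%)")
```

Running it reproduces the rounded percentages quoted in the post (−61%, −45%, −35%, −42%, −35%).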
Anyone else seeing similar in their own usage?
Is the model itself degrading? Or is this the result of product-policy changes, such as system-prompt modifications and tighter usage limits? Or both?
I sometimes wonder whether degradation is simply an inherent property of LLMs themselves.
Not a criticism of this project; it's a good idea. It just highlights the central question of "how well is this model working?", and I'm not sure that question is so straightforward to answer.