- "Studying 'Brain Rot' for LLMs isn't just a catchy metaphor—it reframes data curation as cognitive hygiene for AI, guiding how we source, filter, and maintain training corpora so deployed systems stay sharp, reliable, and aligned over time."
An LLM-written line if I’ve ever seen one. Looks like the authors have their own brainrot to contend with.
- I encourage everyone with even a slight interest in the subject to download a random sample of Common Crawl (the chunks are ~100MB) and see for yourself what is being used for training data.
https://data.commoncrawl.org/crawl-data/CC-MAIN-2025-38/segm...
I spotted a large number of things in there that it would be unwise to repeat here. But I assume the data cleaning process removes such content before pretraining? ;)
Although I have to wonder: I played with some of the base/text Llama models and got very disturbing output from them, so there can't be that much cleaning going on.
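If you want to see for yourself, here's a minimal sketch of how you might stream and skim one chunk in Python; the segment/file path is a hypothetical placeholder (real paths are listed in the crawl's wet.paths file), and it assumes the requests and warcio packages:

    # Minimal sketch: stream one WET chunk and skim the extracted text.
    import requests
    from warcio.archiveiterator import ArchiveIterator

    URL = ("https://data.commoncrawl.org/crawl-data/CC-MAIN-2025-38/"
           "segments/<segment>/wet/<file>.warc.wet.gz")  # hypothetical path

    with requests.get(URL, stream=True) as resp:
        resp.raise_for_status()
        for i, record in enumerate(ArchiveIterator(resp.raw)):
            if record.rec_type != "conversion":  # WET plain-text records
                continue
            text = record.content_stream().read().decode("utf-8", "replace")
            print(record.rec_headers.get_header("WARC-Target-URI"))
            print(text[:300])
            if i >= 20:  # a 20-record peek is plenty
                break

Even a 20-record skim makes the cleaning question above feel very concrete.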
by Version467
3 subcomments
- So they trained LLMs on a bunch of junk and then noticed that it got worse? I don't understand how that's a surprising, or even interesting, result.
- > as cognitive hygiene
LLMs are not cognizant. It's a terrible metaphor. It hides the source of the issue. The providers cheaped out on sourcing their data and now their LLMs are filled with false garbage and copyrighted material.
- The two big problems listed:
* Thought-skipping as the primary lesion: models increasingly truncate or skip reasoning chains, explaining most of the error growth.
* Popularity as a better indicator: a tweet's popularity, a non-semantic metric, predicts the Brain Rot effect better than its length does in M1.
That's what you'd expect. Popular culture content tends to jump from premise to conclusion without showing the work. Train on popular culture and you get that.
Really, what's supposed to come from training on the Twitter firehose? (Can you still buy that feed? Probably not.) This is a surprise-free result.
At least have a curated model (no social media) and a junk model to compare.
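For concreteness, here's a rough sketch (not the paper's code; the field names and thresholds are invented) of what a popularity-based M1 junk split might look like:

    # Hypothetical M1-style split: flag short-but-popular posts as junk
    # using only non-semantic engagement signals. Thresholds are made up.
    from dataclasses import dataclass

    @dataclass
    class Tweet:
        text: str
        likes: int
        retweets: int
        replies: int

    def popularity(t: Tweet) -> int:
        # Non-semantic signal: raw engagement, ignoring what the tweet says.
        return t.likes + t.retweets + t.replies

    def is_junk_m1(t: Tweet, pop_cutoff: int = 500, max_words: int = 30) -> bool:
        # Junk bucket: highly popular, short posts; everything else is control.
        return popularity(t) > pop_cutoff and len(t.text.split()) < max_words

    sample = [
        Tweet("ratio + L + no thoughts", 12_000, 3_400, 900),
        Tweet("A long thread actually walking through the argument...", 40, 2, 5),
    ]
    print([is_junk_m1(t) for t in sample])  # [True, False]

The paper's claim is that splitting on the engagement signal predicts the downstream damage better than splitting on length alone.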
- Brain rot text seems reasonably harmful, but brain rot videos are often surreal and semantically dense in a way that probably improves performance (such as the examples discussed in this German brain rot analysis: https://www.youtube.com/watch?v=-mJENuEN_rs&t=37s). For example, Švankmajer is basically proto-brainrot, but it's also the sort of thing you'd watch in a museum and think about.
Basically, I think the "brain rot" framing might be a bit of a terminology distraction here, when what they seem to be measuring is whether content is a puff piece or dense.
by pixelmelt
6 subcomments
- Isn't this just garbage in, garbage out with an attention-grabbing title?
- Speaking of which, someone shared with me an AI finance challenge that pits LLMs against one another to make trades, manage risk, etc., and then tracks their performance with starting capital of 10,000 USD each.
Their starting portfolios are ludicrous: they are trading BTC, XRP, DOGE, etc. I thought the idea was somewhat interesting, but the only reasonable takeaway I had was that these models have intense brainrot from consuming Twitter, Reddit, etc., and as such have a completely warped view of "finance".
- This paper makes me wonder about the long-lasting effects of current media consumption patterns on alpha-gen kids.
by thelastgallon
1 subcomment
- If most of the content produced by younger generations is about skibidi toilet[1] and 67[2], isn't that what LLMs are going to be trained on?
[1] https://en.wikipedia.org/wiki/Skibidi_Toilet
[2] https://en.wikipedia.org/wiki/6-7_(meme)
- After reading this, I just felt like everyone already knows the data is a mess, but no one really cares. We feed the models a bunch of junk, then act surprised when they start getting dumber. Honestly, did we even need a study to figure that out?
- Not surprising that trending tweets make for junk data, and not only from a brainrot-is-brainrot perspective: trending tweets are contextual. They don't make sense without the rest of the timeline.
And now I know why bots on Twitter don't really work, even with humans in the loop: they're shooting blind.
- "Trivial or Unchallenging Content" (points to Twitter). I love it.
by commandlinefan
3 subcomments
- My son just sent me an Instagram reel that explained how cats work internally, but it was a joke, showing the "purr center" and the "knocking things off tables" organ. It was presented completely seriously, in a way that any human would realize was just supposed to be funny. My first thought was that some LLM is training on this video right now.
by killshotroxs
0 subcomments
- If only I got money every time my LLM looped its answers and told me stuff I didn't even need. Just recently I was stuck in a loop of LLM answers, all while it couldn't even detect simple syntax errors...
- > "brain rot", "Thought-skipping", "primary lesion", "Cognitive Declines", ...
In general, using these medical/biological metaphors doesn't seem like a good idea in computer science research papers and the like.
Their use forces many comparisons that turn out to be inaccurate in detail, and it attributes human qualities to what people already forget are just computer models. I get that this may be slightly tongue-in-cheek, but with research papers there is also the risk that these terms start to be adopted, and undoing that would be a much taller order in either the research community or the general media.
Maybe I am just yelling at clouds.
- It's like showing modern children's TV to kids.
by conception
0 subcomments
- This is a potential moat for the big early players, in a pre-atomic (low-background) steel sort of way, since any future players won't have a non-AI-slop, pre-dead-internet web to train new models on.
- Naive question: what's new about the finding that data quality matters when training an LLM?
by donkeylazy456
0 subcomments
- Can't wait for an LLM to say "tung x9 sahur" without any context.
by alexchantavy
1 subcomment
- Completely off topic, but I really like the font used on this blog.
by earth2mars
0 subcomments
- Duh! Isn't that obvious? Is this some students who wanted a project with pretty graphs for the writing experience?! I am not trying to be cynical or anything, just questioning the obvious thing here.
- So LLMs are indeed human-like.
- LLMs are brain rot
- Interesting, since trivial or unchallenging online content rots actual brains too!
by AznHisoka
4 subcomments
- Can someone explain this in layman's terms?
by CaptainOfCoit
0 subcomments
- > continual exposure to junk web text induces lasting cognitive decline in large language models (LLMs).
TLDR: If your data set is junk, your trained model/weights will probably be junk too.
by loloquwowndueo
0 subcomments
- Tralalero tralala
- Is it just me, or has GPT-5 turned into a bit of a donkey?
- " Studying “Brain Rot” for LLMs isn’t just a catchy metaphor—it reframes data curation as cognitive hygiene for AI, guiding how we source, filter, and maintain training corpora so deployed systems stay sharp, reliable, and aligned over time."
Is this slop?
- Making a model worse is very easy.
- I don't understand why people have a hard time understanding 'garbage in, garbage out'. If you train your model on junk, then you will have a junk model.
by nakamoto_damacy
0 subcomment
- Our metaphorical/analogical muscle is too well developed. Maybe there is a drug we can take to reduce how much we lean into it.
If you look at two random patterns of characters and both contain 6s, you could say they are similar (because you're ignoring that the similarity is less than 0.01%). That's what comparing LLMs to brains feels like: comparing roller skates to a cruise ship. They both let you get around.
by antegamisou
1 subcomment
- My Goodness, looks like Computer 'Science' is a complete euphemism now.
- Another analogy to help us understand that LLMs are a useful part of what people do, but are wildly misconstrued as the whole story.
- Spoken like a true LLM.
by moffkalast
0 subcomments
- Ah yes, something the local LLM fine-tuning community figured out how to do in creative ways as soon as Llama 1 was released. I'm glad it has a name.
- By all means, let's make sure the LLMs have healthier media diets than the humans. We wouldn't want the humans to realize they are being dumbed down into cattle. /s
- AIs need supervision, just like regular people... /s
by chuckreynolds
0 subcomments
- Is that why ChatGPT always tells me "6 7 lol"? ;)
- "Brain rot" is just the new term for "slang that old people don't understand".
"Cool" and "for real" are no different than "rizz" and "no cap". You spoke "brain rot" once, and "cringed" when your parents didn't understand. The cycle repeats.