Conclusion
In this work, we introduced and empirically validated the LLM Brain Rot Hypothesis, demonstrating that continual exposure to junk data—defined as engaging (fragmentary and popular) or semantically low-quality (sensationalist) content—induces systematic cognitive decline in large language models. The decline includes worse reasoning, poorer long-context understanding, diminished ethical norms, and emergent socially undesirable personalities.
Fine-grained analysis shows that the damage is multifaceted in changing the reasoning patterns and is persistent against large-scale post-hoc tuning. These results call for a re-examination of current data collection from the Internet and continual pre-training practices. As LLMs scale and ingest ever-larger corpora of web data, careful curation and quality control will be essential to prevent cumulative harms.
Last edited by @suen 2025-10-22T15:04:16Z