It’s also interesting to look beyond children’s work at the incredible amount of “book mill” content that has dominated publishing for hundreds of years.
We rightly celebrate the good books, but most content before AI was not good. So it’s not surprising that AI trained on that corpus produces output of similar quality.
That being said, I think this is just an opportunity to improve AI content, which is a human/computer interface design challenge. This technology is here to stay. Our focus should be on detection and improving the LLMs.
I’ve had luck formalizing this into some post-LLM rules to clean crappy default AI content before I work with it: https://slopwash.com
It’s just a total mess out there.