Okay, that's not fair. There's a big advantage to having an external compressor and reference file whose bytes aren't counted, whether or not your compressor models knowledge.
More importantly, even with that advantage it only wins on the much smaller enwik8. It loses pretty badly on enwik9.
Every systems engineer at some point in their journey yearns to write a filesystem
It reminds me of a friend who, like me, had a TRS-80 Color Computer in the 1980s. He was a self-taught BASIC programmer who had developed a very complex BBS system, and he was frustrated that the cluster size for the RS-DOS file system was half a track, so a lot of space was wasted when you stored small files. He called me up one day and told me he'd managed to store 180k of files on a 157k disk, and I had to break it to him that he was storing 150k of files (minus metadata) on a 157k disk, as opposed to the 125k or so he was getting before... With BASIC!
"Of course, in the short term, there’s a whole host of caveats: you need an LLM, likely a GPU, all your data is in the context window (which we know scales poorly), and this only works on text data."