https://www.nature.com/articles/s41586-024-07566-y
If you've spent any time using LLMs to write documentation you'll have seen this for yourself: the compounding amounts to rewriting valid information in ever less terse form.
I find it concerning that Karpathy doesn't see this. But I'm not surprised, because AI maximalists seem to find it really difficult to be... "normal"?
Rule of thumb: if you find yourself needing to broadcast the special LLM sauce you came up with instead of what it helped you produce, ask yourself why.
> Men will set the goals and supply the motivations, of course, at least in the early years. They will formulate hypotheses. They will ask questions. They will think of mechanisms, procedures, and models. They will remember that such-and-such a person did some possibly relevant work on a topic of interest back in 1947, or at any rate shortly after World War II, and they will have an idea in what journals it might have been published. In general, they will make approximate and fallible, but leading, contributions, and they will define criteria and serve as evaluators, judging the contributions of the equipment and guiding the general line of thought.
> In addition, men will handle the very-low-probability situations when such situations do actually arise. (In current man-machine systems, that is one of the human operator's most important functions. The sum of the probabilities of very-low-probability alternatives is often much too large to neglect.) Men will fill in the gaps, either in the problem solution or in the computer program, when the computer has no mode or routine that is applicable in a particular circumstance.
> The information-processing equipment, for its part, will convert hypotheses into testable models and then test the models against data (which the human operator may designate roughly and identify as relevant when the computer presents them for his approval). The equipment will answer questions. It will simulate the mechanisms and models, carry out the procedures, and display the results to the operator. It will transform data, plot graphs ("cutting the cake" in whatever way the human operator specifies, or in several alternative ways if the human operator is not sure what he wants). The equipment will interpolate, extrapolate, and transform. It will convert static equations or logical statements into dynamic models so the human operator can examine their behavior. In general, it will carry out the routinizable, clerical operations that fill the intervals between decisions.
https://www.organism.earth/library/document/man-computer-sym...
Website: https://coticsy.com/aime.html
iOS: https://apps.apple.com/us/app/aime-ondevice-ai/id6754805828
Android: https://play.google.com/store/apps/details?id=com.coticsy.ll...
On a side note, I've been building an AI-powered knowledge base (yes, it uses RAG) with wiki synthesis and similar ideas; take a look at https://github.com/kenforthewin/atomic
> but the LLM is rediscovering knowledge from scratch on every question
Unless the wiki stays fully in context, the LLM now has to re-read the wiki instead of re-reading the source files. This will also introduce and accumulate subtle errors as we start regurgitating second-order information.
I totally get the idea, but I think next-gen models with 10M context and/or 1000 tps will make this obsolete.
As such I've taken to delegating substantial parts of architecture and discovery to multi-agent workflows that always refer back to a wiki-like castle of markdown files I've built with them over time, fronted by Obsidian so I can peek in efficiently often enough.
Now I'm certainly doing something wrong, but the gaps are just too many to count. If anything, this creates a weird new type of tech debt, almost like a persistent brain gap. I miss thinking harder, and I think it would get me out of this one for sure. But the wiki workflow is just too addictive to stop.
https://github.com/asakin/llm-context-base
Main additions on top of the pattern: a 30-day training period in which the AI learns how you work and then gets quieter over time, a metadata standard so files are queryable by summary, and a lint pass for stale content and context-loading optimization. You never have to design a taxonomy upfront.
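To make the "queryable by summary" and "lint pass for stale content" ideas concrete, here is a minimal sketch, not the linked repo's actual implementation: files carry a frontmatter block with `summary` and `updated` fields (field names are my assumption), an agent can query by summary, and a lint pass flags anything whose `updated` date is past a cutoff.

```python
# Sketch only: parse simple key/value frontmatter and flag stale files.
# The field names ("summary", "updated") are illustrative assumptions,
# not the actual metadata standard of the linked project.
import re
from datetime import date, timedelta

def parse_frontmatter(text: str) -> dict:
    """Extract simple 'key: value' pairs from a leading --- block."""
    match = re.match(r"---\n(.*?)\n---", text, re.DOTALL)
    if not match:
        return {}
    fields = {}
    for line in match.group(1).splitlines():
        key, _, value = line.partition(":")
        if key.strip():
            fields[key.strip()] = value.strip()
    return fields

def lint_stale(files: dict[str, str], max_age_days: int = 90) -> list[str]:
    """Return filenames whose 'updated' date is older than the cutoff."""
    cutoff = date.today() - timedelta(days=max_age_days)
    stale = []
    for name, text in files.items():
        meta = parse_frontmatter(text)
        if "updated" in meta and date.fromisoformat(meta["updated"]) < cutoff:
            stale.append(name)
    return stale
```

The point of the summary field is that an agent can read just the frontmatter of every file to decide what to load, instead of reading every file in full.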
The AGENTS.md approach papers over this by teaching the LLM the folder conventions. It works until the data gets complex, and degrades after many iterations.
Both are needed: files that open in any editor, and a structured interface the agent can actually query. Been building toward that with Binder (github.com/mpazik/binder), a local knowledge platform. Data lives in a structured DB but renders to plain markdown with bi-directional sync. LSP gives editors autocomplete and validation. Agents and scripts get the same data through CLI or MCP.
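A toy round trip for the "structured data renders to plain markdown with bidirectional sync" idea: records live as structured data, render to a markdown list, and parse back without loss. Binder's real format is surely much richer; the field names and list shape here are made up for illustration.

```python
# Toy bidirectional sync sketch: structured records <-> plain markdown.
# Not Binder's actual schema; "title" and "note" are invented fields.
def render(records: list[dict]) -> str:
    """Render records as a markdown bullet list."""
    return "\n".join(f"- **{r['title']}**: {r['note']}" for r in records)

def parse(markdown: str) -> list[dict]:
    """Parse the bullet list back into the same record structure."""
    records = []
    for line in markdown.splitlines():
        title, _, note = line.removeprefix("- **").partition("**: ")
        records.append({"title": title, "note": note})
    return records
```

The design constraint is that `parse(render(x)) == x`: the markdown stays human-editable in any editor, while the structured side stays queryable by agents and scripts.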
Start with a short text context and flow through DAGs, choose-your-own-adventure style. We already breached context limits; now's the time to let LLMs build their own contexts through decision trees and prune dead ends.
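A minimal sketch of that choose-your-own-adventure loading, under my own assumptions about the structure: the model starts from a short root summary and picks which branch to expand at each step, so only the chosen path ever lands in context and sibling branches (dead ends) are never loaded. The `choose` callback stands in for an LLM call.

```python
# Sketch: decision-tree context building. Node names and the `choose`
# interface are illustrative assumptions, not a real project's API.
from dataclasses import dataclass, field

@dataclass
class Node:
    summary: str          # short text shown before committing to this branch
    body: str = ""        # full content, loaded only if the branch is chosen
    children: list["Node"] = field(default_factory=list)

def build_context(root: Node, choose, budget: int = 3) -> list[str]:
    """Walk the tree, letting `choose` pick one child index per step."""
    context, node = [root.body or root.summary], root
    for _ in range(budget):
        if not node.children:
            break  # leaf: nothing left to expand
        options = [child.summary for child in node.children]
        node = node.children[choose(options)]  # siblings are pruned, never loaded
        context.append(node.body or node.summary)
    return context
```

In practice `choose` would be a cheap LLM call shown only the child summaries, which is what keeps the loaded context short.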
Setting aside that their codebase is absolute slopcrap, I think something like this might work nicely if it's built from the ground up.
For my own test environment I'm relying on Golang and its conventions (go build, go test, go fmt, gopls, etc.), which saves a lot of prompts and tokens down the line. Additionally, I think spec-driven development might be more successful, but I haven't yet figured out the right amount of detail for specifications, so that semantic anchors can help summarize them better.
Anyways if you're curious, it's made for short agent lifecycles and it kinda works every time most of the time: https://github.com/cookiengineer/exocomp
Still need to implement the summarizing agent and the memory parts; it's a bit fiddly to get that right, so I'm currently experimenting a lot locally with both ollama and vllm as inference engines.
I would be interested in trying to make the models go into more of a research mode and organize their knowledge inside it, but I've found this turns into something like LLM soup.
For coding projects, the best experience I have had is clear requirements and a lot of refinement followed through with well documented code and modules. And only a few big 'memories' to keep the overall vision in scope. Once I go beyond that, the impact goes down a lot, and the models seem to make more mistakes than I would expect.
This list is also part of my own contender in this race: https://zby.github.io/commonplace/ - my own LLM-operated knowledge base (this is the HTML rendering of that KB; there is also the GitHub repo linked there).
The main feature is that I use it to build a theory about such systems. The neat trick is that LLMs can read this theory and implement it, so the theory itself also works as an LLM runtime.
It works for me - but it has some rough edges still - so I guess it is not for everyone.
Everything should live in the repo. Code and docs yes. But also the planning files, epics, work items, architectural documentation and decisions. Here is a small example of my Linux system doc: https://github.com/gchamon/archie/tree/main/docs
And you don't need to reinvent the wheel. Code docs can live either right next to the code in the README, or in docs/ if they're too big for a single file or the context spans multiple modules. ADRs live in docs/architecture/decisions. Epics and work items can also live in the docs.
Everything is for agents and everything is for humans, unless it's put in AGENTS.md, docs/agents, or something similar, and even those are for humans too.
In a nutshell: put everything in the repo, reuse standards as much as possible (the idea being that the structure is likely already embedded in the model), and always review documentation changes.
[0] "Stuffing Context is not Memory, Updating Weights Is": https://www.youtube.com/watch?v=Jty4s9-Jb78
I find it helps a LOT with discovery. The LLM spends a lot less time figuring out where things are. It's essentially "cached discovery".
Check it out: https://github.com/ractive/hyalo
But I like the idea of an LLM generated/maintained wiki. That might be a useful addition to allow for more interactive exploration of a document database.
Doesn't really feel that useful in practice.
I'd rather have it source the original document every time than an LLM-generated wiki, which I most likely wouldn't have the time to fact-check and review myself.
The problem is that it's still slop: not only does it add a lot of noise ("architecture" diagrams based on cherry-picked filenames, incomplete data tables, hyperfocusing on strange things), it also hallucinates, adding factually incorrect information (while direct questions to the LLM return correct information).
I built an implementation of this and tested it on 3 Alex Hormozi books (~155K words, 68 source files). Some data for the skeptics:
The naive version (each book as 1 file) produced exactly the slop people are describing here. But splitting into chapter-level files and recompiling changed the output categorically. Same model, same prompts — the only variable was source granularity.
The compiler produced 210 concept pages with 4,597 cross-references (19.2 avg links per page). 20+ concepts synthesized across all 3 books unprompted — one pulled from 11 source files and found a genuine contradiction between two books that neither makes explicit. 173K words of output from 155K input. It's not compression — it's synthesis.
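Since the commenter says source granularity was the only variable that changed the output, here is a minimal sketch of the chapter-splitting step: cut each book into chapter-level files before compilation instead of feeding one giant file. The heading pattern is my assumption about how chapters are marked; the linked repo may do this differently.

```python
# Sketch only: split a book into chapter-level (title, body) pairs.
# Assumes chapters are marked with markdown '# Chapter N ...' headings;
# that convention is an illustrative assumption, not the repo's format.
import re

def split_chapters(book_text: str) -> list[tuple[str, str]]:
    """Split on '# Chapter N ...' headings; return (title, body) pairs."""
    parts = re.split(r"(?m)^# (Chapter \d+.*)$", book_text)
    # parts = [preamble, title1, body1, title2, body2, ...]
    return [(parts[i].strip(), parts[i + 1].strip())
            for i in range(1, len(parts) - 1, 2)]
```

Each (title, body) pair would then be written out as its own source file, so the compiler sees ~68 small files rather than 3 monolithic ones.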
The thing I think the "this is just RAG" comments are missing: a vector database is only useful to machines. You can't open a .faiss file and browse it. A wiki is useful to both. I open these files in Obsidian, browse the graph, follow links, read concept pages — no AI needed. But when I do ask the AI a question, it reads the same wiki pages I do, and the answers are better than RAG because the knowledge is already structured and cross-referenced instead of retrieved as raw chunks.
That's the key insight in Karpathy's idea. The compiled wiki is the interface for humans AND the knowledge layer for AI. Same artifact, two audiences.
~Cost: 12M tokens, ~10-15 min. Repo: https://github.com/vbarsoum1/llm-wiki-compiler

I'm not sure how you can get any closer to "turning your thinking over to machines." These tasks may be "grunt work," but it's while doing them that new ideas pop up, or you decide on a particular or novel way to organize or frame information. Many of the insights in my (analog? vanilla? human-written) Obsidian vault (which I consider my "personal wiki") have been made or expanded because I happened to see one note after another while doing the "grunt work," or just by opening one note and seeing its title right beside a previously forgotten one.
There's nothing "personal" about a knowledge base you filled by asking AI questions. It's the AI's database; you just ask it to write stuff. Learn how to learn and answer your own damn questions.
Soon pedagogy will be a piece of paper that says "Ask AI."
I hate this idea that a result is all that matters, and the quicker you can get the result the better, at any cost (mental or financial, short-term or long-term).
If we optimized showers to be 20 seconds, we'd stop having shower thoughts. I like my shower thoughts. And so too my grunt-work thoughts.
---
As an aside, I'm not totally against AI writing in a personal knowledge base. I include it at times in my own. But since I started my current Obsidian vault in 2023 (now 4,100 self-written notes, including maybe up to 5% Web Clipper notes), I've had a Templater (Obsidian plugin) template I wrap around anything AI-written to 'quarantine' it from my own words:
==BEGIN AI-GENERATED CONTENT==
<% tp.file.cursor(1) %>
==END AI-GENERATED CONTENT==
I've used this consistently and it's helped me keep (and develop) my own writing voice apart from any of my AI usage. It actually motivates me to write more, because I know I could always take the easy route and chunk whatever I'm thinking into the AI, but I'm choosing not to by writing it myself, with my own vocabulary, in my own voice, with my own framing. I trick myself into writing because my pride tells me I can express my knowledge better than the AI can.
I also manually copy and paste from wherever I'm using AI into my notes. Nothing automated. The friction keeps me from sliding into the happy path of turning my brain off.
Then what is the point? Why be so averse to using your own brain? Why are tech bros like this?