I then iterate on that plan.md with the AI until it's what I want. I then ask it to make a detailed todo list from the plan.md and attach it to the end of plan.md.
Once I'm fully satisfied, I tell it to execute the todo list at the end of the plan.md, and don't do anything else, don't ask me any questions, and work until it's complete.
I then commit the project.md and plan.md along with the code.
So my back and forth on getting the plan.md correct isn't in the logs, but that is much like intermediate commits before a merge/squash. The plan.md is basically the artifact an AI or another engineer can use to figure out what happened and repeat the process.
The main reason I do this is so that when the models get a lot better in a year, I can go back and ask them to modify plan.md based on project.md and the existing code, on the assumption they might find their own mistakes.
I do think there's more value in ensuring that the initial spec, or the "first prompt" (which IME is usually much bigger and tries to get 80% of the way there) is stored. And, maybe part of the product is an LLM summary of that spec, the changes we made to the spec within the session, and a summary of what is built. But... that could be the commit message? Or just in a markdown file. Or in Notion or whatever.
The objections I heard, which seemed solid, are (1) there's no single input to the AI (i.e. no single session or prompt) from which such a project is generated,
(2) the back-and-forth between human and AI isn't exactly like working with a compiler (the loop of source code -> object code) - it's also like a conversation between two engineers [1]. In the former case, you can make the source code into an artifact and treat that as "the project", but you can't really do that in the latter case, and
(3) even if you could, the resulting artifact would be so noisy and complicated that saving it as part of the project wouldn't add much value.
At the same time, people have been submitting so many Show HNs of generated projects, often with nothing more than a generated repo with a generated readme. We need a better way of processing these because treating them like old-fashioned Show HNs is overwhelming the system with noise right now [2].
I don't want to exclude these projects, because (1) some of them are good, (2) there's nothing wrong with more people being able to create and share things, (3) it's foolish to fight the future, and (4) there's no obvious way to exclude them anyhow.
But the status quo isn't great because these projects, at the moment, are mostly not that interesting. What's needed is some kind of support to make them more interesting.
So, community: what should we do?
[1] this point came from seldrige at https://news.ycombinator.com/item?id=47096903 and https://news.ycombinator.com/item?id=47108653.
YoumuChan makes a similar point at https://news.ycombinator.com/item?id=47213296, comparing it to Google search history. The analogy is different but the issue (signal/noise ratio) is the same.
[2] Is Show HN dead? No, but it's drowning - https://news.ycombinator.com/item?id=47045804 - Feb 2026 (422 comments)
If you think you should squash commits, then you're only really interested in the final code change. The history of how the dev got there can go in the bin.
If you don't think you should squash commits then you're interested in being able to look back at the journey that got the dev to the final code change.
Both approaches are valid for different reasons, but they're a source of long and furious debate on every team I've been on. Keeping a history of your AI sessions alongside the code could be useful for debugging (less code debugging, more thought-process debugging), but 'prefer squash' developers usually prefer to look at the existing code rather than the history of changes to steer it back on course, so why would they start looking at AI sessions if they don't look at commits?
All that said, your AI's memory could easily be stored and managed somewhere separate from the repo history, and in a way that makes it more easily accessible to the LLM you choose, so probably not.
What actually helps is a good commit message explaining the intent. If an AI wrote the code, the interesting part isn't the transcript, it's why you asked for it and what constraints you gave it. A one-paragraph description of the goal and approach is worth more than a 200-message session log.
I think the real question isn't about storing sessions, it's about whether we're writing worse commit messages because we assume the AI context is "somewhere."
Otherwise, when fixing a bug, you just risk starting from scratch and wasting time using the same prompts and/or assumptions that led to the issue in the first place.
Much of the reason code review was/is worth the time is because it can teach people to improve, and prevent future mistakes. Code review is not really about "correctness", beyond basic issues, because subtle logic errors are in general very hard to spot; that is covered by testing (or, unfortunately, deployment surprises).
With AI, at least as it is currently implemented, there is no learning, as such, so this removes much of the value of code review. But, if the goal is to prevent future mistakes, having some info about the prompts that led to the code at least brings some value back to the review process.
EDIT: Also, from a business standpoint, you still need to select for competent/incompetent prompters/AI users. It is hard to do so when you have no evidence of what the session looked like. Also, how can you teach juniors to improve their vibe-coding if you can't see anything about their sessions?
Those UML/use-case/constraint artifacts aren’t committed as session logs per se, but they are part of the author’s intent and reasoning that gets committed alongside the resulting code. That gives future reviewers the why as well as the what, which is far more useful than a raw AI session transcript.
Stepping back, this feels like a decent and dignified position for a programmer in 2026: humans retain architectural judgement --> AI accelerates boilerplate and edge implementation --> version history still reflects intent and accountability rather than chat transcripts. I can’t afford to let go of the productivity gains that flow from using AI as part of a disciplined engineering process, but I also don’t think commit logs should become a dumping ground for unfiltered conversation history.
If by AI you mean a non-supervised, autonomous consciousness (which I believe the term has to be reserved for), then the answer is again no, as it is as responsible for the quality of its PRs as a human is.
If the thing writing code is the former, but there's no human or responsible representative of the latter in the loop, then the code shouldn't even be suggested for consideration in a project where any people participate. In that case there's no point in storing any additional information, as the code itself has no value (besides the electricity wasted to create it) and can be regenerated on demand.
Commit comments are generally underused, though, as a result of how forges work, but that's another discussion.
Make a button that does X when clicked.
Agent makes the button.
I tell it to make the button red.
Agent makes it red.
I test it, it is missing an edge case. I tell it to fix it.
It fixes it.
I don't like where the button is. I tell it to put it in the sidebar.
It does that.
I can go on and on. But we don't need to know all those intermediate steps. We just need to know: Red button that does X by Y mechanism is in the sidebar. Tests that include edge cases are here. All tests passing. 2026-03-01
And that document is persisted.
If later, the button gets deleted or moved again or something, we can instruct the agent to say why. Button deleted because not used and was noisy. 2026-03-02
This can be made trivial via skills, but I find it a good way to understand things a bit more deeply than commit messages would allow.
Of course, we can also just write (or instruct agents to write) better PRs, but AFAICT there's no easy way to know which PR introduced or deleted the button unless you spelunk through git blame.
Not once have I found it useful: if the intention isn't clear from the code and/or concise docs, the code is bad and needs to be polished.
Well written code written with intention is instantly interpretable with an LLM. Sending the developer or LLM down a rabbit hole of drafts is a waste of cognition and context.
This is the breakdown of my process - I use tons of .md files serving as a shared brain between Claude and me:
- CLAUDE.md is in the root of the repo, and it's the foundation - it describes the project vision, structure, features, architecture decisions, tech, and others. It then goes even more granular and talks about file sizes, method sizes, problem-solving methodologies (do not reinvent the wheel if a well-known library is already out there), coding practices, constraints, and other aspects like instructions for integration tests. It's basically the manual for the project vision and plan, and also for code writing. Claude reads it every session.
- Every feature has its own .md file, which is maintained. That file describes implementation details, decisions, challenges, and anything that is relevant when starting to code on the feature, and also when it's picked up by a new session.
- At a higher level, above features, I create pairs of roadmap.md and handoff.md. Those pairs are the crucial part of my process. They cover wider modules (e.g., licensing + payments + emailing features) and serve as a bridge between sessions. Roadmap.md is basically a huge checklist, based on CLAUDE.md and features .md docs, and is maintained. The handoff.md contains the current state, session notes, and knowledge. A session would start by getting up to speed with Claude.md and the specific roadmap.md + handoff.md that you plan to work on now and would end by updating the handoff, roadmap, and the impacted features.
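As a hedged illustration of the roadmap/handoff pairing described above (module names and contents are invented, not the commenter's actual files), a handoff.md bridging two sessions might look like:

```markdown
# Handoff: licensing + payments + emailing

## Current state
- License-key validation implemented and covered by integration tests
- Renewal webhook half-wired; Stripe is in test mode only

## Session notes (2026-03-01)
- Chose signed license keys over a DB lookup to keep offline checks cheap
- Gotcha: the emailing feature rate-limits itself; see features/emailing.md

## Next session
- Finish the renewal webhook, then tick items 12-14 in roadmap.md
```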
This structure greatly helps preserve crucial context and also makes it very easy to use multi-agent.
Of course the commits and PRs are also very descriptive; however, the engine is in the .md files.
1. Writing a spec with clear acceptance criteria.
2. Assigning IDs to my acceptance criteria. Sounds tedious, but actually the idea wasn’t mine, at some point an agent went and did it without me asking. The references proved so useful for guiding my review that I formalized the process (and switched from .md to .yaml to make it easier).
3. Giving my agents a source of truth to share implementation progress so they can plan their own tasks and more effectively review.
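As a hedged sketch of the ID idea above (file name, IDs, and fields are invented; the commenter's actual format isn't shown), acceptance criteria with stable IDs in YAML might look like:

```yaml
# spec.yaml: acceptance criteria with stable IDs that agents and reviewers can cite
feature: password-reset
criteria:
  - id: AC-1
    given: a registered user requests a reset
    then: a single-use token valid for 30 minutes is emailed
  - id: AC-2
    given: a token is used a second time
    then: the request is rejected with a clear error
progress:        # shared source of truth the agents update as they work
  AC-1: implemented
  AC-2: in-review
```

A review comment can then say "AC-2 isn't covered by any test" instead of pointing at code.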
Of course, I can’t help myself, I had to formalize it into a spec standard and a toolkit. Gonna open source it all soon, but I really want feedback before I go too far down the rabbit hole:
Not insisting upon this would be similar to depending on a SaaS to compile and package software, and being totally cool with it. Both LLMs and build systems convert human-friendly notation into machine-friendly notation. We should hold the LLM companies to the same standards of transparency that we hold the people who make things like nix, clang, llvm, cmake, cargo, etc.
What are they even supposed to do with feedback on the code? It has to be translated by my teammate into the language of the work they did, which is the conversation they had with the AI agent.
But the conversation isn't the "real work": the decisions made in the conversation are the real work. That is what needs capture and review.
So now that I know why code reviews are kinda wrong, what can we do to have meaningful reviews of the work my teammates have done?
What I landed on is aiming to capture more and more "work" in the form of a spec, review the spec, and ignore the code. This isn't novel or interesting. HOWEVER...
For the large, messy, legacy codebases I work in today, I don’t like the giant spec driven development approach that is most popular today. It’s too risky to solely trust the spec because it touches so much messy code with so many gotchas. However, with the rate of AI generated code rolling in, I simply can’t switch context quickly enough to review it all efficiently. Also, it’s exhausting.
The approach I have been refining is defining very small modules (think a class or meaningful collection of utils) with a spec and a concise set of unit tests, generating code from the spec, then not reading or editing the generated code.
Any changes to the code must be made to the spec, and the code re-generated. This puts the PR conversation in the right place, against the work I have done: which is write the spec.
So far the approach has worked for replacing simple code (eg: a nestjs service that has a handful of public methods, a bit of business logic, and a few API client calls). PRs usually have a handful of lines of glue code to review, but the rest are specs (and a selection of “trust” unit tests) and the idea is that the code can be skipped.
AI review bots still review the PR and comment around code quality and potential security concerns, which I then translate into updates to the spec.
I find this to be a good step towards the codegen future without totally handing over my (very messy and not very agent friendly) codebases.
> [...]
> All contributors must indicate in the commit message of their contribution if they used AI to create them and the contributor is fully responsible for the content that they submit.
> This can be a label such as `Assisted By: <Tool>` or `Generated by: <Tool>` based on what was used. This label should be representative of the contribution and how it was created for full transparency. The commit message must also be clear about how it is solving a problem/making an improvement if it is not immediately obvious.
From "Entire: Open-source tool that pairs agent context to Git commits" (2026) https://news.ycombinator.com/item?id=46964096 :
> But which metadata is better stored in git notes than in a commit message? JSON-LD can be integrated with JSON-LD SBOM metadata
I bet, without trying to be snarky, that most AI users don't even know you can commit with an editor instead of -m "message" and write more detail.
It's good that AI fans are finding out that commits are important; now don't reinvent the wheel, just spend a couple of minutes writing each commit message. You'll thank yourself later.
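A minimal sketch of that (repo and message contents invented): running `git commit` with no `-m` opens $EDITOR with room for a body, and non-interactively `-F -` reads the same multi-paragraph message from stdin:

```shell
# Throwaway repo with one staged change.
git init -q demo-msg
git -C demo-msg config user.email "dev@example.com"
git -C demo-msg config user.name "Dev"
echo "retry" > demo-msg/sync.txt
git -C demo-msg add sync.txt
# Subject line, blank line, then the why and the constraints.
git -C demo-msg commit -q -F - <<'EOF'
Add retry to the sync loop

Why: transient network failures were dropping events.
Constraints given to the agent: no new dependencies; keep the public API stable.
EOF
git -C demo-msg log -1 --format=%B
```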
For example: https://github.com/kzahel/PearSync/blob/main/sessions/sessio...
I think it's valuable to share that so people who are interested can see how you interact with agents. Sharing raw JSONL is probably a waste: it contains too many absolute paths and too much potential for sharing something unintentionally.
https://github.com/peteromallet/dataclaw?tab=readme-ov-file#... is one project I saw that makes an attempt to remove PII/secrets. But I certainly wouldn't share all my sessions right now, I just don't know what secrets accidentally got in them.
You can document the prompt chain, the plan, the design doc. But if nobody outside the team ever touches it before it ships, you are still flying blind on whether the thing actually works for a human who encounters it cold. The AI session log tells you what was intended. It does not tell you what was understood.
Now whenever I need to reason about what the agent did and why, the info is linked and ready on demand. If needed, the session is also saved.
It helps a lot.
A coding session has a lot of 'left turn, dead end, backtrack' noise that buries the decision that actually mattered. Committing the full session is like committing compiler output — technically complete, practically unreadable.
We've been experimenting with structured post-task reflections instead: after completing significant work, capture what you tried, what failed, what you'd do differently, and the actual decision reasoning. A few hundred tokens instead of tens of thousands. Commits with a reflection pointer rather than an embedded session.
The result is more useful than raw logs. Future engineers (or future AI sessions) can understand intent without replaying the whole conversation. It's closer to how good commit messages work — not 'here's what changed' but 'here's why'.
Dang's point about there being no single session is also real. Our biggest tasks span multiple sessions and multiple contributors. 'Capture the session' doesn't compose. 'Capture the decision' does.
Commits, branches, and the entire model work really well for human-to-human collaboration, but the model starts to be too much for agent-to-human interactions.
Sharing the entire session in a human-readable way, offering a rich experience that helps other humans understand, is way better than having git annotations.
That's why we built https://github.com/wunderlabs-dev/claudebin.com. A free and open-source Claude Code session sharing tool, which allows other humans to better understand decisions.
Those sessions can be shared in PR https://github.com/vtemian/blog.vtemian.com/pull/21, embedded https://blog.vtemian.com/post/vibe-infer/ or just shared with other humans.
https://github.com/eqtylab/y (just a prototype, built at the Codex hackathon)
The barrier to entry is just including the complete sessions. It gets a little nuanced because of their sheer size, workflows around squash merging and whatnot, and deciding where you actually want to store the sessions. For instance, git notes are intuitive; however, there are complexities around them. A less elegant approach is just to keep all sessions in separate branches.
Beyond this, you could have agents summarize, in an intuitive data structure, why certain commits exist and how the code got there. I think this would be a general utility for human and AI code reviewers alike. That is what we built. Cost/utility needs to make sense. Research needs to determine whether this is all actually better than proper comments in code.
This applies both to future AI tools and also experts, and experts instructing novices.
To some degree, the lack of documenting AI sessions is also at the core of much of the skepticism toward the value of AI coding in general: there are so many claims of successes / failures, but only a vanishingly small amount of actual detailed receipts.
Automating the documentation of some aspects of the sessions (skills + prompts, at least) is something both AI skeptics and proponents ought to be able to agree on.
EDIT: Heck, if you also automate documenting the time spent prompting and waiting for answers and/or code-gen, this would also go a long way to providing really concrete evidence for / against the various claims of productivity gains.
Conversations may also be very non-linear. You can take a path attempting something, roll back to a fork in the conversation, and take a different path using what you have learned from the model's output. I think trying to interpret someone else's branching flow would be more likely to create an inaccurate impression than understanding.
That's what architectural decision records (ADRs) are designed to capture, and it's where the workflow naturally lands. Not committing the full transcript, but having the agent synthesize a brief ADR at the close of each session: here's what was attempted, what was discarded and why, what the resulting code assumes. Future maintainers — human or AI — need exactly that, and it's compact enough that git handles it fine.
One thing I've added on top of the plan/project structure: a short `decisions.md` that logs only the non-obvious choices, like "tried X, it caused Y issue, went with Z instead". Basically the things that would make future-me or a future agent waste time rediscovering.
Do you find the plan.md files stay useful past the initial build, or do they mostly just serve as a commit artifact?
If I chat with an agent and give an initial prompt, and it gets "aspect A" (some arbitrary aspect of the expected code) wrong, I'll iterate to get "aspect A" corrected. Other aspects of the output may have exactly matched my (potentially unstated) expectation.
If I feed the initial prompt into the agent at some later date, should I expect exactly "aspect A" to be incorrect again? It seems more likely the result will be different, maybe with some other aspects being "unexpected". Maybe these new problems weren't even discussed in the initial archived chat log, since at that time they happened to be generated in a way aligned with the original engineer's expectations.
Maybe not a permanent part of the commit, but something stored on the side for a few weeks at a time. Or even permanently, it could be useful to go back and ask, "why did you do it that way?", and realize that the reason is no longer relevant and you can simplify the design without worrying you're breaking something.
However there is an unpleasant reality: the system could be incredibly brittle, with the slightest change in input or seed resulting in significantly different output. It would be nice if all small and seemingly inconsequential input perturbations resulted in a cluster of outputs that are more or less the same, but that seems very model dependent.
git is only one possible location.
I think there is very valuable information in session logs: the prompts, the usage statistics at the end of the session, which model was used, and so on. But git history and commit messages should focus on the outcome of the work, not on the process itself. This is why the issue discussion that precedes the work is also typically kept separately, in tickets: not in git itself, but close to it.
There are platforms like tulpal.com which move the whole local agent-supported process to the server and therefore have much better after-the-fact observability into what happened.
You'll find that at least half of it is noise.
If you put that in commits, you lose the ability to add "study git commits to ground yourself" in your agents.md or prompts. Because now you'll have 50%+ noise in your active session's context window.
Context window is precious. Guard it however you can.
If you do proper software development (planning, spec, task breakdown, test-case spec, implementation, unit tests, acceptance tests, ...), implementation is just a single step and the generated artifact is the source code. And that's what needs to be checked in. All the other artifacts are usually stored elsewhere.
If you do spec and planning with AI, you should also commit the outcome, and maybe also the prompt and session (like meeting notes from a spec meeting). But it's a different artifact then.
But if you skip all those steps and put your idea directly to a coding agent in the hope that the result is final, tested, production-ready software, you should absolutely commit the whole chat session (or at least have the AI create a summary of it).
Soon only implementation details will matter. Code can be generated based on those specifications again and again.
Back in the dark ages, you'd run "cc -S hello.c" to check the assembler source. With time we stopped doing that and hello.c became the originating artefact. On the same basis, the session becomes the originating artefact.
For my AI coding sessions I just point opencode at the issue. It makes a plan, implements and tests the plan (i.e., the build step), and commits it. For reference you always have the issue; revise the issue when something changes.
We always worked like this, recording the thinking and planning part is silly. You can always save your session data.
I honestly don't know if I'm doing something very wrong or if I have a very different working style than many people, but for me "just give the prompt/session" isn't a possibility because there isn't one.
I'm probably incredibly inefficient, because even when I don't use AI it is the same: a single commit is usually many different working states/ideas/branches of things I tried and explored that have been amended/squashed.
First, I tried using simple inline comments, but the agents happily (and silently) removed them, even when prompted not to.
The next attempt was to have a parallel markdown file for every code file. This worked OK, but suffered from a few issues:
1. Understanding context beyond the current session
2. Tracking related files/invocations
3. Cold start problem on existing codebases
To solve 1 and 3, I built a simple "doc agent" that does a poor man's tree traversal of the codebase, noting any unknowns/TODOs, and running until "done."
To solve 2, I explored using the AST directly, but this made the human aspect of the codebase even less pronounced (not to mention a variety of complex edge-cases), and I found the "doc agent" approach good enough for outlining related files/uses.
To improve the "doc agent" cold start flow, I also added a folder level spec/markdown file, which in retrospect seems obvious.
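A minimal sketch of the parallel-file bookkeeping described above, under stated assumptions (suffixes, naming scheme, and stub contents are all invented for illustration, not the commenter's actual tool):

```python
"""Walk a source tree and ensure every code file has a sibling spec stub
for a doc agent to maintain. Everything here is illustrative."""
from pathlib import Path

CODE_SUFFIXES = {".py", ".ts", ".rs", ".go"}  # assumed set of "code" files

def ensure_spec_stubs(root: str) -> list[Path]:
    """Create a <name>.spec.md next to each code file missing one.

    Returns the list of newly created stub paths."""
    created = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in CODE_SUFFIXES:
            continue
        spec = path.with_name(path.name + ".spec.md")
        if not spec.exists():
            spec.write_text(
                f"# {path.name}\n\nTODO: rationale, invariants, related files.\n",
                encoding="utf-8",
            )
            created.append(spec)
    return created
```

Running it twice is idempotent: the second pass finds every stub already present and creates nothing.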
The main benefit of this system is that when the agent is working, it not only has to change the source code, but it has to reckon with the explanation/rationale behind that source code. I haven't done any rigorous testing, but in my anecdotal experience, the models make fewer mistakes and cause fewer regressions overall.
I'm currently toying around with a more formal way to mark something as a human decision vs. an agent decision (i.e. this is very important vs. this was just the path of least resistance), however the current approach seems to work well enough.
If anyone is curious what this looks like, I ran the cold start on OpenAI's Codex repo[0].
[0]https://github.com/jumploops/codex/blob/file-specs/codex-rs/...
So I like the link's approach quite a bit.
The paradigm shift, which is a shift back, is to embrace the fact that you have to slow down and understand all the code the AI is writing.
That way if I need to find a prompt from some feature from the past, I just find the relevant .md file and it's right at the top.
Interestingly, my projects are way better documented (via prompts) than they ever were in the pre-agentic era.
I'm not sure about becoming part of the repo/project long term, but I think providing your prompts as part of the pull request makes the review much easier, because the reviewer can quickly understand your _intent_. If your intent has faulty assumptions, or if the reviewer disagrees with the intent, that should be addressed first. If the intent looks good, the reviewer can then determine whether you (or your coding agent) have actually implemented it.
I only log my own user messages, not AI responses, in a chat_log.md file, which is created by a user-message hook in the repo.
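A hedged sketch of such a hook body (the wiring is an assumption: Claude Code's UserPromptSubmit hook pipes a JSON payload to the script, and the user's text is assumed to arrive under a `prompt` key):

```python
"""Append each user prompt (not the AI response) to chat_log.md."""
import json
from datetime import datetime, timezone

def log_prompt(payload: str, log_path: str = "chat_log.md") -> str:
    """Parse the hook's JSON payload and append the prompt as a timestamped
    markdown bullet. Returns the appended line ("" if nothing to log)."""
    entry = json.loads(payload)
    prompt = entry.get("prompt", "").strip()
    if not prompt:
        return ""  # empty or whitespace-only message: skip
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M")
    line = f"- **{stamp}** {prompt}\n"
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(line)
    return line

# In the hook script itself you would call:
#   log_prompt(sys.stdin.read())
```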
Right now this paradigm is so novel to us that we don't know if what is being saved is useful in any way or just hoarding garbage.
There are some who (rightly IMO) just neatly squash their commits and destroy the working branch after merging. There are others who would rather preserve everything.
However, I do think that a higher-level description of every notable feature should be documented, along with the general implementation details. I use this approach for my side projects and it works fairly well.
The biggest question is whether it will scale. I suspect not, and I also suspect it is probably better to include nothing than poor/disjointed/sporadic documentation of the sessions.
The entire prompt and process would be fine if my git history were subject to research, but really it is a tool for me or anyone else who wants to know what happened at a given time.
Original blogpost goes over motivations + workflow:
1. Using LLMs as a tool but still very much crafting the software "by hand",
2. Just prompting LLMs, not reading or understanding the source code and just running the software to verify the output.
A lot of comments here seem to be thinking of 1. But I'm pretty sure the OP is thinking of 2.
For my work as one of the developers on a team, no. The way I prompt is my asset and my advantage over teammates who always complain about AI not being able to provide correct solutions, and it secures my career.
It needs to be considered as the compiled output of vbc-c, vbc-python, vbc-ts, or vbc-js.
Keeping the source code (the prompt) is very natural, since the compiled, "vibecoded" binary output lacks the _context_ and _motivation_ that the source code (the prompt) provides.
Instead, we need better (self-explaining) translation from spec to code. And better tools that help us navigate codebases we've not written ourselves.
For example, imagine a UI where you click on a feature spec file and it highlights you all the relevant tests and code.
That context could clarify the problem, why the solution was chosen, key assumptions, potential risks, and future work.
That said preserved private session records might be of great personal benefit.
If that was important, why are we not already doing things like this? Should I have always been putting my browser history in commits?
Saving sessions is even more pointless without the full context the LLM uses that is hidden from the user. That's too noisy.
POH = Plain Old Human
Easy to achieve.
Why NOT include a link back? Why deprive yourself of information?
pros:
- intent is documented
- reference to see how it was made
- informal documentation
- find flaws in your mental model
- others can learn from your style

cons:
- others can see how it was made
- mention things you don't want others to see/know
- people can see how dumb we are

reality:
- you will judge and be judged for engineering competency not through code, but through words
The whole point of the source code it generates is to have the artifact. Maybe this is somewhat useful if you need to train people how to use AI, but at the end of the day the generated code is the thing that matters. If you keep other notes/documentation from meetings and design sessions, however you keep that is probably where this should go, too?
That is a cynical take, and not very different from advice to never write any documentation or never help your teammates. But that resemblance is superficial: in any organization you shouldn't help people who steal your time for their benefit (Sean Goedecke calls them predators: https://www.seangoedecke.com/predators/).
On the other hand, it may be beneficial to privately save CLAUDE.md and other parts of persistent context. You may gitignore them (but that will be conspicuous unless you also gitignore .gitignore) or just load them from ~/.claude
I expect an enterprise version of Claude Code that will save any human input to the org servers for later use.
you mean plagiarism?
I understand the drive for stabilizing control and consistency, but this ain't the way.
EOM
Consider:
"I got a bug report from this user:
... bunch of user PII ..."
The LLM will do the right thing with the code, the developer reviewed the code and didn't see any mention of the original user or bug report data.
Now the notes thing they forgot about goes and makes this all public.
Lots of comments mentioned this; for those who aren't aware, please check out
Git Notes: Git's coolest, most unloved feature (2022)
https://news.ycombinator.com/item?id=44345334
I think it's a perfect match for this case.
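For those who haven't used them, a minimal sketch (repo, commit, and note contents all invented) of attaching session context with git notes, which ride alongside a commit without changing its hash:

```shell
# Throwaway repo with one commit and an attached note.
git init -q demo-notes
git -C demo-notes config user.email "dev@example.com"
git -C demo-notes config user.name "Dev"
git -C demo-notes commit -q --allow-empty -m "Add red button to sidebar"
# The note is stored under refs/notes/commits, separate from the commit itself.
git -C demo-notes notes add -m "Session: tried toolbar first; moved to sidebar after review feedback."
git -C demo-notes log -1 --notes    # commit plus the attached note
git -C demo-notes notes show HEAD   # just the note
```

One caveat from the git-notes docs: notes live in their own ref, so they aren't pushed or fetched by default; teams have to agree to sync `refs/notes/*` explicitly.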
the actual problem is that AI produces MORE code not better code, and most people using it aren't reviewing what comes out. if you understood the code well enough to review it properly you wouldn't need the session log. and if you didn't understand it, the session log won't help you either because you'll just see the agent confidently explaining its own mistakes.
> have your agent write a commit message or a documentation file that is polished and intended for consumption
this is the right take. code review and commit messages matter more now than they ever did BECAUSE there's so much more code being generated. adding another artifact nobody reads doesn't fix the underlying issue which is that people skip the "understand what was built" step entirely.
One agent writes task specs. The other implements them. Handoff files bridge the gap. The spec IS the session artifact because it captures intent, scope, and constraints before any code gets written.
The plan.md approach people are describing here is basically what happens naturally when you force yourself to write intent before execution.