Otherwise, the ability to search back through history is valuable, and a simple git log/diff or (rip)grep/jq combo over the session directory gets you there. A simple example of mine: https://github.com/backnotprop/rg_history
Do the authors have any benchmarks or tests to show that this genuinely improves outputs?
I have tried probably 10-20 other open source and closed source projects purporting to improve Claude Code with memory/context, and to date nothing works better than simply keeping my own library of markdown files for each project specification, markdown files for decisions made, etc., and then explicitly telling Claude Code to review x, y, z markdown files.
I would also suggest to the founders: don't found a startup based on improving context for Claude Code. Why? Because this is the number one thing the Claude Code developers are working on too, and it's clearly getting better and better with every release.
So not only are you competing with like 20+ other startups and 20+ other open-source projects, you are competing with Anthropic too.
My approach is literally just a top-level, local, git-version-controlled memory system with 3 commands:
- /handoff - End of session, capture into an inbox.md
- /sync - Route inbox.md to custom organised markdown files
- /engineering (or /projects, /tasks, /research) - Load context into next session
I didn't want a database or an MCP server or embeddings or auto-indexing when I can build something frictionless that works with git and markdown.
Repo: https://github.com/ossa-ma/double (just published it publicly, but it's about the idea imo)
The idea: multiple AIs (Claude, GPT, Gemini, Grok) brainstorm simultaneously and produce one agreed response. This might solve the context problem more elegantly because:
- No token-limit anxiety: you get comprehensive answers upfront
- Better quality through AI cross-validation
- The consensus answer naturally becomes your context
- Simpler to implement: just parallel API calls vs. memory-tree management (a sketch follows below)
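A minimal sketch of that fan-out-then-merge loop (the model ids and the `ask()` wrapper are placeholders, not double's actual code):

```python
# Sketch of the parallel-brainstorm-then-consensus idea (not double's actual
# implementation). ask() is a hypothetical wrapper; swap in each provider's
# real SDK call.
import asyncio

MODELS = ["claude", "gpt", "gemini", "grok"]  # placeholder model ids

async def ask(model: str, prompt: str) -> str:
    # Placeholder: replace with the real API call for each provider.
    return f"[{model}] draft answer to: {prompt[:40]}..."

async def consensus(question: str) -> str:
    # Fan out the same question to every model in parallel.
    drafts = await asyncio.gather(*(ask(m, question) for m in MODELS))
    # One final call merges the drafts into a single agreed response.
    merged = "\n\n".join(f"--- {m} ---\n{d}" for m, d in zip(MODELS, drafts))
    return await ask(MODELS[0], f"Synthesize one agreed answer from:\n{merged}")

print(asyncio.run(consensus("How should the memory layer be structured?")))
```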
Just curious if you've explored this direction or if there's a reason the memory persistence approach works better for your use case?
Each time an LLM looks at my project, it's like a newcomer has arrived. If it keeps repeating mistakes, it's because my project sucks.
It's a unique opportunity. You can get lots of repeated feedback from "infinite newcomers" to a project, each of their failures an opportunity to make things clearer: better docs (for humans, no machine-specific hacks), better conventions, better examples, more intuitive code.
That, in my opinion, is how machine-only markdown (written for machines and not humans) will fall. There will be a breed of projects that thrives with minimal machine-specific context.
For example, if my project uses MIDI, I'm much better off building some specialized tools and examples that introduce MIDI to newcomers (machines and humans alike) than writing extensive "skill documents" that explain what MIDI is and how it works.
Think like a human does. Do you prefer being introduced to a codebase by reading lots of verbose docs, or by having some ready-to-run examples that get you going right away? We humans also forget, or ignore, or keep redundant context sources away (for good reason).
We use Cursor where I work, and I find it a good medium for still being in control and knowing what is happening, with all of the changes being reviewed in an IDE. Claude feels more like a black box, and one with so many options that it's just overwhelming, yet I continue to try and figure out the best way to use it for my personal projects.
Claude Code suffers from initial decision fatigue, in my opinion.
I run it in automatic mode with decent namespacing, so thoughts, notes, and whole conversations just accumulate in a structured way. As I work, it stores the session and builds small semantic, entity-based hypergraphs of what I was thinking about.
Later I’ll come back and ask things like:
what was I actually trying to fix here?
what research threads exist already?
where did my reasoning drift?
Sometimes I’ll even ask Claude to reflect on its own reasoning in a past session and point out where it was being reactive or missed connections.
My own fully-local, minimalistic take on this problem of "session continuation without compaction" is to rely on the session JSONL files directly rather than create separate "memory" artifacts, and seamlessly index them to enable fast full-text search. This is the idea behind the "aichat" command-group + plugin I just added to my claude-code-tools [1] repo. You can quit your Claude-Code/Codex-CLI session S and type
aichat resume <id-of-session-S-you-just-quit>
It launches a TUI, offering a few ways to continue your work:
- blind trim: clones the session, then truncates large tool calls/results and older assistant messages, which can clear up as much as 50% of context, depending of course on what's going on; this is a quick hack to continue your work a bit longer
- smart trim: similar, but uses a headless agent to decide what to truncate
- rollover: the one I use most frequently; it creates a new session S1 (which can optionally be a different CLI agent, allowing cross-agent work continuation) and injects back-pointers to the parent session JSONL file of S, the parent's parent, and so on (what I call session lineage) into the first user message; the user can then prompt the agent to use a sub-agent to extract arbitrary context from the ancestor sessions to continue the work.
[1] https://github.com/pchalasani/claude-code-tools?tab=readme-o...
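For anyone who wants just the indexing half of this, a minimal sketch using SQLite FTS5 over the session files (the ~/.claude/projects layout is what I see on my machine; the per-line JSONL schema varies by version, so treat the field names as assumptions):

```python
# Minimal sketch: full-text index over Claude Code session JSONL files.
# Assumes transcripts live under ~/.claude/projects/ as one JSON object
# per line; the exact schema varies, so this grabs any text it can find.
import json
import sqlite3
from pathlib import Path

db = sqlite3.connect("sessions.db")
db.execute("CREATE VIRTUAL TABLE IF NOT EXISTS msgs USING fts5(session, role, text)")

def extract_text(msg: dict) -> str:
    """Pull plain text out of a message dict, whatever shape it takes."""
    content = msg.get("content", "")
    if isinstance(content, str):
        return content
    # content may be a list of blocks like {"type": "text", "text": "..."}
    return " ".join(b.get("text", "") for b in content if isinstance(b, dict))

for path in Path.home().glob(".claude/projects/**/*.jsonl"):
    for line in path.read_text(errors="ignore").splitlines():
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue
        msg = entry.get("message")
        if not isinstance(msg, dict):
            continue
        text = extract_text(msg)
        if text:
            db.execute(
                "INSERT INTO msgs VALUES (?, ?, ?)",
                (path.stem, msg.get("role", entry.get("type", "?")), text),
            )
db.commit()

# Query: which sessions mentioned a topic, with a short snippet of each hit.
for row in db.execute(
    "SELECT session, snippet(msgs, 2, '[', ']', '…', 8) "
    "FROM msgs WHERE msgs MATCH ? LIMIT 10",
    ("migration",),
):
    print(row)
```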
I work primarily in Python and maintain extensive coding conventions there - patterns allowed/forbidden, preferred libs, error handling, etc. Custom slash commands like `/use-recommended-python` (loads my curated libs: pendulum over datetime, httpx over requests) and `/find-reinvented-the-wheel` to catch when Claude ignored existing utilities.
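For flavor, the kind of pattern those commands push toward (illustrative only; these aren't the actual command files):

```python
# Illustrative only: the style of convention the slash commands enforce.
import httpx      # preferred over requests: explicit timeouts, async support
import pendulum   # preferred over datetime: timezone-aware by default

# Forbidden: requests.get(url) with no timeout.
resp = httpx.get("https://example.com", timeout=10.0)

# Forbidden: datetime.utcnow(), which returns a naive datetime.
now = pendulum.now("UTC")

print(resp.status_code, now.to_iso8601_string())
```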
My use case: multiple smaller Python projects (similar to steipete's workflow https://github.com/steipete), so cross-project consistency matters more than single-codebase context.
Yes, ~15k tokens for CLAUDE.md + rules. I sacrifice context for consistency. Worth it.
Also baked in my dev philosophy, Carmack-style: make it work first, then make it fast. Otherwise Claude over-optimizes prematurely.
These memory abstractions are too complicated for me and too inconsistent in practice. I'd rather maintain a living document I control and constantly refine.
But imagine how hard it would be if these kids had only short-term memory and didn't know what to focus on except what you tell them. You literally have to tell them, "Here is A-Z; pay attention to 'X' only, and go do your thing." Add in the other managers at this party, like a caterer, the clowns, your spouse: they also have to tell them that, and remember and communicate what the other managers have done. No one has really solved this.
This is what it felt like in 2025 to code with LLMs on non-trivial projects, with somewhat of an improvement as the year went by. But I am not sure much progress was made in fixing the process part of the problem.
Though I have found that a repo-level CLAUDE.md that is updated every time Claude makes a mistake, plus using --restore to select a previous relevant session, works well.
There is no way for Anthropic to optimize Claude Code or the underlying models for these custom setups. So it's probably better to stick with the patterns Anthropic engineers use internally.
Quite a few of you have mentioned that you store a lot of your working context across sessions in some md file - what are you actually storing? What data do you actually go back to and refer to as you're building?
I'm sold.
With that said, I can't think of a way that this would work. How does it work? I took a very quick glance at the repo, and it's not obvious.
The whole problem is that the AI is short on context; it has limited memory. Of course you can store lots of memory elsewhere, but how do you solve the problem of the AI not knowing what's in that memory as it goes from step to step? How does it find the relevant memory at the moment that relevance is most active?
Could you just walk through the sort of conceptual mechanism of action of this thing?
I’m curious how people think about portability: e.g. letting Claude Code retrieve context that was created while using Codex, Manus, or Cursor, or sharing specific parts of that context with other people or agents.
At that point, log parsing and summaries become per-tool views of state rather than shared state. Do people think a shared external memory layer is overkill here, or a necessary step once you have multiple agents/tools in play?
If you're using them, though, you no longer have the problem of Claude forgetting things.
I’m never stopped and Claude always remembers what we’re doing.
This pattern has been highly productive for 8 months.
Combined with a good AGENTS.md, it seems to be working really well.
This seems like it wouldn't accomplish much more than those methods. It knows my stack preferences, what I want commit messages to look like, etc.
Deploy the service on your cloud server or your local computer, then add the streamable-HTTP MCP server and the skill to Claude Code.
To activate in a new conversation, simply reference the skill first: `@~/.claude/skills/mem/SKILL.md`.
If you like this project, please give it a star on GitHub!
Claude Code keeps all the conversation logs stored on disk, right? Why not parse them asynchronously and then use hooks to enrich the context as the conversation goes? (I mean this in the broadest, most generic way; I guess we'd have to embed them, do some RAG… the whole thing.)
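Hooks can get you most of the way there today. A rough sketch of a UserPromptSubmit hook script that injects matches from a pre-built index (the payload field names and the stdout-becomes-context behavior are my reading of the hooks docs; verify before relying on them):

```python
#!/usr/bin/env python3
# Rough sketch of a UserPromptSubmit hook: read the event payload from
# stdin, search a pre-built full-text index of past session logs (e.g. the
# FTS sketch upthread), and print matches to stdout so they get appended
# to the context. The payload shape ("prompt") is an assumption; verify.
import json
import sqlite3
import sys

payload = json.load(sys.stdin)
prompt = payload.get("prompt", "")

# Keep only plain words so the FTS MATCH query stays well-formed.
query = " OR ".join(w for w in prompt.split() if w.isalnum())
if query:
    db = sqlite3.connect("sessions.db")  # built asynchronously, elsewhere
    rows = db.execute(
        "SELECT session, text FROM msgs WHERE msgs MATCH ? LIMIT 3", (query,)
    ).fetchall()
    if rows:
        print("Possibly relevant context from past sessions:")
        for session, text in rows:
            print(f"[{session}] {text[:200]}")
```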
Then again, this might be just me. When there's a task to be done, even without an LLM my thought process is about selecting the relevant parts of my context for solving it. What is relevant? What starting point has the best odds of being good? That translates naturally to tasking an LLM.
Let's say I have a spec I'm working on. It's based off of a requirements document. If I want to think about the spec in isolation (let's say I want to ask the LLM what requirements are actually being fulfilled by the spec), I can just pass the spec, without passing the requirements. Then I'll compare the response against the actual requirements.
At the end of the day, I guess I hate the automagicness of a silent context injection. Like I said, it also negates the perfect forgetfulness of LLMs.
I use things like Claude Projects on the web app, and skills and stuff, and Claude Code heavily.
I want to manually curate the context; adding memory is an anti-pattern for this. I don't want the LLM grabbing tokens from memory that may or may not be relevant, and will most likely be stale.
Even if most approaches fail, exploring that boundary feels useful - especially if the system is transparent about what it stores and why.
Claude itself can just update the CLAUDE.md file with whatever you might have forgotten to put in there.
You can stick it in git and it lives with the project.
I’ll give this a go though and let you know!
Or, over continuing the same session and compacting?
Just their thought-management git system works pretty well for me, TBH. https://www.humanlayer.dev/
Why did you need to use AI to write this post?
Did Claude write this?
Not this. Not that. Just something.
What it does.
What it doesn't do.
> ... fix it.
It’s almost as if software authors are afraid that if their project names are too descriptive, they won’t be able to pivot to some other purpose, which ends up making every project name sound at once banal and vague.
AI writing slop is infecting everything. Nothing turns me off this product more than the feeling you can’t even write about it as a human. If you can’t do that, why would I use or value it?