Otherwise, the ability to search back through history is valuable, and a simple git log/diff or (rip)grep/jq combo over the session directory gets you there. A simple example of mine: https://github.com/backnotprop/rg_history
Do the authors have any benchmarks or tests to show that this genuinely improves outputs?
I have tried probably 10-20 other open source and closed source projects purporting to improve Claude Code with memory/context, and to date nothing works better than simply keeping my own library of markdown files for each project specification, markdown files for decisions made, etc., and then explicitly telling Claude Code to review x, y, z markdown files.
I would also suggest to the founders: don't found a startup based on improving context for Claude Code. Why? Because this is the number one thing the Claude Code developers are working on too, and it's clearly getting better and better with every release.
So not only are you competing with like 20+ other startups and 20+ other open-source projects, you are competing with Anthropic too.
My approach is literally just a top-level, local, git-version-controlled memory system with 3 commands:
- /handoff - End of session, capture into an inbox.md
- /sync - Route inbox.md to custom organised markdown files
- /engineering (or /projects, /tasks, /research) - Load context into next session
I didn't want a database or an MCP server or embeddings or auto-indexing when I can build something frictionless that works with git and markdown.
Repo: https://github.com/ossa-ma/double (just published it publicly, but it's about the idea imo)
The idea: multiple AIs (Claude, GPT, Gemini, Grok) brainstorm simultaneously and produce one agreed response. This might solve the context problem more elegantly because:
- No token-limit anxiety: you get comprehensive answers upfront
- Better quality through AI cross-validation
- The consensus answer naturally becomes your context
- Simpler to implement: just parallel API calls vs. memory-tree management (a sketch follows below)
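A minimal sketch of that fan-out-then-merge loop (the model ids and the `ask()` wrapper are placeholders, not double's actual code):

```python
# Sketch of the parallel-brainstorm-then-consensus idea (not double's actual
# implementation). ask() is a hypothetical wrapper; swap in each provider's
# real SDK call.
import asyncio

MODELS = ["claude", "gpt", "gemini", "grok"]  # placeholder model ids

async def ask(model: str, prompt: str) -> str:
    # Placeholder: replace with the real API call for each provider.
    return f"[{model}] draft answer to: {prompt[:40]}..."

async def consensus(question: str) -> str:
    # Fan out the same question to every model in parallel.
    drafts = await asyncio.gather(*(ask(m, question) for m in MODELS))
    # One final call merges the drafts into a single agreed response.
    merged = "\n\n".join(f"--- {m} ---\n{d}" for m, d in zip(MODELS, drafts))
    return await ask(MODELS[0], f"Synthesize one agreed answer from:\n{merged}")

print(asyncio.run(consensus("How should the memory layer be structured?")))
```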
Just curious if you've explored this direction or if there's a reason the memory persistence approach works better for your use case?
Each time an LLM looks at my project, it's like a newcomer has arrived. If it keeps repeating mistakes, it's because my project sucks.
It's a unique opportunity. You can get lots of repeated feedback from "infinite newcomers" to a project, each of their failures an opportunity to make things clearer: better docs (for humans, no machine-specific hacks), better conventions, better examples, more intuitive code.
That, in my opinion, is how machine-only markdown (written for machines and not humans) will fall. There will be a breed of projects that thrives with minimal machine-specific context.
For example, if my project uses MIDI, I'm much better off building some specialized tools and examples that introduce MIDI to newcomers (machines and humans alike) than writing extensive "skill documents" that explain what MIDI is and how it works.
Think like a human does. Do you prefer being introduced to a codebase by reading lots of verbose docs, or by having some ready-to-run examples that get you going right away? We humans also forget, or ignore, or keep redundant context sources away (for good reason).
We use Cursor where I work, and I find it a good medium for still being in control and knowing what is happening, with all of the changes being reviewed in an IDE. Claude feels more like a black box, and one with so many options that it's just overwhelming, yet I continue to try and figure out the best way to use it for my personal projects.
Claude Code suffers from initial decision fatigue, in my opinion.
I run it in automatic mode with decent namespacing, so thoughts, notes, and whole conversations just accumulate in a structured way. As I work, it stores the session and builds small semantic, entity-based hypergraphs of what I was thinking about.
Later I’ll come back and ask things like:
what was I actually trying to fix here?
what research threads exist already?
where did my reasoning drift?
Sometimes I’ll even ask Claude to reflect on its own reasoning in a past session and point out where it was being reactive or missed connections.
My own fully-local, minimalistic take on this problem of "session continuation without compaction" is to rely on the session JSONL files directly rather than create separate "memory" artifacts, and seamlessly index them to enable fast full-text search. This is the idea behind the "aichat" command-group + plugin I just added to my claude-code-tools [1] repo. You can quit your Claude-Code/Codex-CLI session S and type
aichat resume <id-of-session-S-you-just-quit>
It launches a TUI, offering a few ways to continue your work:
- blind trim: clones the session, then truncates large tool calls/results and older assistant messages, which can clear up as much as 50% of context, depending of course on what's going on; this is a quick hack to continue your work a bit longer
- smart trim: similar, but uses a headless agent to decide what to truncate
- rollover: the one I use most frequently; it creates a new session S1 (which can optionally be a different CLI agent, allowing cross-agent work continuation) and injects back-pointers to the parent session JSONL file of S, the parent's parent, and so on (what I call session lineage) into the first user message; the user can then prompt the agent to use a sub-agent to extract arbitrary context from the ancestor sessions to continue the work.
[1] https://github.com/pchalasani/claude-code-tools?tab=readme-o...
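For anyone who wants just the indexing half of this, a minimal sketch using SQLite FTS5 over the session files (the ~/.claude/projects layout is what I see on my machine; the per-line JSONL schema varies by version, so treat the field names as assumptions):

```python
# Minimal sketch: full-text index over Claude Code session JSONL files.
# Assumes transcripts live under ~/.claude/projects/ as one JSON object
# per line; the exact schema varies, so this grabs any text it can find.
import json
import sqlite3
from pathlib import Path

db = sqlite3.connect("sessions.db")
db.execute("CREATE VIRTUAL TABLE IF NOT EXISTS msgs USING fts5(session, role, text)")

def extract_text(msg: dict) -> str:
    """Pull plain text out of a message dict, whatever shape it takes."""
    content = msg.get("content", "")
    if isinstance(content, str):
        return content
    # content may be a list of blocks like {"type": "text", "text": "..."}
    return " ".join(b.get("text", "") for b in content if isinstance(b, dict))

for path in Path.home().glob(".claude/projects/**/*.jsonl"):
    for line in path.read_text(errors="ignore").splitlines():
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue
        msg = entry.get("message")
        if not isinstance(msg, dict):
            continue
        text = extract_text(msg)
        if text:
            db.execute(
                "INSERT INTO msgs VALUES (?, ?, ?)",
                (path.stem, msg.get("role", entry.get("type", "?")), text),
            )
db.commit()

# Query: which sessions mentioned a topic, with a short snippet of each hit.
for row in db.execute(
    "SELECT session, snippet(msgs, 2, '[', ']', '…', 8) "
    "FROM msgs WHERE msgs MATCH ? LIMIT 10",
    ("migration",),
):
    print(row)
```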
I work primarily in Python and maintain extensive coding conventions there - patterns allowed/forbidden, preferred libs, error handling, etc. Custom slash commands like `/use-recommended-python` (loads my curated libs: pendulum over datetime, httpx over requests) and `/find-reinvented-the-wheel` to catch when Claude ignored existing utilities.
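For flavor, the kind of pattern those commands push toward (illustrative only; these aren't the actual command files):

```python
# Illustrative only: the style of convention the slash commands enforce.
import httpx      # preferred over requests: explicit timeouts, async support
import pendulum   # preferred over datetime: timezone-aware by default

# Forbidden: requests.get(url) with no timeout.
resp = httpx.get("https://example.com", timeout=10.0)

# Forbidden: datetime.utcnow(), which returns a naive datetime.
now = pendulum.now("UTC")

print(resp.status_code, now.to_iso8601_string())
```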
My use case: multiple smaller Python projects (similar to steipete's workflow https://github.com/steipete), so cross-project consistency matters more than single-codebase context.
Yes, ~15k tokens for CLAUDE.md + rules. I sacrifice context for consistency. Worth it.
Also baked in my dev philosophy, Carmack-style: make it work first, then make it fast. Otherwise Claude over-optimizes prematurely.
These memory abstractions are too complicated for me and too inconsistent in practice. I'd rather maintain a living document I control and constantly refine.
But imagine how hard it would be if these kids had only short-term memory and didn't know what to focus on except what you tell them. You literally have to tell them, "Here is A-Z; pay attention to 'X' only, and go do your thing." Add in the other managers at this party, like a caterer, the clowns, your spouse: they also have to tell them that, and remember and communicate what the other managers have done. No one has really solved this.
This is what it felt like in 2025 to code with LLMs on non-trivial projects, with somewhat of an improvement as the year went by. But I am not sure much progress was made in fixing the process part of the problem.
Though I have found that a repo-level CLAUDE.md that is updated every time Claude makes a mistake, plus using --restore to select a previous relevant session, works well.
There is no way for Anthropic to optimize Claude Code or the underlying models for these custom setups. So it's probably better to stick with the patterns Anthropic engineers use internally.
Quite a few of you have mentioned that you store a lot of your working context across sessions in some md file - what are you actually storing? What data do you actually go back to and refer to as you're building?
I'm sold.
With that said, I can't think of a way that this would work. How does it work? I took a very quick glance at the repo, and it's not obvious.
The whole problem is that the AI is short on context; it has limited memory. Of course you can store lots of memory elsewhere, but how do you solve the problem of the AI not knowing what's in that memory as it goes from step to step? How does it find the relevant memory at the moment that relevance is most active?
Could you just walk through the sort of conceptual mechanism of action of this thing?
I’m curious how people think about portability: e.g. letting Claude Code retrieve context that was created while using Codex, Manus, or Cursor, or sharing specific parts of that context with other people or agents.
At that point, log parsing and summaries become per-tool views of state rather than shared state. Do people think a shared external memory layer is overkill here, or a necessary step once you have multiple agents/tools in play?
If you're using them, though, you no longer have the problem of Claude forgetting things.
I’m never stopped and Claude always remembers what we’re doing.
This pattern has been highly productive for 8 months.
Combined with a good AGENTS.md, it seems to be working really well.
This seems like it wouldn't accomplish much more than those methods. It knows my stack preferences, what I want commit messages to look like, etc.
Deploy the service on your cloud server or your local computer, then add the streamable-HTTP MCP server and the skill to Claude Code.
To activate in a new conversation, simply reference the skill first: `@~/.claude/skills/mem/SKILL.md`.
If you like this project, please give it a star on GitHub!
Claude Code keeps all the conversation logs stored on disk, right? Why not parse them asynchronously and then use hooks to enrich the context as the conversation goes? (I mean this in the broadest, most generic way; I guess we'd have to embed them, do some RAG… the whole thing.)
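Hooks can get you most of the way there today. A rough sketch of a UserPromptSubmit hook script that injects matches from a pre-built index (the payload field names and the stdout-becomes-context behavior are my reading of the hooks docs; verify before relying on them):

```python
#!/usr/bin/env python3
# Rough sketch of a UserPromptSubmit hook: read the event payload from
# stdin, search a pre-built full-text index of past session logs (e.g. the
# FTS sketch upthread), and print matches to stdout so they get appended
# to the context. The payload shape ("prompt") is an assumption; verify.
import json
import sqlite3
import sys

payload = json.load(sys.stdin)
prompt = payload.get("prompt", "")

# Keep only plain words so the FTS MATCH query stays well-formed.
query = " OR ".join(w for w in prompt.split() if w.isalnum())
if query:
    db = sqlite3.connect("sessions.db")  # built asynchronously, elsewhere
    rows = db.execute(
        "SELECT session, text FROM msgs WHERE msgs MATCH ? LIMIT 3", (query,)
    ).fetchall()
    if rows:
        print("Possibly relevant context from past sessions:")
        for session, text in rows:
            print(f"[{session}] {text[:200]}")
```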
Then again, this might be just me. When there's a task to be done, even without an LLM my thought process is about selecting the relevant parts of my context for solving it. What is relevant? What starting point has the best odds of being good? That translates naturally to tasking an LLM.
Let's say I have a spec I'm working on. It's based off of a requirements document. If I want to think about the spec in isolation (let's say I want to ask the LLM what requirements are actually being fulfilled by the spec), I can just pass the spec, without passing the requirements. Then I'll compare the response against the actual requirements.
At the end of the day, I guess I hate the automagicness of a silent context injection. Like I said, it also negates the perfect forgetfulness of LLMs.
I use things like Claude Projects on the web app, and skills and stuff, and Claude Code heavily.
I want to manually curate the context; adding memory is an anti-pattern for this. I don't want the LLM grabbing tokens from memory that may or may not be relevant, and will most likely be stale.
Even if most approaches fail, exploring that boundary feels useful - especially if the system is transparent about what it stores and why.
Claude itself can just update the CLAUDE.md file with whatever you might have forgotten to put in there.
You can stick it in git and it lives with the project.
I’ll give this a go though and let you know!
Or, over continuing the same session and compacting?
Just their thought-management git system works pretty well for me, TBH. https://www.humanlayer.dev/
Why did you need to use AI to write this post?
Did Claude write this?
Not this. Not that. Just something.
What it does.
What it doesn't do.
> ... fix it.
It’s almost as if software authors are afraid that if their project names are too descriptive, they won’t be able to pivot to some other purpose, which ends up making every project name sound at once banal and vague.
AI writing slop is infecting everything. Nothing turns me off this product more than the feeling you can’t even write about it as a human. If you can’t do that, why would I use or value it?