Hacker News

How I use Claude Code: Separation of planning and execution

936 points by vinhnx

by intellegix

0 subcomment

This separation of planning and execution is exactly the pattern I ended up building into an open source toolkit for Claude Code. The key insight that made autonomous loops work was giving the loop driver awareness of the CLAUDE.md file as the "plan" layer — the human edits CLAUDE.md between runs to steer the project, and the loop driver handles execution (session continuity, budget enforcement, stagnation detection, model fallback from Opus to Sonnet on consecutive timeouts).
The other piece that helped was a multi-model council system — before committing to a major architectural decision, the toolkit queries GPT-4, Claude, and Gemini simultaneously through Perplexity, then synthesizes with Opus. Having three models surface their assumptions (as the top comment here describes) catches more blind spots than any single model.
194 pytest tests, MIT licensed: https://github.com/intellegix/intellegix-code-agent-toolkit
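A minimal sketch of the loop-driver idea described in this comment, not the toolkit's actual code. The `run_session` callable, the `"opus"`/`"sonnet"` labels, and all thresholds are assumptions made for illustration; stagnation is approximated here as "the same output several runs in a row".

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class LoopState:
    model: str = "opus"            # hypothetical label, not a real API model id
    consecutive_timeouts: int = 0
    spent_usd: float = 0.0
    unchanged_runs: int = 0
    last_output: str = ""


def drive(run_session: Callable[[str], Tuple[str, float]],
          budget_usd: float,
          max_timeouts: int = 2,
          max_unchanged: int = 3,
          max_iterations: int = 20) -> Tuple[str, LoopState]:
    """Run sessions until a stop condition triggers; return (reason, state)."""
    state = LoopState()
    for _ in range(max_iterations):
        # Budget enforcement: stop before starting a session we cannot pay for.
        if state.spent_usd >= budget_usd:
            return "budget_exhausted", state
        try:
            output, cost = run_session(state.model)
        except TimeoutError:
            # Model fallback: after repeated timeouts, drop to the cheaper model.
            state.consecutive_timeouts += 1
            if state.consecutive_timeouts >= max_timeouts and state.model == "opus":
                state.model = "sonnet"
                state.consecutive_timeouts = 0
            continue
        state.consecutive_timeouts = 0
        state.spent_usd += cost
        # Stagnation detection: identical output several runs in a row.
        if output == state.last_output:
            state.unchanged_runs += 1
            if state.unchanged_runs >= max_unchanged:
                return "stagnated", state
        else:
            state.unchanged_runs = 0
            state.last_output = output
    return "max_iterations", state
```

In a real driver, `run_session` would presumably wrap a Claude Code invocation that re-reads CLAUDE.md on each run, so human edits to the plan take effect between iterations.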

by sparin9

7 subcomments

I think the real value here isn’t “planning vs not planning,” it’s forcing the model to surface its assumptions before they harden into code.
LLMs don’t usually fail at syntax. They fail at invisible assumptions about architecture, constraints, invariants, etc. A written plan becomes a debugging surface for those assumptions.

by haolez

26 subcomments

> Notice the language: “deeply”, “in great details”, “intricacies”, “go through everything”. This isn’t fluff. Without these words, Claude will skim. It’ll read a file, see what a function does at the signature level, and move on. You need to signal that surface-level reading is not acceptable.
This makes no sense to my intuition of how an LLM works. It's not that I don't believe this works, but my mental model doesn't capture why asking the model to read the content "more deeply" will have any impact on whatever output the LLM generates.

by brandall10

2 subcomments

I go a bit further than this and have had great success with 3 doc types and 2 skills:
- Specs: these are generally static, but updatable as the project evolves. And they're broken out to an index file that gives a project overview, a high-level arch file, and files for all the main modules. Roughly ~1k lines of spec for 10k lines of code, and try to limit any particular spec file to 300 lines. I'm intimately familiar with every single line in these.
- Plans: these are the output of a planning session with an LLM. They point to the associated specs. These tend to be 100-300 lines and 3 to 5 phases.
- Working memory files: I use both a status.md (3-5 items per phase, roughly 30 lines overall), which points to the latest plan, and a project_status (100-200 lines), which tracks the current state of the project; the LLM is instructed to compact past efforts to keep it lean.
- A planner skill I use w/ Gemini Pro to generate new plans. It essentially explains the specs/plans dichotomy, the role of the status files, and to review everything in the pertinent areas of code and give me a handful of high-level next set of features to address based on shortfalls in the specs or things noted in the project_status file. Based on what it presents, I select a feature or improvement to generate. Then it proceeds to generate a plan, updates a clean status.md that points to the plan, and adjusts project_status based on the state of the prior completed plan.
- An implementer skill in Codex that goes to town on a plan file. It's fairly simple, it just looks at status.md, which points to the plan, and of course the plan points to the relevant specs so it loads up context pretty efficiently.
I've tried the two main spec generation libraries, which were way overblown, and then I gave superpowers a shot... which was fine, but still too much. The above is all homegrown, and I've had much better success because it keeps the context lean and focused.
And I'm only on the $20 plans for Codex/Gemini, vs. spending $100/month on CC for the half year prior, and I move quicker w/ no stall-outs due to token consumption, which was regularly happening w/ CC by the 5th day. Codex rarely dips below 70% available context when it puts up a PR after an execution run. Roughly 4/5 PRs are without issue, which is flipped against what I experienced with CC and only using planning mode.

by zmmmmm

1 subcomments

I actually don't really like a few things about this approach.
First, the "big bang" write it all at once. You are going to end up with thousands of lines of code that were monolithically produced. I think it is much better to have it write the plan and formulate it as sensible technical steps that can be completed one at a time. Then you can work through them. I get that this is not very "vibe"ish but that is kind of the point. I want the AI to help me get to the same point I would be at with produced code AND understanding of it, just accelerate that process. I'm not really interested in just generating thousands of lines of code that nobody understands.
Second, the author keeps referring to adjusting the behaviour, but never incorporating that into long-lived guidance. To me, integral with the planning process is building an overarching knowledge base. Every time you're telling it there's something wrong, you need to tell it to update the knowledge base about why so it doesn't do it again.
Finally, no mention of tests? Just quick checks? To me, you have to end up with comprehensive tests. Maybe to the author it goes without saying, but I find it is integral to build this into the planning. At certain stages you will want certain types of tests: sometimes in advance of the code (so TDD style), other times built alongside it or after.
It's definitely going to be interesting to see how software methodology evolves to incorporate AI support and where it ultimately lands.

by alexrezvov

0 subcomment

Cool, the idea of leaving comments directly in the plan never even occurred to me, even though it really is the obvious thing to do.
Do you mark up and then save your comments in any way, and have you tried keeping them so you can review the rules and requirements later?

by mvkel

9 subcomments

> the workflow I’ve settled into is radically different from what most people do with AI coding tools
This looks exactly like what Anthropic recommends as the best practice for using Claude Code. Textbook.
It also exposes a major downside of this approach: if you don't plan perfectly, you'll have to start over from scratch if anything goes wrong.
I've found a much better approach in doing a design -> plan -> execute in batches, where the plan is no more than 1,500 lines, used as a proxy for complexity.
My 30,000 LOC app has about 100,000 lines of plan behind it. Can't build something that big as a one-shot.

by red_hare

3 subcomments

I use Claude Code for lecture prep.
I craft a detailed and ordered set of lecture notes in a Quarto file and then have a dedicated Claude Code skill for translating those notes into Slidev slides, in the style that I like.
Once that's done, much like the author, I go through the slides and make commented annotations like "this should be broken into two slides" or "this should be a side-by-side" or "use your generate clipart skill to throw an image here alongside these bullets" and "pull in the code example from ../examples/foo." It works brilliantly.
And then I do one final pass of tweaking after that's done.
But yeah, annotations are super powerful. Token distance in-context and all that jazz.

by DustinKlent

0 subcomment

This is basically how "Roo Code" works. It's a VSCode extension.

by EastLondonCoder

1 subcomments

I don’t use plan.md docs either, but I recognise the underlying idea: you need a way to keep agent output constrained by reality.
My workflow is more like scaffold -> thin vertical slices -> machine-checkable semantics -> repeat.
Concrete example: I built and shipped a live ticketing system for my club (Kolibri Tickets). It’s not a toy: real payments (Stripe), email delivery, ticket verification at the door, frontend + backend, migrations, idempotency edges, etc. It’s running and taking money.
The reason this works with AI isn’t that the model “codes fast”. It’s that the workflow moves the bottleneck from “typing” to “verification”, and then engineers the verification loop:
```
  -keep the spine runnable early (end-to-end scaffold)

  -add one thin slice at a time (don’t let it touch 15 files speculatively)

  -force checkable artifacts (tests/fixtures/types/state-machine semantics where it matters)

  -treat refactors as normal, because the harness makes them safe
```
If you run it open-loop (prompt -> giant diff -> read/debug), you get the “illusion of velocity” people complain about. If you run it closed-loop (scaffold + constraints + verifiers), you can actually ship faster because you’re not paying the integration cost repeatedly.
Plan docs are one way to create shared state and prevent drift. A runnable scaffold + verification harness is another.

by nikolay

0 subcomment

Well, that's already done by Amazon's Kiro [0], Google's Antigravity [1], GitHub's Spec Kit [2], and OpenSpec [3]!
[0]: https://kiro.dev/
[1]: https://antigravity.google/
[2]: https://github.github.com/spec-kit/
[3]: https://openspec.dev/

by turingsroot

2 subcomments

I've been teaching AI coding tool workshops for the past year and this planning-first approach is by far the most reliable pattern I've seen across skill levels.
The key insight that most people miss: this isn't a new workflow invented for AI - it's how good senior engineers already work. You read the code deeply, write a design doc, get buy-in, then implement. The AI just makes the implementation phase dramatically faster.
What I've found interesting is that the people who struggle most with AI coding tools are often junior devs who never developed the habit of planning before coding. They jump straight to "build me X" and get frustrated when the output is a mess. Meanwhile, engineers with 10+ years of experience who are used to writing design docs and reviewing code pick it up almost instantly - because the hard part was always the planning, not the typing.
One addition I'd make to this workflow: version your research.md and plan.md files in git alongside your code. They become incredibly valuable documentation for future maintainers (including future-you) trying to understand why certain architectural decisions were made.

by umairnadeem123

0 subcomment

The multi-pass approach works outside of code too. I run a fairly complex automation pipeline (prompt -> script -> images -> audio -> video assembly) and the single biggest quality improvement was splitting generation into discrete planning and execution phases. One-shotting a 10-step pipeline means errors compound. Having the LLM first produce a structured plan, then executing each step against that plan with validation gates between them, cut my failure rate from maybe 40% to under 10%. The planning doc also becomes a reusable artifact you can iterate on without re-running everything.
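As a rough illustration of "executing each step against the plan with validation gates", here is a sketch in which every step has its own check that must pass before the next step runs. The `Step` type and `execute_plan` function are hypothetical names; the commenter's actual pipeline is not shown in the thread.

```python
from typing import Callable, Dict, List, NamedTuple


class Step(NamedTuple):
    name: str
    run: Callable[[Dict], Dict]        # receives artifacts so far, returns new ones
    validate: Callable[[Dict], bool]   # gate: must pass before the next step starts


def execute_plan(plan: List[Step]) -> Dict:
    """Run steps in order, stopping at the first failed validation gate."""
    artifacts: Dict = {}
    for step in plan:
        produced = step.run(artifacts)
        if not step.validate(produced):
            # Failing early keeps one bad step from compounding downstream.
            raise RuntimeError(f"validation gate failed at step '{step.name}'")
        artifacts.update(produced)
    return artifacts
```

Because the plan is plain data, a failed step can be fixed and re-run without regenerating the steps before it, provided their artifacts are persisted somewhere.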

by colinhb

2 subcomments

Quoting the article:
> One trick I use constantly: for well-contained features where I’ve seen a good implementation in an open source repo, I’ll share that code as a reference alongside the plan request. If I want to add sortable IDs, I paste the ID generation code from a project that does it well and say “this is how they do sortable IDs, write a plan.md explaining how we can adopt a similar approach.” Claude works dramatically better when it has a concrete reference implementation to work from rather than designing from scratch.
Licensing apparently means nothing.
Ripped off in the training data, ripped off in the prompt.

by duttish

1 subcomments

This is quite close to what I've arrived at, but with two modifications
1) anything larger I work on in layers of docs. Architecture and requirements -> design -> implementation plan -> code. Partly it helps me think and nail the larger things first, and partly helps claude. Iterate on each level until I'm satisfied.
2) when doing reviews of each doc I sometimes restart the session and clear context, it often finds new issues and things to clear up before starting the next phase.

by gary17the

1 subcomments

> Read deeply, write a plan, annotate the plan until it’s right, then let Claude execute the whole thing without stopping, checking types along the way.
As others have already noted, this workflow is exactly what the Google Antigravity agent (based off Visual Studio Code) has been created for. Antigravity even includes specialized UI for a user to annotate selected portions of an LLM-generated plan before iterating it.
One significant downside to Antigravity I have found so far is the fact that even though it will properly infer a certain technical requirement and clearly note it in the plan it generates (for example, "this business reporting column needs to use a weighted average"), it will sometimes quietly downgrade such a specialized requirement (for example, to a non-weighted average), without even creating an appropriate "WARNING:" comment in the generated code. Especially so when the relevant codebase already includes a similar, but not exactly appropriate API. My repetitive prompts to ALWAYS ask about ANY implementation ambiguities WHATSOEVER go unanswered.
From what I gather Claude Code seems to be better than other agents at always remembering to query the user about implementation ambiguities, so maybe I will give Claude Code a shot over Antigravity.

by tabs_or_spaces

0 subcomment

My workflow is a bit different.
* I ask the LLM for its understanding of a topic or an existing feature in code. It's not really planning, it's more like understanding the model first
* Then based on its understanding, I can decide how great or small to scope something for the LLM
* An LLM showing good understanding can deal with a big task fairly well.
* An LLM showing bad understanding still needs to be prompted to get it right
* What helps a lot is reference implementations. Either I have existing code that serves as the reference or I ask for a reference and I review.
A few folks at my work do it OP's way, but here are my arguments for not doing it this way:
* Nobody is measuring the amount of slop within the plan. We only judge the implementation at the end
* it's still non-deterministic - folks will have different experiences using OP's methods. If Claude updates its model, it outdates OP's suggestions by making things either better or worse. We don't evaluate when things get better, we only focus on things that haven't gone well.
* it's very token heavy - LLM providers insist that you use many tokens to get the task done. It's in their best interest to get you to do this. For me, LLMs should be powerful enough to understand context with minimal tokens because of the investment into model training.
Both ways get the task done and it just comes down to my preference for now.
For me, I treat the LLM as model training + post processing + input tokens = output tokens. I don't think this is the best way to do non deterministic based software development. For me, we're still trying to shoehorn "old" deterministic programming into a non deterministic LLM.

by deevus

2 subcomments

This is what I do with the obra/superpowers[0] set of skills.
1. Use brainstorming to come up with the plan using the Socratic method
2. Write a high level design plan to file
3. I review the design plan
4. Write an implementation plan to file. We've already discussed this in detail, so usually it just needs skimming.
5. Use the worktree skill with subagent driven development skill
6. Agent does the work using subagents that for each task:
```
  a. Implements the task

  b. Spec reviews the completed task

  c. Code reviews the completed task
```
7. When all tasks complete: create a PR for me to review
8. Go back to the agent with any comments
9. If finished, delete the plan files and merge the PR
[0]: https://github.com/obra/superpowers

by cadamsdotcom

0 subcomment

The author is quite far on their journey but would benefit from writing simple scripts to enforce invariants in their codebase. Invariant broken? Script exits with a non-zero exit code and some output that tells the agent how to address the problem. Scripts are deterministic, run in milliseconds, and use zero tokens. Put them in husky or pre-commit, install the git hooks, and your agent won’t be able to commit without all your scripts succeeding.
And “Don’t change this function signature” should be enforced not by anticipating that your coding agent “might change this function signature so we better warn it not to” but rather via an end-to-end test that fails if the function signature is changed (because the other code that needs it not to change now has an error). That takes the author out of the loop: they no longer need to watch for the change in order to issue said correction, and can instead sip coffee while the agent observes that it caused a test failure, then corrects it without intervention, probably by rolling back the function signature change and changing something else.
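One way such an invariant script might look, as a sketch only: it uses Python's standard `ast` module to check that a function keeps its positional parameters, and returns an exit code plus a message telling the agent what to do. The file name `payments.py` and the function `charge` are invented for the example.

```python
import ast
import sys
from typing import List, Optional


def signature_of(source: str, func_name: str) -> Optional[List[str]]:
    """Return the positional parameter names of a top-level function, or None."""
    tree = ast.parse(source)
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) and node.name == func_name:
            return [arg.arg for arg in node.args.args]
    return None


def check_invariant(source: str, func_name: str, expected: List[str]) -> int:
    """Return a process exit code: 0 if the signature matches, 1 otherwise."""
    actual = signature_of(source, func_name)
    if actual == expected:
        return 0
    # The message is written for the agent: it says what broke and how to fix it.
    print(
        f"Invariant broken: {func_name} must take {expected}, found {actual}. "
        "Restore the original signature and change the caller instead.",
        file=sys.stderr,
    )
    return 1


# In a hook script (hypothetical file and function names), something like:
#     with open("payments.py", encoding="utf-8") as f:
#         sys.exit(check_invariant(f.read(), "charge", ["customer_id", "amount_cents"]))
```

Wired into a pre-commit or husky hook, the non-zero exit blocks the commit, which is the deterministic, zero-token enforcement the comment describes.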

by zahlman

2 subcomments

> After Claude writes the plan, I open it in my editor and add inline notes directly into the document. These notes correct assumptions, reject approaches, add constraints, or provide domain knowledge that Claude doesn’t have.
This is the part that seems most novel compared to what I've heard suggested before. And I have to admit I'm a bit skeptical. Would it not be better to modify what Claude has written directly, to make it correct, rather than adding the corrections as separate notes (and expecting future Claude to parse out which parts were past Claude and which parts were the operator, and handle the feedback graciously)?
At least, it seems like the intent is to do all of this in the same session, such that Claude has the context of the entire back-and-forth updating the plan. But that seems a bit unpleasant; I would think the file is there specifically to preserve context between sessions.

by zitrusfrucht

2 subcomments

I do something very similar, also with Claude and Codex, because the workflow is controlled by me, not by the tool. But instead of plan.md I use a ticket system basically like ticket_<number>_<slug>.md where I let the agent create the ticket from a chat, correct and annotate it afterwards and send it back, sometimes to a new agent instance. This workflow helps me keeping track of what has been done over time in the projects I work on. Also this approach does not need any „real“ ticket system tooling/mcp/skill/whatever since it works purely on text files.
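Since this scheme works purely on text files, the only "tooling" it needs is a naming helper. A small sketch, assuming tickets live in one directory and follow the `ticket_<number>_<slug>.md` pattern from the comment; `slugify` and `next_ticket_path` are hypothetical helper names.

```python
import re
from pathlib import Path


def slugify(title: str) -> str:
    """Lowercase the title and collapse non-alphanumeric runs into hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def next_ticket_path(directory: Path, title: str) -> Path:
    """Return the path for a new ticket, numbered one past the highest existing one."""
    numbers = []
    for path in directory.glob("ticket_*_*.md"):
        match = re.match(r"ticket_(\d+)_", path.name)
        if match:
            numbers.append(int(match.group(1)))
    number = max(numbers, default=0) + 1
    return directory / f"ticket_{number}_{slugify(title)}.md"
```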

by swe_dima

0 subcomment

Since everyone is showing their flow, here's mine:
* create a feature-name.md file in a gitignored folder
* start the file by giving the business context
* describe a high-level implementation and user flows
* describe database structure changes (I find it important not to leave it for interpretation)
* ask Claude to inspect the feature and review it for coherence; while answering its questions I ask it to augment the feature-name.md file with the answers
* enter Claude's plan mode and provide that feature-name.md file
* at this point it's detailed enough that rarely any corrections from me are needed

by wokwokwok

5 subcomments

This is the way.
The practice is:
- simple
- effective
- retains control and quality
Certainly the “unsupervised agent” workflows are getting a lot of attention right now, but they require a specific set of circumstances to be effective:
- clear validation loop (eg. Compile the kernel, here is gcc that does so correctly)
- ai enabled tooling (mcp / cli tool that will lint, test and provide feedback immediately)
- oversight to prevent agents going off the rails (open area of research)
- an unlimited token budget
That means that most people can't use unsupervised agents.
Not that they don't work; most people simply don't have an environment and task that is appropriate.
By comparison, anyone with cursor or claude can immediately start using this approach, or their own variant on it.
It does not require fancy tooling.
It does not require an arcane agent framework.
It works generally well across models.
This is one of those few genuine pieces of good practical advice for people getting into AI coding.
Simple. Obviously works once you start using it. No external dependencies. BYO tools to help with it, no “buy my AI startup xxx to help”. No “star my github so I can get a job at $AI corp too”.
Great stuff.

by srid

0 subcomment

Regarding inline notes, I use a specific format in the `/plan` command, by using the `ME:` prefix.
https://github.com/srid/AI/blob/master/commands/plan.md#2-pl...
It works very similarly to Antigravity's plan document comment-refine cycle.
https://antigravity.google/docs/implementation-plan
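To illustrate why a fixed prefix helps, here is a sketch that pulls operator notes out of a plan file so they can be told apart from model-written text. Only the `ME:` prefix comes from the comment; the `extract_notes` function is an assumption and is not taken from the linked command.

```python
from typing import List, Tuple


def extract_notes(plan_text: str, prefix: str = "ME:") -> List[Tuple[int, str]]:
    """Return (line_number, note) pairs for every line that starts with the prefix."""
    notes = []
    for lineno, line in enumerate(plan_text.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith(prefix):
            notes.append((lineno, stripped[len(prefix):].strip()))
    return notes
```

The same list could be saved alongside the plan as a record of which requirements came from the human reviewer.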

by charkubi

0 subcomment

Planning is important because you get the LLM to explain the problem and solution in its language and structure, not yours.
This shortcuts a range of problem cases where the LLM fights between the user's strict and potentially conflicting requirements, and its own learning.
In the early days we used to get the LLM to write the prompts for us to get around this problem; now we have planning built in.

by koevet

5 subcomments

Has anyone found an efficient way to avoid repeating the initial codebase assessment when working with large projects?
There are several projects on GitHub that attempt to tackle context and memory limitations, but I haven’t found one that consistently works well in practice.
My current workaround is to maintain a set of Markdown files, each covering a specific subsystem or area of the application. Depending on the task, I provide only the relevant documents to Claude Code to limit the context scope. It works reasonably well, but it still feels like a manual and fragile solution. I’m interested in more robust strategies for persistent project context or structured codebase understanding.

by adithyassekhar

3 subcomments

What I've read is that even with all the meticulous planning, the author still needed to intervene. Not at the end but in the middle; otherwise it will continue building out something wrong, and it's even harder to fix once it's done. It'll cost even more tokens. It's a net negative.
You might say a junior might do the same thing, but I'm not worried about it, at least the junior learned something while doing that. They could do it better next time. They know the code and change it from the middle where it broke. It's a net positive.

by snowhale

0 subcomment

the annotation cycle in plan.md is the part that actually makes this work imo. it's not just that you're planning, it's that you can inject domain constraints that the model can't infer from the codebase alone -- stuff like "don't use X pattern here because of Y deployment constraint" or "this service has a 500ms timeout that isn't in any config file". that knowledge transfer happens naturally in code review when a human writes the code, but LLMs skip it by default.

by recroad

1 subcomments

Try OpenSpec and it'll do all this for you. SpecKit works too. I don't think there's a need to reinvent the wheel on this one, as this is spec-driven development.

by dennisjoseph

1 subcomments

The annotation cycle is the key insight for me. Treating the plan as a living doc you iterate on before touching any code makes a huge difference in output quality.
Experimentally, I've been using mfbt.ai [https://mfbt.ai] for roughly the same thing in a team context. It lets you collaboratively nail down the spec with AI before handing off to a coding agent via MCP.
Avoids the "everyone has a slightly different plan.md on their machine" problem. Still early days but it's been a nice fit for this kind of workflow.

by etothet

0 subcomment

“The workflow I’m going to describe has one core principle: never let Claude write code until you’ve reviewed and approved a written plan.”
I’m not sure we need to be this black and white about things. Speaking from the perspective of leading a dev team, I regularly have Claude Code take a chance at code without reviewing a plan. For example, small issues that I’ve written clear details about, Claude can go to town on those. I’ve never been on a team that didn’t have too many of these types of issues to address.
And a team should have other guards in place that validate that code before it gets merged somewhere important.
I don’t have to review every single decision one of my teammates is going to make, even those less experienced teammates, but I do prepare teammates with the proper tools (specs, documentation, etc) so they can make a best effort first attempt. This is how I treat Claude Code in a lot of scenarios.

by w10-1

1 subcomments

I try these staging-document patterns, but suspect they have 2 fundamental flaws that stem mostly from our own biases.
First, Claude evolves. The original post's work pattern evolved over 9 months, before Claude's recent step changes. It's likely Claude's present plan mode is better than this workaround, but if you stick to the workaround, you'd never know.
Second, the staging docs that represent some context - whether library skills or current session design and implementation plans - are not the model Claude works with. At best they are shaping it, but I've found it does ignore and forget even what's written (even when I shout with emphasis), and the overall session influences the code. (Most often this happens when a peripheral adjustment ends up populating half the context.)
Indeed the biggest benefit from the OP might be to squeeze within 1 session, omitting peripheral features and investigations at the plan stage. So the mechanism of action might be the combination of getting our own plan clear and avoiding confusing excursions. (A test for that would be to redo the session with the final plan and implementation, to see if the iteration process itself is shaping the model.)
Our bias is to believe that we're getting better at managing this thing, and that we can control and direct it. It's uncomfortable to realize you can only really influence it - much like giving direction to a junior, but they can still go off track. And even if you found a pattern that works, it might work for reasons you're not understanding -- and thus fail you eventually. So, yes, try some patterns, but always hang on to the newbie senses of wonder and terror that make you curious, alert, and experimental.

by je42

0 subcomment

There are frameworks like https://github.com/bmad-code-org/BMAD-METHOD and https://github.github.com/spec-kit/ that are working on encoding a similar kind of approach and process.

by appsoftware

0 subcomment

This is the flow I've found myself working towards. Essentially maintaining more and more layered documentation for the LLM produces better and more consistent results. What is great here is the emphasis on the use of such documents in the planning phase. I'm feeling much more motivated to write solid documentation recently, because I know someone (the LLM) is actually going to read it! I've noticed my efforts and skill acquisition have moved sharply from app developer towards DevOps and architecture / management, but I think I'll always be grateful for the application engineering experience that I think the next wave of devs might miss out on.
I've also noted such a huge gulf between some developers describing 'prompting things into existence' and the approach described in this article. Both types seem to report success, though my experience is that the latter seems more realistic, and much more likely to produce robust code that's likely to be maintainable for long term or project critical goals.

by raptorraver

0 subcomment

I’ve been using this same pattern, except not the research phase. Definitely will try to add it to my process as well.
Sometimes when doing a big task I ask Claude to implement each phase separately and review the code after each step.

by juanre

0 subcomment

Shameless plug: https://beadhub.ai allows you to do exactly that, but with several agents in parallel. One of them is in the role of planner, which takes care of the source-of-truth document and the long term view. They all stay in sync with real-time chat and mail.
It's OSS.
Real-time work is happening at https://app.beadhub.ai/juanre/beadhub (beadhub is a public project at https://beadhub.ai so it is visible).
Particularly interesting (I think) is how the agents chat with each other, which you can see at https://app.beadhub.ai/juanre/beadhub/chat

by Frannky

1 subcomments

I tried Opus 4.6 recently and it’s really good. I had ditched Claude a long time ago for Grok + Gemini + OpenCode with Chinese models. I used Grok/Gemini for planning and core files, and OpenCode for setup, running, deploying, and editing.
However, Opus made me rethink my entire workflow. Now, I do it like this:
* PRD (Product Requirements Document)
* main.py + requirements.txt + readme.md (I ask for minimal, functional, modular code that fits the main.py)
* Ask for a step-by-step ordered plan
* Ask to focus on one step at a time
The super powerful thing is that I don’t get stuck on missing accounts, keys, etc. Everything is ordered and runs smoothly. I go rapidly from idea to working product, and it’s incredibly easy to iterate if I figure out new features are required while testing. I also have GLM via OpenCode, but I mainly use it for "dumb" tasks.
Interestingly, for reasoning capabilities regarding standard logic inside the code, I found Gemini 3 Flash to be very good and relatively cheap. I don't use Claude Code for the actual coding because forcing everything via chat into a main.py encourages minimal code that's easy to skim; it gives me a clearer representation of the feature space.

by Merad

0 subcomment

I've been working off and on on a vibe coded FP language and transpiler - mostly just to get more experience with Claude Code and see how it handles complex real world projects. I've settled on a very similar flow, though I use three documents: plan, context, task list. Multiple rounds of iteration when planning a feature. After completion, have a clean session do an audit to confirm that everything was implemented per the design. Then I have both Claude and CodeRabbit do code review passes before I finally do manual review. VERY heavy emphasis on tests, the project currently has 2x more test code than application code. So far it works surprisingly well. Example planning docs below -
https://github.com/mbcrawfo/vibefun/tree/main/.claude/archiv...

by vmware508

0 subcomment

How about following the test-driven approach properly? Asking Claude Code to write tests first and implement the solution after? Research -> Test Plan -> Write Tests -> Implementation Plan -> Write Implementation

by RHSeeger

1 subcomments

> Most developers type a prompt, sometimes use plan mode, fix the errors, repeat.
> ...
> never let Claude write code until you’ve reviewed and approved a written plan
I certainly always work towards an approved plan before I let it loose on changing the code. I just assumed most people did, honestly. Admittedly, sometimes there are "phases" to the implementation (because some parts can be figured out later and it's more important to get the key parts up and running first), but each phase gets a full, reviewed plan before I tell it to go.
In fact, I just finished writing a command and instruction to tell claude that, when it presents a plan for implementation, offer me another option; to write out the current (important parts of the) context and the full plan to individual (ticket specific) md files. That way, if something goes wrong with the implementation I can tell it to read those files and "start from where they left off" in the planning.

by cowlby

0 subcomment

I recently discovered GitHub speckit which separates planning/execution in stages: specify, plan, tasks, implement. Finding it aligns with the OP with the level of “focus” and “attention” this gets out of Claude Code.
Speckit is worth trying as it automates what is being described here, and with Opus 4.6 it's been a kind of BC/AD moment for me.

by paradite

0 subcomment

Lol I wrote about this and been using plan+execute workflow for 8 months.
Sadly my post didn't get much attention at the time.
https://thegroundtruth.media/p/my-claude-code-workflow-and-p...

by DevEx7

0 subcomment

I’m a big fan of having the model create a GitHub issue directly (using the GH CLI) with the exact plan it generates, instead of creating a markdown file that will eventually get deleted. It gives me a permanent record and makes it easy to reference and close the issue once the PR is ready.
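A minimal sketch of that step in Python, assuming the plan has already been saved to a file such as plan.md and that the GitHub CLI is installed and authenticated. The function names and the example title are made up for illustration; `gh issue create` with `--title` and `--body-file` is the only real interface used.

```python
import subprocess
from pathlib import Path


def build_issue_command(title: str, plan_path: Path) -> list[str]:
    """Build the `gh issue create` invocation that uses the plan file as the issue body."""
    return ["gh", "issue", "create", "--title", title, "--body-file", str(plan_path)]


def create_issue_from_plan(title: str, plan_path: Path) -> str:
    """Run the command and return whatever gh prints (normally the new issue's URL)."""
    if not plan_path.is_file():
        raise FileNotFoundError(f"plan file not found: {plan_path}")
    result = subprocess.run(
        build_issue_command(title, plan_path),
        check=True,           # raise if gh exits with a non-zero status
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()
```

Calling `create_issue_from_plan("Plan: add pagination", Path("plan.md"))` from the repository root would file the issue against that repository.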

by armanj

0 subcomment

> “remove this section entirely, we don’t need caching here” — rejecting a proposed approach
I wonder why you don't remove it yourself. Aren't you already editing the plan?

by amarant

0 subcomment

Interesting! I feel like I'm learning to code all over again! I've only been using Claude for a little more than a month and until now I've been figuring things out on my own. Building my methodology from scratch. This is much more advanced than what I'm doing. I've been going straight to implementation, but doing one very small and limited feature at a time, describing implementation details (data structures like this, use that API here, import this library etc) verifying it manually, and having Claude fix things I don't like. I had just started getting annoyed that it would make the same (or very similar) mistake over and over again and I would have to fix it every time. This seems like it'll solve that problem I had only just identified! Neat!

by throwaway7783

0 subcomment

I have to give this a try. My current model for backend is the same as how author does frontend iteration. My friend does the research-plan-edit-implement loop, and there is no real difference between the quality of what I do and what he does. But I do like this just for how it serves as documentation of the thought process across AI/human, and can be added to version control. Instead of humans reviewing PRs, perhaps humans can review the research/plan document.
On the PR review front, I give Claude the ticket number and the branch (or PR) and ask it to review for correctness, bugs and design consistency. The prompt is always roughly the same for every PR. It does a very good job there too.
Modelwise, Opus 4.6 is scary good!

by Centigonal

0 subcomment

The idea of having the model create a plan/spec, which you then mark up with comments before execution, is a cornerstone of how the new generation of AI IDEs like Google Antigravity operate.
Claude Code also has "Planning Mode" which will do this, but in my experience its "plan" sometimes includes the full source code of several files, which kind of defeats the purpose.

by mukundesh

0 subcomment

https://github.blog/ai-and-ml/generative-ai/spec-driven-deve...

by d1sxeyes

0 subcomment

The “inline comments on a plan” is one of the best features of Antigravity, and I’m surprised others haven’t started copycatting.

by turingsroot

0 subcomment

I've been running AI coding workshops for engineers transitioning from traditional development, and the research phase is consistently the part people skip — and the part that makes or breaks everything.
The failure mode the author describes (implementations that work in isolation but break the surrounding system) is exactly what I see in workshop after workshop. Engineers prompt the LLM with "add pagination to the list endpoint" and get working code that ignores the existing query builder patterns, duplicates filtering logic, or misses the caching layer entirely.
What I tell people: the research.md isn't busywork, it's your verification that the LLM actually understands the system it's about to modify. If you can't confirm the research is accurate, you have no business trusting the plan.
One thing I'd add to the author's workflow: I've found it helpful to have the LLM explicitly list what it does NOT know or is uncertain about after the research phase. This surfaces blind spots before they become bugs buried three abstraction layers deep.

by kulikalov

0 subcomment

I came to the exact same pattern, with one extra heuristic at the end: spin up a new claude instance after the implementation is complete and ask it to find discrepancies between the plan and the implementation.

by nerdright

1 subcomments

Haha this is surprisingly and exactly how I use claude as well. Quite fascinating that we independently discovered the same workflow.
I maintain two directories: "docs/proposals" (for the research md files) and "docs/plans" (for the planning md files). For complex research files, I typically break them down into multiple planning md files so claude can implement one at a time.
A small difference in my workflow is that I use subagents during implementation to avoid context from filling up quickly.

by georgecalm

0 subcomment

> This is the most expensive failure mode with AI-assisted coding, and it’s not wrong syntax or bad logic. It’s implementations that work in isolation but break the surrounding system.
This is spot on. Zooming out, a perfectly written implementation that follows all the conventions but misses the mark on its business goal is as, if not more, expensive. I think adding a brief.md artifact to the beginning of the flow (where you store the problem, desired change, primary metric, feature-kill criteria) can go a long way.

by __mharrison__

0 subcomment

This is very similar to the RECR (requirements, execute, check, repeat) framework I use and teach to my clients.
One critical step that I didn't see mentioned is testing. I drive my agents with TDD and it seems to make a huge difference.

by josefrichter

0 subcomment

Radically different? Sounds to me like the standard spec driven approach that plenty of people use.
I prefer iterative approach. LLMs give you incredible speed to try different approaches and inform your decisions. I don’t think you can ever have a perfect spec upfront, at least that’s my experience.

by efnx

0 subcomment

I’ve been using Claude through opencode, and I figured this was just how it does it. I figured everyone else did it this way as well. I guess not!

by richardjennings

0 subcomment

This is similar to what I do. I instruct an Architect mode with a set of rules related to phased implementation and detailed code artifacts output to a report.md file. After a couple of rounds of review and usually some responses that either tie together behaviors across context, critique poor choices or correct assumptions, there is a piece of work defined for a coder LLM to perform. With the new Opus 4.6 I then select specialist agents to review the report.md, prompted with detailed insight into particular areas of the software. The feedback from these specialist agent reviews is often very good and sometimes catches things I had missed. Once all of this is done, I let the agent make the changes and move onto doing something else. I typically rename and commit the report.md files which can be useful as an alternative to git diff / commit messages etc.

by aabajian

1 subcomments

I'm going to offer a counterpoint suggestion. You need to watch Claude try to implement small features many times without planning to see where it is likely to fail. It will often make the same mistakes over and over (e.g. trying to SSH without opening a bastion, mangling special characters in bash shell, trying to communicate with a server that self-shuts down after 10 minutes). Once you have a sense for all the repeated failure points of your workflow, then you can add them to future plan files.

by parasti

1 subcomments

The biggest roadblock to using agents to maximum effectiveness like this is the chat interface. It's convenience as detriment and convenience as distraction. I've found myself repeatedly giving into that convenience only to realize that I have wasted an hour and need to start over because the agent is just obliviously circling the solution that I thought was fully obvious from the context I gave it. Clearly these tools are exceptional at transforming inputs into outputs and, counterintuitively, not as exceptional when the inputs are constantly interleaved with the outputs like they are in chat mode.

by getnormality

1 subcomments

This looks like an important post. What makes it special is that it operationalizes Polya's classic problem-solving recipe for the age of AI-assisted coding.
1. Understand the problem (research.md)
2. Make a plan (plan.md)
3. Execute the plan
4. Look back

by foobarincaps

1 subcomments

I’ve begun using Gpt’y to iron out most of the planning phase to essentially bootstrap the conversation with Claude. I’m curious if others have done that.
Sometimes I find it quite difficult to form the right question. Using Gpt’y I can explore my question and often times end up asking a completely different question.
It also helps derisk hitting my usage limits with pro. I feel like I’m having richer conversations now w/ Claude but I also feel more confident in my prompts.

by jefecoon

0 subcomment

Here's my workflow, hopefully concise enough as a reply, in case helpful to those very few who'll actually see it:
Research -> Define 'Domains' -> BDD -> Domain Specs -> Overall Arch Specs / complete/consistent/gap analysis -> Spec Revision -> TDD Dev.
Smaller projects this is overkill. Larger projects, imho, gain considerable value from BDD and Overall Architecture Spec complete/consistent/gap analysis...
Cheers

by shevy-java

0 subcomment

I don't deny that AI has use cases, but boy - the workflow described is boring:
"Most developers type a prompt, sometimes use plan mode, fix the errors, repeat. "
Does anyone think this is as epic as, say, watch the Unix archives https://www.youtube.com/watch?v=tc4ROCJYbm0 where Brian demos how pipes work; or Dennis working on C and UNIX? Or even before those, the older machines?
I am not at all saying that AI tools are all useless, but there is no real epicness. It is just autogenerated AI slop and blob. I don't really call this engineering (although I also do agree, that it is engineering still; I just don't like using the same word here).
> never let Claude write code until you’ve reviewed and approved a written plan.
So the junior-dev analogy is quite apt here.
I tried to read the rest of the article, but I just got angrier. I never had that feeling watching oldschool legends, though perhaps some of their work may be boring, but this AI-generated code ... that's just some mythical random-guessing work. And none of that is "intelligent", even if it may appear to work, may work to some extent too. This is a simulation of intelligence. If it works very well, why would any software engineer still be required? Supervising would only be necessary if AI produces slop.

by irthomasthomas

1 subcomments

In my own tests I have found opus to be very good at writing plans, terrible at executing them. It typically ignores half of the constraints. https://x.com/xundecidability/status/2019794391338987906?s=2... https://x.com/xundecidability/status/2024210197959627048?s=2...

by Fuzzwah

0 subcomment

All sounds like a bespoke way of remaking https://github.com/Fission-AI/OpenSpec

by vemv

0 subcomment

Every "how I use Claude Code" post will get into the HN frontpage.
Which maybe has to do with people wanting to show how they use Claude Code in the comments!

by QuiEgo

0 subcomment

I find I spend most of my time defining interfaces and putting comments down now (“// this function does x”). Then I tell it “implement function foo, as described in the doc comment” or “implement all functions that are TODO”. It’s pretty good at filling in a skeleton you’ve laid out.
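For illustration, a skeleton of this kind in Python might look like the following. `Invoice` and both function names are invented for the example; the point is only that the signatures and doc comments are written by hand and the bodies are left as TODOs for the model to fill in.

```python
from dataclasses import dataclass


@dataclass
class Invoice:
    subtotal_cents: int
    tax_rate: float  # e.g. 0.2 for 20%


def total_cents(invoice: Invoice) -> int:
    """Return subtotal plus tax, rounded to the nearest cent."""
    raise NotImplementedError("TODO: implement as described in the docstring")


def format_total(invoice: Invoice) -> str:
    """Return the total as a string like '$12.34' (always two decimals)."""
    raise NotImplementedError("TODO: implement as described in the docstring")
```

Raising `NotImplementedError` rather than leaving an empty body means any unfilled function fails loudly the first time it is called.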

by prodtorok

0 subcomment

Insights are nice for new users but I’m not seeing anything too different from how anyone experienced with Claude Code would use plan mode. You can reject plans with feedback directly in the CLI.

by jeleh

0 subcomment

Good article, but I would rephrase the core principle slightly:
Never let Claude write code until you’ve reviewed, *fully understood* and approved a written plan.
In my experience, the beginning of chaos is the point at which you trust that Claude has understood everything correctly and claims to present the very best solution. At that point, you leave the driver's seat.

by achenatx

0 subcomment

I use amazon kiro.
The AI first works with you to write requirements, then it produces a design, then a task list.
This helps the AI to make smaller chunks to work on; it will work on one task at a time.
I can let it run for an hour or more in this mode. Then there is lots of stuff to fix, but it is mostly correct.
Kiro also supports steering files, they are files that try to lock the AI in for common design decisions.
The price is that a lot of the context is used up with these files, and Kiro constantly pauses to reset the context.

by lastdong

0 subcomment

Google Anti-Gravity has this process built in. This is essentially a cycle a developer would follow: plan/analyse - document/discuss - break down tasks/implement. We’ve been using requirements and design documents as best practice since leaving our teenage bedroom lab for the professional world. I suppose this could be seen as our coding agents coming of age.

by pgt

1 subcomments

My process is similar, but I recently added a new "critique the plan" feedback loop that is yielding good results. Steps:
1. Spec
2. Plan
3. Read the plan & tell it to fix its bad ideas.
4. (NB) Critique the plan (loop) & write a detailed report
5. Update the plan
6. Review and check the plan
7. Implement plan
Detailed here:
https://x.com/PetrusTheron/status/2016887552163119225

by rotbart

0 subcomment

This is a similar workflow to speckit, kiro, gsd, etc.

by imron

1 subcomments

I have tried using this and other workflows for a long time and had never been able to get them to work (see chat history for details).
This has changed in the last week, for 3 reasons:
1. Claude opus. It’s the first model where I haven’t had to spend more time correcting things than it would’ve taken me to just do it myself. The problem is that opus chews through tokens, which led to..
2. I upgraded my Claude plan. Previously on the regular plan I’d get about 20 mins of time before running out of tokens for the session and then needing to wait a few hours to use again. It was fine for little scripts or toy apps but not feasible for the regular dev work I do. So I upgraded to 5x. This now got me 1-2 hours per session before tokens expired. Which was better but still a frustration. Wincing at the price, I upgraded again to the 20x plan and this was the next game changer. I had plenty of spare tokens per session and at that price it felt like they were being wasted - so I ramped up my usage. Following a similar process as OP but with a plans directory with subdirectories for backlog, active and complete plans, and skills with strict rules for planning, implementing and completing plans, I now have 5-6 projects on the go. While I’m planning a feature on one the others are implementing. The strict plans and controls keep them on track and I have follow up skills for auditing quality and performance. I still haven’t hit token limits for a session but I’ve almost hit my token limit for the week so I feel like I’m getting my money’s worth. In that sense spending more has forced me to figure out how to use more.
3. The final piece of the puzzle is using opencode over claude code. I’m not sure why but I just don’t gel with Claude code. Maybe it’s all the sautéing and flibertygibbering, maybe it’s all the permission asking, maybe it’s that it doesn’t show what it’s doing as much as opencode. Whatever it is it just doesn’t work well for me. Opencode on the other hand is great. It shows what it’s doing and how it’s thinking, which makes it easy for me to spot when it’s going off track and correct early.
Having a detailed plan, and correcting and iterating on the plan is essential. Making Claude follow the plan is also essential - but there’s a line. Too fine grained and it’s not as creative at solving problems. Too loose/high level and it makes bad choices and goes in the wrong direction.
Is it actually making me more productive? I think it is but I’m only a week in. I’ve decided to give myself a month to see how it all works out.
I don’t intend to keep paying for the 20x plan unless I can see a path to using it to earn me at least as much back.
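A rough sketch of the plans directory from point 2, in Python. The three stage names come from the description above; the `plans` root, the helper names, and the idea of moving a file forward one stage at a time are assumptions made for the example, not a description of the actual skills.

```python
import shutil
from pathlib import Path

# Stage subdirectories, in the order a plan moves through them.
STAGES = ("backlog", "active", "complete")


def init_plans(root: Path) -> None:
    """Create the stage subdirectories under the plans root if they are missing."""
    for stage in STAGES:
        (root / stage).mkdir(parents=True, exist_ok=True)


def advance(root: Path, plan_name: str) -> Path:
    """Move a plan file to the next stage and return its new path."""
    for current, nxt in zip(STAGES, STAGES[1:]):
        src = root / current / plan_name
        if src.is_file():
            dst = root / nxt / plan_name
            shutil.move(str(src), str(dst))
            return dst
    raise FileNotFoundError(f"{plan_name} is not in backlog or active")
```

With this layout, which plans are in flight is visible from the directory listing alone, without opening any of the files.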

by mkl

0 subcomment

How are the annotations put into the markdown? Claude needs to be able to identify them as annotations and not parts of the plan.

by Ozzie_osman

0 subcomment

There are a few prompt frameworks that essentially codify these types of workflows by adding skills and prompts
https://github.com/obra/superpowers https://github.com/jlevy/tbd

by clbrmbr

2 subcomments

I just use Jesse’s “superpowers” plugin. It does all of this but also steps you through the design and gives you bite sized chunks and you make architecture decisions along the way. Far better than making big changes to an already established plan.

by dnautics

0 subcomment

this is literally reinventing claude's planning mode, but with more steps. I think Boris doesn't realize that planning mode is actually stored in a file.
https://x.com/boristane/status/2021628652136673282

by xbmcuser

0 subcomment

Gemini is better at research, Claude at coding. I try to use Gemini to do all the research and write out instructions on what to do and what process to follow, then use them in Claude. Though I am mostly creating small Python scripts.

by nesk_

0 subcomment

> I am not seeing the performance degradation everyone talks about after 50% context window.
I pretty much agree with that. I use long sessions and stopped trying to optimize the context size, the compaction happens but the plan keeps the details and it works for me.

by zuInnp

0 subcomment

Since the rise of AI systems I really wonder how people wrote code before. This is exactly how I planned out implementation and executed the plan. Might have been some paper notes, a ticket or a white board, buuuuut ... I don't know.

by gabrieledarrigo

1 subcomments

Does anyone still write code? I use agents to iterate on one task in parallel, with an approach similar to this one: https://mitchellh.com/writing/my-ai-adoption-journey#today
But I'm starting to have an identity crisis: am I doing it wrong, and should I use an agent to write any line of code of the product I'm working on?
Have I become a dinosaur in the blink of an eye?
Should I just let it go and accept that the job I was used to not only changed (which is fine), but now requires just driving the output of a machine, with no creative process at all?

by chickensong

1 subcomments

I agree with most of this, though I'm not sure it's radically different. I think most people who've been using CC in earnest for a while probably have a similar workflow? Prior to Claude 4 it was pretty much mandatory to define requirements and track implementation manually to manage context. It's still good, but since 4.5 release, it feels less important. CC basically works like this by default now, so unless you value the spec docs (still a good reference for Claude, but need to be maintained), you don't have to think too hard about it anymore.
The important thing is to have a conversation with Claude during the planning phase and don't just say "add this feature" and take what you get. Have a back and forth, ask questions about common patterns, best practices, performance implications, security requirements, project alignment, etc. This is a learning opportunity for you and Claude. When you think you're done, request a final review to analyze for gaps or areas of improvement. Claude will always find something, but starts to get into the weeds after a couple passes.
If you're greenfield and you have preferences about structure and style, you need to be explicit about that. Once the scaffolding is there, modern Claude will typically follow whatever examples it finds in the existing code base.
I'm not sure I agree with the "implement it all without stopping" approach and letting auto-compact do its thing. I still see Claude get lazy when nearing compaction, though it has gotten drastically better over the last year. Even so, I still think it's better to work in a tight loop on each stage of the implementation and preemptively compact or restart for the highest quality.
Not sure that the language is that important anymore either. Claude will explore existing codebase on its own at unknown resolution, but if you say "read the file" it works pretty well these days.
My suggestions to enhance this workflow:
- If you use a numbered phase/stage/task approach with checkboxes, it makes it easy to stop/resume as-needed, and discuss particular sections. Each phase should be working/testable software.
- Define a clear numbered list workflow in CLAUDE.md that loops on each task (run checks, fix issues, provide summary, etc).
- Use hooks to ensure the loop is followed.
- Update spec docs at the end of the cycle if you're keeping them. It's not uncommon for there to be some divergence during implementation and testing.
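To make the checkbox suggestion concrete, here is a small Python sketch that finds where to resume in a plan file. The plan layout (markdown task list items with numbered tasks under phase headings) and the function name are assumptions for the example.

```python
import re
from typing import Optional

# Matches markdown task list items such as "- [ ] 1.2 Backfill rows" or "- [x] ...".
TASK = re.compile(r"^\s*[-*] \[( |x|X)\] (.+)$")


def next_open_task(plan_text: str) -> Optional[str]:
    """Return the first unchecked task in the plan, or None if all are done."""
    for line in plan_text.splitlines():
        match = TASK.match(line)
        if match and match.group(1) == " ":
            return match.group(2).strip()
    return None


PLAN = """\
## Phase 1: schema
- [x] 1.1 Add migrations
- [ ] 1.2 Backfill existing rows
## Phase 2: API
- [ ] 2.1 Expose endpoint
"""

print(next_open_task(PLAN))  # prints "1.2 Backfill existing rows"
```

Because the state lives in the file, a fresh session can pick up at the right task without any chat history.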

by neuronexmachina

0 subcomment

My flow is pretty similar, except I also add in these steps at the end of planning:
* Review the plan for potential issues
* Add context to the plan that would be helpful for an implementing agent

by cheekyant

0 subcomment

It seems like the annotation of plan files is the key step.
Claude Code now creates persistent markdown plan files in ~/.claude/plans/ and you can open them with Ctrl-G to annotate them in your default editor.
So plan mode is not ephemeral any more.

by rossant

0 subcomment

Funny how I came up with something loosely similar. Asking Codex to write a detailed plan in a markdown document, reviewing it, and asking it to implement it step by step. It works exquisitely well when it can build and test itself.

by islandfox100

2 subcomments

It strikes me that if this technology were as useful and all-encompassing as it's marketed to be, we wouldn't need four articles like this every week

by gehsty

0 subcomment

Doesn’t Claude code do this by switching between edit mode and plan mode?
FWIW I have had significant improvements by clearing context then implementing the plan. Seems like it stops Claude getting hung up on something.

by smcleod

0 subcomment

I don't really get what is different about this from how almost everyone else uses Claude Code? This is an incredibly common, if not the most common way of using it (and many other tools).

by _hugerobots_

0 subcomment

Hub and spoke documentation in planning has been absolutely essential for the way my planning was before, and it's pretty cool seeing it work so well for planning mode to build scaffolds and routing.

by skybrian

0 subcomment

I do something broadly similar. I ask for a design doc that contains an embedded todo list, broken down into phases. Looping on the design doc asking for suggestions seems to help. I'm up to about 40 design docs so far on my current project.

by strix_varius

0 subcomment

The baffling part of the article is all the assertions about how this is unique, novel, not the typical way people are doing this etc.
There are whole products wrapped around this common workflow already (like Augment Intent).

by jamesmcq

10 subcomments

This all looks fine for someone who can't code, but for anyone with even a moderate amount of experience as a developer all this planning and checking and prompting and orchestrating is far more work than just writing the code yourself.
There's no winner for "least amount of code written regardless of productivity outcomes.", except for maybe Anthropic's bank account.

by connectsnk

0 subcomment

Is it required to tell Claude to re-read the code folder again when you come back some day later or should we ask Claude to just pickup from research.md file thus saving some tokens?

by gregman1

0 subcomment

It is really fun to watch how a baby makes its first steps and also how experienced professionals rediscover what standards were telling us for 80+ years.

by notjes

0 subcomment

Holy moly, I just applied the principles to DND campaign creation and I am in awe.

by jrs235

0 subcomment

Claude appeared to just crash in my session: https://news.ycombinator.com/item?id=47107630

by __bjoernd

0 subcomment

Sounds a bit like what Claude Plan Mode or Amazon's Kiro were built for. I agree it's a useful flow, but you can also overdo it.

by podgorniy

0 subcomment

I do the same. I also cross-ask gemini and claude about the plan during iterations, sometimes make several separate plans.

by folex

0 subcomment

this is exactly how I work with cursor
except that I put notes to plan document in a single message like:
```
   > plan quote
   my note
   > plan quote
   my note
```
otherwise, I'm not sure how to guarantee that ai won't confuse my notes with its own plan.
one new thing for me is to review the todo list, I was always relying on auto generated todo list
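A tiny Python sketch of building that kind of message, assuming the notes are collected as (plan quote, note) pairs. The function name and the sample text are made up for illustration.

```python
def format_notes(pairs: list[tuple[str, str]]) -> str:
    """Render (plan quote, note) pairs as quoted plan text followed by the reviewer's note."""
    blocks = []
    for quote, note in pairs:
        # Prefix every line of the quoted plan text so it cannot be mistaken for a note.
        quoted = "\n".join(f"> {line}" for line in quote.splitlines())
        blocks.append(f"{quoted}\n{note}")
    return "\n\n".join(blocks)


message = format_notes([
    ("Add a Redis cache in front of the list endpoint", "drop this, no caching needed here"),
    ("Step 3: migrate the table\nStep 4: backfill", "swap the order of these two steps"),
])
```

The blank line between blocks keeps each note attached to the quote directly above it.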

by TrailingArbutus

0 subcomment

LLM hallucinations at the macro level aren't about planning vs. not planning, as sparin9 pointed out. It's more of an architectural problem, which would be fun to fix using an overseeing system?

by stuaxo

0 subcomment

I had to stop reading about half way, it's written in that breathless linkedin/ai generated style.

by dr_kretyn

0 subcomment

The post and comments all read like: Here are my rituals to the software God. If you follow them then God gives plenty. Omit one step and the God mad. Sometimes you have to make a sacrifice but that's better for the long term.
I've been in eng for decades but never participated in forums. Is the cargo cult new?
I use Claude Code a lot. Still don't trust that what's in the plan will actually get written, regardless of details. My ritual is around stronger guardrails outside of prompting. This is the new MongoDB webscale meme.

by w4yai

0 subcomment

You described how AntiGravity works natively.

by RVuRnvbM2e

1 subcomments

This is just Waterfall for LLMs. What happens when you explore the problem space and need to change up the plan?

by zhubert

0 subcomment

AI only improves and changes. Embrace the scientific method and make sure your “here’s how to” are based in data.

by willsmith72

0 subcomment

this sounds... really slow. for large changes for sure i'm investing time into planning. but such a rigid system can't possibly be as good as a flexible approach with variable amounts of planning based on complexity

by growt

0 subcomment

That is just spec driven development without a spec, starting with the plan step instead.

by dworks

0 subcomment

my rlm-workflow skill has this encoded as a repeatable workflow.
give it a try: https://skills.sh/doubleuuser/rlm-workflow/rlm-workflow

by grabshot_dev

0 subcomment

Why don't you make Claude give feedback and iterate by itself?

by beratbozkurt0

0 subcomment

That's great, actually, doesn't the logic apply to other services as well?

by riknos314

0 subcomment

Sounds similar to Kiro's specs.

by des429

0 subcomment

The author discovered plan mode in cursor.

by h14h

0 subcomment

Is this not just Ralph with extra steps and the risk of context rot?

by bandrami

0 subcomment

How much time are you actually saving at this point?

by bodeadly

0 subcomment

Tip: LLMs are very good at following conventions (this is actually what is happening when it writes code). If you create a .md file with a list of entries of the following structure:
# <identifier>
<description block>
<blank space>
# <identifier>
...
where an <identifier> is a stable and concise sequence of tokens that identifies some "thing", and seed it with 5 entries describing abstract stuff, the LLM will latch on and reference this. I call this a PCL (Project Concept List). I just tell it:
> consume tmp/pcl-init.md pcl.md
The pcl-init.md describes what PCL is and pcl.md is the actual list. I have a pcl.md file for each independent component in the code (logging, http, auth, etc). This works very very well. The LLM seems to "know" what you're talking about. You can ask questions and give instructions like "add a PCL entry about this". It will ask if it should add a PCL entry about xyz. If the description block tends to have a high information-to-token ratio, it will follow that convention (which is a very good convention BTW).
However, there is a caveat. LLMs resist ambiguity about authority. So the "PCL" or whatever you want to call it, needs to be the ONE authoritative place for everything. If you have the same stuff in 3 different files, it won't work nearly as well.
Bonus Tip: I find long prompt input with example code fragments and thoughtful descriptions work best at getting an LLM to produce good output. But there will always be holes (resource leaks, vulnerabilities, concurrency flaws, etc). So then I update my original prompt input (keep it in a separate file PROMPT.txt as a scratch pad) to add context about those things maybe asking questions along the way to figure out how to fix the holes. Then I /rewind back to the prompt and re-enter the updated prompt. This feedback loop advances the conversation without expending tokens.
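As a rough illustration of the PCL layout, here is a Python sketch that parses such a file and rejects duplicate identifiers, in the spirit of keeping one authoritative entry per concept. The parser, its name, and the two sample entries are assumptions made for the example, not part of the tip itself.

```python
def parse_pcl(text: str) -> dict[str, str]:
    """Map each '# <identifier>' heading to its description block."""
    entries: dict[str, list[str]] = {}
    current = None
    for line in text.splitlines():
        if line.startswith("# "):
            current = line[2:].strip()
            if current in entries:
                # Two entries for one identifier would make authority ambiguous.
                raise ValueError(f"duplicate PCL identifier: {current}")
            entries[current] = []
        elif current is not None:
            entries[current].append(line)
    return {name: "\n".join(body).strip() for name, body in entries.items()}


SAMPLE = """\
# request-id
Every log line carries the request id.

# retry-budget
At most 3 retries per call.
"""

concepts = parse_pcl(SAMPLE)
```

A check like this could run before each session so that a copied or drifted entry is caught before the model ever sees it.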

by baalimago

0 subcomment

Another approach is to spec functionality using comments and interfaces, then tell the LLM to first implement tests and finally make the tests pass. This way you also get regression safety and can inspect that it works as it should via the tests.

by MagicMoonlight

0 subcomment

So we’re back to waterfall huh

by mcv

0 subcomment

This is great. My workflow is also heading in that direction, so this is a great roadmap. I've already learned that just naively telling Claude what to do and letting it work, is a recipe for disaster and wasted time.
I'm not this structured yet, but I often start with having it analyse and explain a piece of code, so I can correct it before we move on. I also often switch to an LLM that's separate from my IDE because it tends to get confused by sprawling context.

by recroad

0 subcomment

Use OpenSpec and simplify everything.

by vazma

0 subcomment

Sorry but I didn't get the hype with this post, isn't this what most people are doing? I want to see more posts on how you use Claude "smart" without feeding it the whole codebase and polluting the context window, and also more best practices on cost-efficient ways to use it. This workflow is clearly burning a million tokens per session; for me it's a no.

by dabedee

3 subcomments

I appreciate the author taking the time to share his workflow even though I really dislike the way this article is written. My dislike stems from sentences like this one: "I’ve been using Claude Code as my primary development tool for approx 9 months, and the workflow I’ve settled into is radically different from what most people do with AI coding tools." There is nothing radically different in the way he's using it (quite the opposite) and there are so many people who wrote about their workflows (which are almost exactly the same, here's just one example [1]). Apart from that, the obvious use of AI to write or edit the article makes it even more indigestible: "That’s it. No magic prompts, no elaborate system instructions, no clever hacks. Just a disciplined pipeline that separates thinking from typing."
[1] https://github.com/snarktank/ai-dev-tasks

by pajamasam

0 subcomment

I feel like if I have to do all this, I might as well write the code myself.

by gck1

0 subcomment

Very similar to what I'm doing, but I take a more unsupervised approach, then introduce as much determinism as possible in the right layers, while cutting out as much magic as possible.
One thing these sorts of loops gloss over, though, is that _plans change_ during implementation, and I've never seen anyone talk about how they address this.
With a pre-generated, detailed implementation plan for an entire feature or subsystem, the model will try to stick to the plan even when reality has changed during implementation. So I would advise against a detailed plan. A plan and the tasks within it are only valid for a single task of a feature. When the codebase, i.e. reality, changes, the plan has to be regenerated.
The only thing that is immutable and untouchable during implementation is a high-level spec.md file that lists goals and non-goals. I spend 1-2 hours speccing a subsystem together in an interactive session with Opus, have it write a spec.md file, then run a simple ralph loop. So it ends up:
1. Spec.md - interactive, extremely human-in-the-loop, high level objectives with acceptance criteria
Loop starts:
2. Plan - Opus is instructed to read all specs using a subagent to get a general understanding, then read the spec that we're currently working on. It will also read the codebase to see what the current state is, what the last changes in git are, and what log.md for the spec contains. Then it has to read the previous plan.md file and REWRITE it entirely, putting the most important next task at the top. It has to log what it has done in specname_log.md (just a few lines max, no more)
3. Build: this is basically instructed to pick the most important thing from the plan and build it. All tests, lints etc must succeed before it can commit. It also logs to specname_log.md
This can loop until both the plan and build iterations say that there's nothing left to do. When that happens (usually after hours or days), a reviewer agent steps in to do a more thorough review that everything listed in the spec is done.
I also maintain a directives.md file in the spec dir. I review the TUI every 30 minutes or so to check if it's getting stuck somewhere or taking a path I don't like. I then put a single line in directives.md: "- X is the wrong approach and must be dropped"
All agents in their prompt have a line that says: "read directives.md - it contains human overrides that must be followed and they override everything".
This works extremely well. But the biggest downside is that you'll hit weekly Max 20x limits in 3 days.
I also advise ignoring any tooling that hides these things from you. All of this is a 300-line bash script and a few template prompt.md files that I built in a few minutes with Opus, based on what I observed were the pain points. You need to be able to tweak your system in a few minutes, down to the minute details. Using things like GSD, gastown, spec-kit, openclaw, etc. locks you into the paradigms of people who also don't know what we're all doing and which approach will win. In a few years, it's possible something will emerge that we all universally adopt, but right now nobody has any idea what works.
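
A minimal Python sketch of the plan/build loop described above, not the commenter's actual bash script. The `run_agent` callable, the `NOTHING_TO_DO` marker, and the prompt layout are all assumptions made for illustration; a real driver would shell out to a coding agent CLI and use its own prompt templates.

```python
from pathlib import Path
from typing import Callable

# Hypothetical agent runner: takes a prompt and returns the agent's final message.
AgentRunner = Callable[[str], str]

# Assumed convention for "there is nothing left to do"; not from the comment.
DONE_MARKER = "NOTHING_TO_DO"

# The override line quoted in the comment, included in every agent prompt.
OVERRIDE_LINE = (
    "read directives.md - it contains human overrides that must be "
    "followed and they override everything"
)


def build_prompt(role: str, spec_dir: Path, spec_name: str) -> str:
    """Assemble a prompt; directives.md is re-read on every call."""
    directives_file = spec_dir / "directives.md"
    directives = directives_file.read_text() if directives_file.exists() else ""
    return "\n".join([
        f"ROLE: {role}",
        f"SPEC: {spec_dir / 'spec.md'} (immutable)",
        f"PLAN: {spec_dir / 'plan.md'}",
        f"LOG: {spec_dir / (spec_name + '_log.md')}",
        OVERRIDE_LINE,
        "--- directives.md ---",
        directives,
    ])


def run_loop(spec_dir: Path, spec_name: str, run_agent: AgentRunner,
             max_iterations: int = 50) -> int:
    """Alternate plan and build steps until both report nothing left to do.

    Returns the number of plan/build iterations that ran.
    """
    for iteration in range(1, max_iterations + 1):
        plan_out = run_agent(build_prompt("plan", spec_dir, spec_name))
        build_out = run_agent(build_prompt("build", spec_dir, spec_name))
        if DONE_MARKER in plan_out and DONE_MARKER in build_out:
            # Both agents are idle: hand over to the reviewer agent once.
            run_agent(build_prompt("review", spec_dir, spec_name))
            return iteration
    return max_iterations
```

Because directives.md is read fresh for each prompt, a line added by the human mid-run is picked up on the next plan or build step without restarting the loop.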

by renewiltord

0 subcomment

The plan document and todo are an artifact of context size limits. I use them too because it allows using /reset and then continuing.

by yunusabd

0 subcomment

That's exactly what Cursor's "plan" mode does? It even creates md files, which seems to be the main "thing" the author discovered. Along with some cargo cult science?
How is this noteworthy other than to spark a discussion on HN? I mean I get it, but a little more substance would be nice.

by cawksuwcka

0 subcomment

falling asleep here. when will the babysitting end

by kissgyorgy

0 subcomment

There is not a lot of explanation of WHY this is better than doing the opposite (start coding and see how it goes), or of how this would apply to Codex models.
I do exactly the same; I even developed my own workflows with Pi agent, which work really well. Here are the reasons:
- Claude needs a lot more steering than other models; it's too eager to do stuff, and without feedback it does stupid things and writes terrible code.
- Claude is very good at following the plan; you can even use a much cheaper model if you have a good plan. For example, I list every single file which needs edits with a short explanation.
- At the end of planning, I have a clear picture in my head of exactly what the feature will look like, and I can be pretty sure the end result will be good enough (given that the model is good at following the plan).
A lot of things don't need planning at all. Simple fixes, refactoring, simple scripts, packaging, etc. Just keep it simple.
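
As a rough illustration of a plan that lists every file needing edits with a short explanation, here is one possible shape in Python. The `Plan` and `FileEdit` names, the markdown layout, and the example paths are hypothetical, not the commenter's actual format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FileEdit:
    path: str    # file the model is expected to touch
    reason: str  # one-line explanation of the change


@dataclass
class Plan:
    goal: str
    edits: List[FileEdit] = field(default_factory=list)

    def to_markdown(self) -> str:
        """Render the plan as a short markdown document for the agent."""
        lines = [f"# Plan: {self.goal}", "", "## Files to edit"]
        lines += [f"- `{e.path}`: {e.reason}" for e in self.edits]
        return "\n".join(lines)


# Hypothetical example plan.
plan = Plan(
    goal="add rate limiting",
    edits=[
        FileEdit("api/middleware.py", "add limiter hook"),
        FileEdit("tests/test_limits.py", "cover 429 path"),
    ],
)
print(plan.to_markdown())
```

An explicit file list like this also gives a cheap check after the run: any changed file that is not in the plan is worth a second look.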

by lxe

0 subcomment

Honestly, I found that the best way to use these CLIs is exactly how the CLI creators have intended.

by oulipo2

0 subcomment

Has Claude Code become slow, laggy, imprecise, giving wrong answers for other people here?

by drcongo

0 subcomment

This is exactly how I use it.

by submeta

0 subcomment

What works extremely well for me is this: let Claude Code create the plan, then turn the plan over to Codex for review, and give the response back to Claude Code. Codex is exceptionally good at doing high-level reviews and keeping an eye on the details. It will find very subtle errors and omissions. And CC is very good at quickly converting the plan into code.
This back and forth between the two agents, with me steering the conversation, takes Claude Code to the next level.

by chaboud

16 subcomments

The author seems to think they've hit upon something revolutionary...
They've actually hit upon something that several of us have evolved to naturally.
LLMs are like unreliable interns with boundless energy. They make silly mistakes, wander into annoying structural traps, and have to be unwound if left to their own devices. It's like the genie that almost pathologically misinterprets your wishes.
So, how do you solve that? Exactly how an experienced lead or software manager does: you have systems write it down before executing, explain things back to you, and ground all of their thinking in the code and documentation, avoiding making assumptions about code after superficial review.
When it was early ChatGPT, this meant function-level thinking and clearly described jobs. When it was Cline it meant cline rules files that forced writing architecture.md files and vibe-code.log histories, demanding grounding in research and code reading.
Maybe nine months ago, another engineer said two things to me, less than a day apart:
- "I don't understand why your clinerules file is so large. You have the LLM jumping through so many hoops and doing so much extra work. It's crazy."
- The next morning: "It's basically like a lottery. I can't get the LLM to generate what I want reliably. I just have to settle for whatever it comes up with and then try again."
These systems have to deal with minimal context, ambiguous guidance, and extreme isolation. Operate with a little empathy for the energetic interns, and they'll uncork levels of output worth fighting for. We're Software Managers now. For some of us, that's working out great.

by YetAnotherNick

0 subcomment

I don't know. I tried various methods, and this one fails quite often. The problem is that the plan naturally always skips some important details, or assumes some library function exists, yet it is taken as instruction in the next section. And Claude can't handle ambiguity if the instruction is very detailed (e.g. if the plan asks to use a certain library, even one that is a bad fit, Claude won't know that the decision is flexible). If the instruction is less detailed, I saw that Claude is willing to try multiple things, and if it keeps failing it isn't afraid to revert almost everything.
In my experience, the best scenario is for the instructions and plan to be human-written and detailed.

by tayo42

0 subcomment

We're just slowly reinventing agile for telling AI agents what to do lol
Just skip to the AI stand-ups

by politician

0 subcomment

Wow, I never bother with using phrases like “deeply study this codebase deeply.” I consistently get pretty fantastic results.

by fnord77

0 subcomment

I have a different approach where I have claude write coding prompts for stages then I give the prompt to another agent. I wonder if I should write it up as a blog post

by dr_dshiv

0 subcomment

Another pattern is:
1. First vibecode software to figure out what you want
2. Then throw it out and engineer it

by geoffbp

2 subcomments

It’s worrying to me that nobody really knows how LLMs work. We create prompts with or without certain words and hope it works. That’s my perspective anyway

by vibeprofessor

0 subcomment

Add another agent review: I ask Claude to send the plan to Codex for review and fix critical and high issues, with complexity gating (no overcomplicated logic), run in a loop. Then I send it to a Gemini reviewer, then maybe do a final pass with Claude. Once all C+H issues pass, the sequence is done.
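
The gating described here could be sketched roughly as below, assuming each reviewer returns issues tagged with a severity. The `Issue`, `Reviewer`, and `Fixer` types are hypothetical stand-ins for calls to real agents, and the complexity gate is left out for brevity.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Issue:
    severity: str  # "critical", "high", "medium", or "low"
    note: str


# Hypothetical callables standing in for agent invocations.
Reviewer = Callable[[str], List[Issue]]    # plan text -> issues found
Fixer = Callable[[str, List[Issue]], str]  # plan + blocking issues -> revised plan

BLOCKING = {"critical", "high"}


def review_until_clean(plan: str, reviewer: Reviewer, fixer: Fixer,
                       max_rounds: int = 5) -> str:
    """Loop one reviewer until it reports no critical or high issues."""
    for _ in range(max_rounds):
        blocking = [i for i in reviewer(plan) if i.severity in BLOCKING]
        if not blocking:
            return plan
        plan = fixer(plan, blocking)
    raise RuntimeError("blocking issues remain after max_rounds")


def run_review_chain(plan: str, reviewers: List[Reviewer], fixer: Fixer) -> str:
    """Pass the plan through each reviewer in order, gating on C+H issues."""
    for reviewer in reviewers:
        plan = review_until_clean(plan, reviewer, fixer)
    return plan
```

Medium and low issues are deliberately ignored by the gate, which keeps the loop from churning on nitpicks; the `max_rounds` cap is an added safeguard against two agents disagreeing forever.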

by ihsw

0 subcomment

Kiro's spec-based development looks identical.
https://kiro.dev/docs/specs/
It looks verbose, but it defines the requirements based on your input; when you approve them it defines a design, and (again) when you approve that it defines an implementation plan (a series of tasks).

by straydusk

0 subcomment

This got upvotes? Literally just restating basics.

by bluegatty

1 subcomments

I don't see how this is 'radically different' given that Claude Code literally has a planning mode.
This is my workflow as well, with the big caveat that 80% of 'work' doesn't require substantive planning; we're making relatively straightforward changes.
Edit: there is nothing fundamentally different about 'annotating offline' in an MD vs in the CLI and iterating until the plan is clear. It's a UI choice.
Spec Driven Coding with AI is very well established, so working from a plan, or spec (they can be somewhat different) is not novel.
This is conventional CC use.

by ramoz

1 subcomments

One thing that has helped me is the ability to iterate over plans, with a better visual of them as well as the ability to annotate feedback about the plan.
https://github.com/backnotprop/plannotator Plannotator does this really effectively and natively through hooks