FRESH

Hacker News

Home

Agentic Engineering Patterns

540 points by r4um

by lbreakjai

16 subcomments

We're going to do it again, aren't we? We're going to take something simple and sensible ("write tests first", "small composable modules", etc.), give it a fancy complicated name ("Behavior-Constrained Implementation Lifecycle pattern", "Boundary-Scoped Processing Constructs pattern", etc.), and create an entire industry of consultants and experts selling books and enterprise coaching around it, each swearing they have the secret sauce and the right incantations.
The damn thing _talks_. You can just _speak_ to it. You can just ask it to do what you want.

by slaye

14 subcomments

Simon, if you're reading this, I'd be really curious to hear your thoughts on how to effectively conduct code reviews in a world where "code is cheap".
One of the biggest struggles I have on my team is coworkers straight up vibing parts of the code and not understanding or guiding the architecture of subsystems. Or at least, not writing code in a way that is meant to be understood by others.
Then when I go through the code and provide extensive feedback (mostly architectural and highlighting odd inconsistencies with the code additions) I'm met with much pushback because "it works, why change it"? Not to mention the sheer size of prs ballooning in recent months.
The end result is me being the bottleneck because I can't keep up with the "pace" of code being generated, and feeling a lot of discomfort and pressure to lower my standards.
I've thought about using a code review agent to review and act as me in proxy, but not being able to control the exact output worries me. And I don't like the lack of human touch it provides. Maybe someone has advice on a humane way to handle this problem.

by mohsen1

5 subcomments

I've experimented with agentic coding/engineering a lot recently. My observation is that software that is easily tested are perfect for this sort of agentic loop.
In one of my experiments I had the simple goal of "making Linux binaries smaller to download using better compression" [1]. Compression is perfect for this. Easily validated (binary -> compress -> decompress -> binary) so each iteration should make a dent otherwise the attempt is thrown out.
Lessons I learned from my attempts:
- Do not micro-manage. AI is probably good at coming up with ideas and does not need your input too much
- Test harness is everything, if you don't have a way of validating the work, the loop will go stray
- Let the iterations experiment. Let AI explore ideas and break things in its experiment. The iteration might take longer but those experiments are valuable for the next iteration
- Keep some .md files as scratch pad in between sessions so each iteration in the loop can learn from previous experiments and attempts
[1] https://github.com/mohsen1/fesh

by benrutter

8 subcomments

I use AI in my workflow mostly for simple boilerplate, or to troubleshoot issues/docs.
I've dipped into agentic work now and again, but never been very impressed with the output (well, that there is any functioning output is insanely impressive, but it isn't code I want to be on the hook for complaining).
I hear a lot of people saying the same, but similarly a bunch of people I respect saying they barely write code anymore. It feels a little tricky to square these up sometimes.
Anyway, really looking forward to trying some if these patterns as the book develops to see if that makes a difference. Understanding how other peopke really use these tools is a big gap for me.

by storus

2 subcomments

These lessons get obliterated with every new LLM generation. Like how LangChain started on stupid models with small context, creating some crazy architecture around it to bypass their limitations that got completely obliterated when GPT-3.5 was released, yet people still use it and overcomplicate things. Rather look at where the puck is going, we might soon not need more than a single agent to do everything given context size keeps increasing, agent can use more tools and we might get some in-call context cleanup at some point as well that would allow an agent to spin forever instead of calling subagents due to context size limitations.

by jihadjihad

5 subcomments

I wish there was a little more color in the Testing and QA section. While I agree with this:
```
  > A comprehensive test suite is by far the most effective way to keep those features working.
```
there is no mention at all about LLMs' tendency to write tautological tests--tests that pass because they are defined to pass. Or, tests that are not at all relevant or useful, and are ultimately noise in the codebase wasting cycles on every CI run. Sometimes to pass the tests the model might even hardcode a value in a unit test itself!
IMO this section is a great place to show how we as humans can guide the LLM toward a rigorous test suite, rather than one that has a lot of "coverage" but doesn't actually provide sound guarantees about behavior.

by jkhdigital

0 subcomment

Today I gave a lecture to my undergraduate data structures students about the evolution of CPU and GPU architectures since the late 1970s. The main themes:
- Through the last two decades of the 20th century, Moore’s Law held and ensured that more transistors could be packed into next year’s chips that could run at faster and faster clock speeds. Software floated on a rising tide of hardware performance so writing fast code wasn’t always worth the effort.
- Power consumption doesn’t vary with transistor density but varies with the cube of clock frequency, so by the early 2000s Intel hit a wall and couldn’t push the clock above ~4GHz with normal heat dissipation methods. Multi-core processors were the only way to keep the performance increasing year after year.
- Up to this point the CPU could squeeze out performance increases by parallelizing sequential code through clever scheduling tricks (and compilers could provide an assist by unrolling loops) but with multiple cores software developers could no longer pretend that concurrent programming was only something that academics and HPC clusters cared about.
CS curricula are mostly still stuck in the early 2000s, or at least it feels that way. We teach big-O and use it to show that mergesort or quicksort will beat the pants off of bubble sort, but topics like Amdahl’s Law are buried in an upper-level elective when in fact it is much more directly relevant to the performance of real code, on real present-day workloads, than a typical big-O analysis.
In any case, I used all this as justification for teaching bitonic sort to 2nd and 3rd year undergrads.
My point here is that Simon’s assertion that “code is cheap” feels a lot like the kind of paradigm shift that comes from realizing that in a world with easily accessible massively parallel compute hardware, the things that matter for writing performant software have completely shifted: minimizing branching and data dependencies produces code that looks profoundly different than what most developers are used to. e.g. running 5 linear passes over a column might actually be faster than a single merged pass if those 5 passes touch different memory and the merged pass has to wait to shuffle all that data in and out of the cache because it doesn’t fit.
What all this means for the software development process I can’t say, but the payoff will be tremendous (10-100x, just like with properly parallelized code) for those who can see the new paradigm first and exploit it.

by maciusr

3 subcomments

There's a recurring theme in these agentic engineering threads that is worth calling out: the lessons, are almost always stated as universal – but are deeply dependent on team size, code base maturity, test coverage, and risk tolerance. What gets presented as a “win” for a well instrumented backend service could easily guide those working on UI-heavy or old code down the wrong path. The art of this might be less about discovering the correct pattern, and more about truthfully declaring when a pattern applies.

by sd9

1 subcomments

I've recently got into red/greed TDD with claude code, and I have to agree that it seems like the right way to go.
As my projects were growing in complexity and scope, I found myself worrying that we were building things that would subtly break other parts of the application. Because of the limited context windows, it was clear that after a certain size, Claude kind of stops understanding how the work you're doing interacts with the rest of the system. Tests help protect against that.
Red/green TDD specifically ensures that the current work is quite focused on the thing that you're actually trying to accomplish, in that you can observe a concrete change in behaviour as a result of the change, with the added benefit of growing the test suite over time.
It's also easier than ever to create comprehensive integration test suites - my most valuable tests are tests that test entire user facing workflows with only UI elements, using a real backend.

by nishantjani10

1 subcomments

I primarily use AI for understanding codebases myself. My prompt is:
"deeply understand this codebase, clearly noting async/sync nature, entry points and external integration. Once understood prepare for follow up questions from me in a rapid fire pattern, your goal is to keep responses concise and always cite code snippets to ensure responses are factual and not hallucinated. With every response ask me if this particular piece of knowledge should be persistent into codebase.md"
Both the concise and structure nature (code snippets) help me gain knowledge of the entire codebase - as I progressively ask complex questions on the codebase.

by ukuina

2 subcomments

I find StrongDM's Dark Factory principles more immediately actionable (sorry, Simon!): https://factory.strongdm.ai/principles

by andresquez

0 subcomment

I see a lot of people complaining that every day there are 100 new frameworks for “agent teams”, prompting styles, workflows, and everyone insists theirs is the best for one reason or another. It reminds me a lot of early software engineering: every team had its own way of doing things, we experimented with tons of methodologies (waterfall, agile, etc.), and over time a few patterns became widely adopted (scrum, PM roles, architects, tickets, rituals). It feels like we’re in that same messy exploration phase right now.
And actually, these tools actually work, , because 99% of people still don’t really know how to prompt agents well and end up doing things like “pls fix this, it’s not working”.
One thing that worked well for us was going back to how a human team would approach it: write a product spec first (expected behavior, constraints, acceptance criteria, etc), use AI to refine that spec, and only then hand it to an opinionated flow of agents that reflect a human team to implement.

by gaigalas

0 subcomment

The most important thing you need to understand with working with agents for coding is that now you design a production line. And that has nothing to do (mostly) with designing or orchestrating agents.
Take a guitar, for example. You don't industrialize the manufacture of guitars by speeding up the same practices that artisans used to build them. You don't create machines that resemble individual artisans in their previous roles (like everyone seems to be trying to do with AI and software). You become Leo Fender, and you design a new kind of guitar that is made to be manufactured at another level of scale magnitude. You need to be Leo Fender though (not a talented guitarrist, but definitely a technical master).
To me, it sounds too early to describe patterns, since we haven't met the Ford/Fender/etc equivalent of this yet. I do appreciate the attempt though.

by shreddd24

0 subcomment

Absolutely great work. I have been mostly just thinking about what you are already practicing. I think your site will become an invaluable source for software engineers who want to responsibly apply AI in their development flow.
For a high level description of what this new way of engineering is about: https://substack.com/@shreddd/p-189554031

by bluemario

2 subcomments

The "human in the loop at key checkpoints" pattern has been the most practically useful for us. We found that giving the agent full autonomy end-to-end produces subtly broken code that passes tests but violates implicit invariants you never thought to write down. Short loops with a human sanity check at decision forks catches that class of failure early.
The thing I keep wrestling with is where exactly to place those checkpoints. Too frequent and you've just built a slow pair programmer. Too infrequent and you're doing expensive archaeology to figure out where it went sideways. We've landed on "before any irreversible action" as a useful heuristic, but that requires the agent to have some model of what's irreversible, which is its own can of worms.
Has anyone found a principled way to communicate implicit codebase conventions to an agent beyond just dumping a CLAUDE.md or similar file? We've tried encoding constraints as linter rules but that only catches surface stuff, not architectural intent.

by tacone

0 subcomment

The patterns in the article might be a starter, but there's so much more to cover:
agents role (Orchestrator, QA etc.), agents communication, thinking patterns, iteration patterns, feature folders, time-aware changelog tracking, prompt enforcing, real time steering.
We might really need a public Wiki for that (C2 [1] style)
[1]: https://wiki.c2.com/

by lunias

0 subcomment

In most cases, the model is non-deterministic and you have no direct control over the input parameters. At best you might get access to some abstraction of a subset of those parameters. I don't know of a coding model that offers direct access to the seed. I like to hear about how people are using agents, but it also feels a lot like someone sitting at a slot machine telling you that if you put your shoes on the opposite feet then you win more often.

by tr888

0 subcomment

For web apps, explictly asking the agent to build in sensible checkpoints and validate at the checkpoint using Playwright has been very successful for me so far. It prevents the agent from strating off course and struggling to find its way back. That and always using plan mode first, and reviewing the plan for evidence of sensible checkpoints. /opusplan to save tokens!

by winwang

0 subcomment

Linear walkthrough: I ask my agents to give me a numbered tree. Controlling tree size specifies granularity. Numbering means it's simple to refer to points for discussion.
Other things that I feel are useful:
- Very strict typing/static analysis
- Denying tool usage with a hook telling the agent why+what they should do (instead of simple denial, or dangerously accepting everything)
- Using different models for code review

by AlexCalderAI

0 subcomment

Great patterns here. I'd add one more critical layer that many miss: orchestration state management.
Running multiple agents concurrently (QA, content, conversions, distribution), we hit this exact wall - agents didn't know what other agents had done, creating duplicate work and missed context.
Solved it with a stupidly simple approach: 1. Single TODO.md with "DO NOW" (unblocked), "BLOCKED", "DONE" sections 2. Named output files per agent type (qa-status.md, scout-finds.md, etc) 3. active-tasks.md for crash recovery - breadcrumbs from interrupted runs 4. Daily memory logs with session IDs for searchability
The key: File-based state is deterministic. After a crash, the next agent reads identical input, same decision rules, same output structure. Zero state collision, zero "what was I thinking?"
Deployment: ~8 agents on cron. They wake, read files, work, write results, die. No persistent terminal. No coordination overhead.
This turned "5 terminal tabs with unmanageable logs" into "grep yesterday's log, see exactly what happened."
Patterns + implementation details: https://osolobo.com/first-ai-agent-guide/

by ben30

1 subcomments

I contribute to an open source spec based project management tool. I spend about a day back and forth iterating on a spec, using ai to refine the spec itself. Sometimes feeding it in and out of Claude/gemini telling each other where the feedback has come from. The spec is the value. Using the ai pm tool I break it down into n tasks and sub tasks and dependencies. I then trigger Claude in teams mode to accomplish the project. It can be left alone over night. I wake up in the morning with n prs merged.

by AlexCalderAI

0 subcomment

Solid patterns here. One thing I'd add from running Claude Code in production:
The "give it bash" pattern sounds scary until you realize the alternative is 47 intermediate tool calls that fail silently.
Letting the agent write and run scripts means the agent debugs when something breaks. The feedback loop tightens dramatically.
The trick is sandboxing + cost limits. Not preventing shell access.

by jpadkins

0 subcomment

My simple Agent loop for hobby game dev (In antigravity, but this also works well in Claude Code). 1) I write the prompt for the next feature / tweak / fix I want the model to work on 2) if large, check implementation plan 3) play test prior changes 4) repeat. By the time my play test is done, the next batch of changes are ready for commit.
Has anyone setup a smooth agent setup for game art assets generation? (AI models already do great for shaders and VFX, but I would really love to automate model + texture + animation pipeline)

by yieldcrv

3 subcomments

I dont currently have confidence in TDD
A broken test doesn’t make the agentic coding tool go “ooooh I made a bad assumption” any more than a type error or linter does
All a broken test does it prompt me to prompt back “fix tests”
I have no clue which one broke or why or what was missed, and it doesnt matter. Actual regressions are different and not dependent on these tests, and I follow along from type errors and LLM observability

by ontouchstart

1 subcomments

Hoarding is becoming an epidemic mental disease in the society of abundance. I don’t know what the solution would be.
https://simonwillison.net/guides/agentic-engineering-pattern...

by ryanthedev

0 subcomment

Ahh, I tend to find software based engineering skills and workflows as the agentic engineering patterns.
I distilled multiple software books into these flows and skills. With more books to come.
Here is an example https://github.com/ryanthedev/code-foundations

by simonw

0 subcomment

I just started a new chapter partly inspired by this comment thread - anti-patterns: things NOT to do.
So far I only have one: Inflicting unreviewed code on collaborators, aka dumping a thousand line PR without even making sure it works first https://simonwillison.net/guides/agentic-engineering-pattern...

by SurvivorForge

0 subcomment

The code review bottleneck point resonates a lot. When agents can generate PRs in minutes, the human review step becomes the critical bottleneck — and it doesn't scale with generation speed. The teams I've seen handle this best treat agent output like a junior dev's work: smaller atomic commits, mandatory test coverage as a gate, and explicit reviewer checklists focused on logic rather than syntax. The shift is from "does this look right" to "does this behave correctly under these conditions."

by Thews

1 subcomments

There was a mention of using agents to build projects into WASM. I've had the best luck telling it to use zig to compile to webassembly. It shortens the time to completion by a significant amount.

by wokwokwok

1 subcomments

I really like the idea of agent coding patterns. This feels like it could be expanded easily with more content though. Off the top of my head:
- tell the agent to write a plan, review the plan, tell the agent to implement the plan
- allow the agent to “self discover” the test harness (eg. “Validate this c compiler against gcc”)
- queue a bunch of tasks with // todo … and yolo “fix all the todo tasks”
- validate against a known output (“translate this to rust and ensure it emits the same byte or byte output as you go”)
- pick a suitable language for the task (“go is best for this task because I tried several languages and it did the best for this domain in go”)

by jcmontx

0 subcomment

It baffles me how skeptical people here are of AI-assisted programming. If you don't see productivity gains I feel you're in deep denial.
It's true that in my company we're not building rockets or defense systems, maybe you guys are and in those scenarios it's less useful. But for typical LoB and/or consumer-facing software, AI is crushing it. Where I used to need 3 devs, now I just need one (and the support team around it: PM, BA, QA, Designer). For my business, AI has been a game changer.

by shubhamintech

0 subcomment

The test harness point is spot on but there's a gap worth naming: the failure modes you write evals for aren't the ones that cause users to churn. Prod conversations have a whole category where the agent doesn't error, it just confidently goes sideways in a way nobody wrote a test for. The teams actually retaining users from AI products are reading conversations, not just dashboards.

by fzaninotto

1 subcomments

Is "Agentic Engineering" is the new name for "Agent Experience"? If so, and even though I love Simon's contributions, there are many other guides to making codebases more welcoming to agents...
Shameless plug: I wrote one. https://marmelab.com/blog/2026/01/21/agent-experience.html

by yoaviram

9 subcomments

Yesterday I wrote a post about exactly this. Software development, as the act of manually producing code, is dying. A new discipline is being born. It is much closer to proper engineering.
Like an engineer overseeing the construction of a bridge, the job is not to lay bricks. It is to ensure the structure does not collapse.
The marginal cost of code is collapsing. That single fact changes everything.
https://nonstructured.com/zen-of-ai-coding/

by Juminuvi

0 subcomment

Very much agree with the idea of red/green TDD and have seen really good results during agentic coding. I've found adding a linting step in between increases efficiency as well and fails a bit faster. So it becomes..
Test fail -> implement -> linter -> test pass
Another idea I've thought about using is docs driven development. So the instructions might look like..
Write doc for feat/bug > test fail > implement > lint > test pass

by noddingham

0 subcomment

I'd choose a different word for the title of Hoard Things You Know How to Do. Hoarding is the opposite of what we want to do but I get from reading the section you mean create a collection that you can draw upon. IMO "Share" is a much better word choice.

by chillfox

2 subcomments

Isn’t this pretty much how everyone uses agents?
Feels like it’s a lot of words to say what amounts to make the agent do the steps we know works well for building software.

by alansaber

0 subcomment

The best thing I read in this was "Hoard things you know how to do" > basically get an LLM to mutate an existing function you know is 1. well written and 2. works. If you have many such components you're still assembling code rapidly but using building blocks you actually understand in depth, rather than getting an LLM to shit out something verbose.

by lvl155

1 subcomments

People come up with the most insane workflow for agents. They complete about 80% of the work but that last 20% is basically equivalent to you doing the whole thing piece wise (with the help of AI). Except the latter gives you peace of mind.
I am still not sold on agentic coding. We’ll probably get there within the next couple of years.

by throwaway_20357

0 subcomment

I see where Simon is coming from with these patterns but I wonder where large software companies stand regarding their agentic engineering practices? Is Google creating in-house code using agents against its monorepo? Has Microsoft outsourced Windows source code advancements to a dark factory yet?

by luca-ctx

0 subcomment

> I don't let LLMs write text for my blog.
Thank you Simon and I'm sure you would quickly fall off from #1 blogger on HN if you did. I insist on this for myself as well.
Somehow we are all getting really good at detecting "written by AI" with primal intuition.

by sidcool

2 subcomments

PSA: This is sponsored by Augment code.

by dgunay

0 subcomment

A lot of this is just things that high-functioning human teams were already doing: automate testing, explain your PRs to guide reviewers, demoing work, not just throwing bad code over the wall during code review, etc.

by kubb

3 subcomments

Is there a market for this like OOP patterns that used to sell in the 90s?

0 subcomment

by aksjfp222

1 subcomments

I mainly work with documents as a white collar worker but have vibe coded a few bits.
The thing I keep coming back to is that it's all code. Almost all white collar professions have at least some key outputs in code. Whether you are a store manager filling out reports or a marketing firm or a teacher, there is so much code.
This means you can give claude code a branded document template, fill it out, include images etc. and uploaded to our cloud hosting.
With this same guidance and taste, I'm doing close to the work of 5 people.
Setup: Claude code with full API access to all my digital spaces + tmux running 3-5 tasks in parallel

by fud101

2 subcomments

Any word on patterns for security and deployment to prod?

by hsaliak

0 subcomment

I'd like to plug https://github.com/hsaliak/std_slop/blob/main/docs/mail_mode... my coding harness (std::slop)'s mail model (a poor name i admit). I believe this solves a fundamental problem of accummulating errors along with code in your project.
This brings the Linux Kernel style patch => discuss => merge by maintainer workflow to agents. You get bisect safe patches you 'review' and provide feedback and approve.
While a SKILL could mimic this, being built in allows me to place access control and 'gate' destructive actions so the LLM is forced to follow this workflow. Overall, this works really well for me. I am able to get bisect-safe patches, and then review / re-roll them until I get exactly what I want, then I merge them.
Sure this may be the path to software factories, but it scales 'enough' for medium size projects and I've been able to build in a way that I maintain strong understanding of the code that goes in.

by bhaktatejas922

1 subcomments

have loved simon wilson for a long long time and still do. These patterns are all out of date by at least a year - the best devs I know were using Claude 3.5 like this

by sdevonoes

1 subcomments

Is there anything about reviewing the generated code? Not by the author but by another human being.
Colleagues don’t usually like to review AI generated code. If they use AI to review code, then that misses the point of doing the review. If they do the review manually (the old way) it becomes a bottleneck (we are faster at producing code now than we are at reviewing it)

by vicchenai

0 subcomment

Honestly the best thing about Simon's writing is he just shows you what works instead of inventing a whole taxonomy. Half the "patterns" people talk about are just... writing good prompts and checking the output? Like we've been doing error handling and retries forever, now its an Agentic Pattern(tm).

by MickeyShmueli

0 subcomment

the tautological test problem someone mentioned, i've found the easiest fix is to literally make the test fail first before letting the agent fix it
like don't ask it to "write tests for this function", instead give it a function that's deliberately broken in a specific way, make it write a test that catches that bug, verify the test actually fails, THEN fix the function
this forces the test to be meaningful because it has to detect a real failure mode. if the agent can't make the test fail by breaking the code, the test is useless
the other thing that helps is being really specific about edge cases upfront. instead of "write tests for this API endpoint", say "write tests that verify it returns 400 when the email field is missing, returns 409 when the email already exists, returns 422 when the email is malformed" etc
agents are weirdly good at implementing specific test scenarios but terrible at figuring out what scenarios actually matter. which honestly is the same problem junior devs have lol

by pts_

0 subcomment

I really hate smelly statements like this or that is cheap now. They reek of carelessness.

by bhekanik

1 subcomments

[flagged]

by jamiemallers

0 subcomment

[dead]

by flashybaby

0 subcomment

[flagged]

by krasikra

0 subcomment

[dead]

0 subcomment

by calmtrace

0 subcomment

[dead]

by truejaian

0 subcomment

[dead]

by jakgru

0 subcomment

[dead]

by Ted719069

0 subcomment

[dead]

by dude250711

1 subcomments

Slop Engineering Patterns

by Madmallard

1 subcomments

patterns that may help increase subjective perception of reliability from non-deterministic text generators trained on the theft of millions of developer's work for the past 25 years.

0 subcomment