Hacker News

Backpressure is all you need

216 points by lucasfcosta

by _zoltan_

8 subcomments

"In this post, I’ll cover a third, not-so-obvious approach: building ways for the agent to validate more of its own work before a human has to step in. "
This has been an obvious thing to do since at least January (since Geoffrey Huntley published "everything is a ralph loop"), and it is how I've been working: build enough orchestration tooling to automate everything: development container bring-up, building, running the unit tests, integration testing, and eventually using the software the way an end user would. Then, on that already solid basis, set performance goals so the automated agent ("gym") can iterate autonomously and let you know when it's "done".
I understand this probably does not work if you're on some subscription and not using the API (tokens burn fast), but this has been extremely productive for me.
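A minimal sketch of that kind of loop, assuming each stage is a plain Python callable that returns `None` on success or an error message on failure. The names `run_pipeline`, `iterate`, and the `fix` callback are hypothetical stand-ins for real orchestration tooling and for whatever hands the failure back to the agent:

```python
def run_pipeline(stages):
    """Run (name, stage) pairs in order; return the first failure, or None."""
    for name, stage in stages:
        error = stage()  # assumed contract: None means the stage passed
        if error is not None:
            return f"{name}: {error}"
    return None


def iterate(stages, fix, max_iterations=5):
    """Re-run the pipeline, handing each failure to `fix`, until it passes.

    Returns the attempt number that passed, or None if the budget ran out.
    """
    for attempt in range(1, max_iterations + 1):
        failure = run_pipeline(stages)
        if failure is None:
            return attempt
        fix(failure)  # hypothetical hook: ask the agent to address the failure
    return None
```

A performance goal fits the same contract: it is just one more stage that returns an error message while the measured number is still short of the target. The iteration cap is what keeps token spend bounded.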

by xg15

4 subcomments

Isn't this a bit of an incorrect usage of the term "backpressure"?
OP quoted the correct definition right at the start:
> In systems engineering, backpressure is the mechanism by which a downstream component signals upstream that it can't accept more work
(the "downstream component" being the human reviewer in this case)
But the measures they propose don't actually do that. They are more like fixed throttle elements which would slow down the rate of submissions of an agent and weed out some low-quality submissions before hitting "downstream".
I'm missing the connection to the actual capacity (or will) that the human developers have to review the submissions.
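To make the distinction concrete, here is a small sketch, assuming the reviewer's capacity can be modelled as a bounded queue. `ReviewQueue` and `passes_gate` are hypothetical names: the first refuses work based on how much is already waiting downstream, while the second is a fixed filter that never looks at the reviewer's load.

```python
import queue


class ReviewQueue:
    """Bounded queue: its size stands in for the reviewer's capacity."""

    def __init__(self, capacity):
        self._q = queue.Queue(maxsize=capacity)

    def submit(self, pr):
        """Return True if accepted, False if downstream is saturated."""
        try:
            self._q.put_nowait(pr)
            return True
        except queue.Full:
            return False  # the backpressure signal: upstream must wait

    def review_one(self):
        """The reviewer takes one item, which frees capacity."""
        return self._q.get_nowait()


def passes_gate(pr):
    """A fixed quality gate: the outcome ignores the reviewer's load."""
    return pr.get("tests_passed", False)
```

With this model, an agent that gets `False` from `submit` has actually been told that the human cannot take more work. A gate such as `passes_gate` gives the same answer whether the reviewer has zero or fifty submissions waiting.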

by pshirshov

5 subcomments

A very long post about a simple and very obvious idea with many different implementations.
The three main problems are 1) API usage is deadly expensive 2) Claude is about to make all automation very expensive 3) all the flows where a model has the initiative are strictly biased towards unwarranted stops (checkpointing).
Also, I wouldn't call that "backpressure": there is no producer-consumer imbalance or anything similar. From what I can see, the author just proposes a structured feedback loop. This is really a discussion about organizational principles for systems that consist of multiple unreliable but very complex components, and this "backpressure" is just one of the aspects. Personally, I find the viable system model framework productive as both a mental model and a literal implementation guideline.
A lesser problem is that agent SDKs are bad and building a custom harness is hard.

by wellpast

8 subcomments

I’m willing to be wrong but this industry-wide emphasis on AI creative/coding workflows seems way over-engineered.
IME, successful creative execution looks like micro-iterations where each output informs the next creative move.
I can build something incredibly fast from essentially caveman grunt instructions through an LLM harness, iterating as I go.
Optimizing for feeding a huge plan to an agent sounds to me like a net waste of time. And looking over the shoulders of industry peers trying to do this, I don’t see in their outputs or throughput any remarkable improvement over what I can produce with minimal-fanfare usage.

by denysvitali

2 subcomments

This seems to be the coding agents 101: build a strong feedback loop. Am I missing something?

by jon-wood

1 subcomments

This is what hooks[1] are for, except hooks allow specifying criteria under certain conditions (like the agent believing it’s done and ready to hand back to the user) in a manner that the agent won’t just forget about once it’s a few turns deep, and that doesn’t require triggering a whole other LLM instance to read some plain-text instructions while you hope it interprets them correctly.
It absolutely makes sense to have a system in place that allows the code generated by an LLM to be automatically validated, but there’s no need to resort to a non-deterministic system for this sort of deterministic pass/fail condition.
[1] https://code.claude.com/docs/en/hooks

by EMM_386

1 subcomments

I always use a standard workflow and it has never been a problem.
- Define the task and the goal, write a short spec document (markdown is fine)
- Point the agent at it in plan mode and have it write the plan to disk with phases. Iterate on its plan if necessary here and now.
- Have each agent tackle a phase and have it update it as a living document (switch models if some phases are more difficult than others)
- Clear and repeat until done
I've never had to overcomplicate this and it's worked both on enterprise-scale projects and personal projects. I am not sure what I'm missing - if anything.

by cadamsdotcom

2 subcomments

Everyone looking into this and other verification should be moving away from long prompts and complex skills, and looking into hooks.
If you put all these checks in your stop hook and your git commit hook, your repo docs can tell your agent that checks will run automatically when it stops work, and it should fix any problems found.
It’s wonderful to reintroduce determinism at the QA end of your process. I find it very calming to know the agent can’t skip or forget to check its work because with hooks the checks are run by the harness.
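As a rough sketch of what such a hook could call, assuming the checks are ordinary command-line tools: the script below runs each command and reports the failures. The entries in `CHECKS` are placeholder assumptions, not a recommendation; substitute whatever your project already runs. Git aborts a commit when its pre-commit hook exits non-zero, which is what makes the result binding.

```python
import subprocess
import sys

# Hypothetical check commands; replace with your project's own.
CHECKS = {
    "lint": ["ruff", "check", "."],
    "tests": ["pytest", "-q"],
}


def run_checks(checks):
    """Run each command; return {name: output} for the ones that failed."""
    failures = {}
    for name, cmd in checks.items():
        try:
            proc = subprocess.run(cmd, capture_output=True, text=True)
        except FileNotFoundError:
            failures[name] = f"command not found: {cmd[0]}"
            continue
        if proc.returncode != 0:
            failures[name] = proc.stdout + proc.stderr
    return failures


def main(checks=CHECKS):
    """Print failures to stderr and return a process exit code."""
    failures = run_checks(checks)
    for name, output in failures.items():
        print(f"[{name}] FAILED\n{output}", file=sys.stderr)
    return 1 if failures else 0

# In the hook file itself, finish with: raise SystemExit(main())
```

Because the harness or git runs this, not the model, the pass/fail outcome does not depend on the agent remembering to check its work.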

by vermilingua

1 subcomments

> It should also reduce the number of low-quality PRs your teammates have to review for details the agent should have caught itself.
Oh boy.

by mcint

0 subcomment

The overriding of click behavior is quite annoying. 30 years of browser user-agent behavior.
Next and Vercel already handle this correctly. It takes special effort to violate "least surprise" here. Cmd-click on a link should open it in a new tab.
It does appear to be an issue with SimpleAnalytics, now Adobe's,
```
    onclick="saAutomatedLink(this, 'outbound'); return false;"
```
Free debugging of how the site tweaks, and breaks, the 30-year consensus web-standard behavior.
Good sites, good blogs, *don't override onclick for links.* Or handle it correctly. I'll leave an issue on the GitHub.
Between your footer, and dotfiles repo, OP does seem to appreciate standards & norms, in principle.

by hsaliak

0 subcomment

I have a custom agent that generates patches like you would in kernel development, and I review and merge those in. https://github.com/hsaliak/std_slop/blob/main/docs/mail_mode...
My agent forces this workflow by disabling modifications outside the coding step.
I added looping to this not too long ago. https://github.com/hsaliak/std_slop/blob/main/docs/mail-loop...
This gives me the best of both worlds, hand curated reviews and automation. I often get the best quality if I do both, with an agent doing a pass first.

by tim-projects

0 subcomment

I'm building a tool that automates most of this. What the author didn't even touch on is just how much AI cheats.
The more guardrails you provide the more it cheats.
AI is like a wild animal that needs to do something, and it takes a fair bit of work to corner it. And only when it's cornered and at the point of giving up, can you then offer it a way out.
If you don't do what I said, I can guarantee it's fooling you somehow.

by mark_l_watson

0 subcomment

Interesting ideas for generalizing goals to reduce human labor in human <—> agent interactions. That said, maybe it is better to set up customized skills and infrastructure for large projects? At our early stage of trying to capture value of agentic systems, the good ideas in this article might be premature optimization.

by yearesadpeople

0 subcomment

If the system's invariants are well defined, and a suite of conformance and requirements tests (ensuring the invariants are respected) is defined, wouldn't this be a broad _'base case'_ approach in general?

by socketcluster

0 subcomment

I've been advocating for this approach for years. It's useful for any kind of data processing. You can't avoid race conditions without using some kind of queueing mechanism and you need backpressure to measure queue capacity. I built this into every aspect of https://socketcluster.io/ - From pub/sub channels, RPCs to event listeners.

by try-working

0 subcomment

I built a recursive workflow that creates its own source of truth for verifying its work: https://github.com/try-works/recursive-mode

by xlii

0 subcomment

If that's the third, then I have a fourth. Self plug obviously, but I figured that I'd like something between smart autocomplete and an agent: an autocomplete that has wider context.
Called it rik, and it's on GitHub if anyone's interested in checking it out.
https://github.com/exlee/rik

by SkiFreeWin3

0 subcomment

Looks like plenty of recent prior art on this:
https://pura.xyz
https://github.com/puraxyz/puraxyz/blob/main/docs/paper/main...

by cyanydeez

0 subcomment

interesting idea, unfortunately programming the structure is equivalent (P=NP) to just programming itself. same as TDD.
as usual, the tool isn't really doing what's listed on its label.
however, people are different so this might improve someone's capability to deploy LLMs. might even provide better evidence of where actual brain power is needed.

by einpoklum

0 subcomment

In other words: Spending more tokens is all you need.
The main kind of pressure I'm feeling is the pressure of the giant AI, GPU & datacenter companies with their insane capital expenditure and circular deals, trying to get enough people to develop an expensive reliance on their service. And the more expensive, the better, so don't just pay for the LLM to code for you, have another LLM interact with the first LLM and pay double, treble, 5x or whatever. Then you can get the most refined slop.

by ahstilde

0 subcomment

my entire thesis behind oro, my personal coding harness: https://github.com/mraakashshah/oro

by apf6

1 subcomments

Slowing down development is the wrong goal. I see a desire for slowness come up a lot with developers. If you pursue that goal all the way to its logical conclusion then eventually you would stop all coding completely. Which would prevent new bugs but obviously we can't do that and keep our jobs.
By all means add tons of quality gates to your SDLC pipeline. But thinking about slowness purely for the sake of slowness will not solve your problems.

by bilbo-b-baggins

0 subcomment

Bro just rediscovered software best practices and thinks it's a novel AI thing.
Fuck, we’re so cooked.

by jwpapi

0 subcomment

Such a fantasy, it leads to two problems.
Increased complexity of your systems. Increased pipelines of your system.
You might reduce the likelihood of errors, but at a disproportionate cost in the time it takes to complete (which some might argue is irrelevant, but it has the cost of human context), and with far more time and focus needed for all the bugs that the system doesn't catch.
You’ll have to fix, adapt, and maintain all your verification layers, because just because you set them up doesn't mean they are perfect.
Your testing pipeline becomes incredibly slow and you need to maintain it as well.
It’s tremendously weaker than a hands-on approach.
I wrote this exact same article in January and have since completely switched my position.
Good luck to everyone trying this. You're digging your own grave and wasting time.

by dnnddidiej

1 subcomments

Oh this is 101. Anyone not doing this? If not do it now!

by slow_typist

0 subcomment

Who is going to write tests? But I like the fact that this approach implicitly approves of the stochastic parrot model. I mean, given enough computing power and sufficiently well made tests, I could just generate random strings of increasing length until one compiles into a program that passes all tests, mission accomplished. Like one million apes typing on one million typewriters.
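A toy version of that generate-and-test search, as a sketch only. It enumerates candidates in order of increasing length instead of sampling them at random, so that it terminates deterministically, and it searches tiny arithmetic expressions rather than whole programs. The alphabet, the test cases, and the function names are all illustrative assumptions.

```python
import itertools

ALPHABET = "x12+*"                # assumed tiny "language" for candidates
CASES = [(0, 1), (1, 3), (2, 5)]  # assumed spec: behaves like 2*x + 1


def passes(src, cases):
    """True if `src`, evaluated as an expression in x, meets every case."""
    for x, expected in cases:
        try:
            # No builtins are exposed; only the variable x is visible.
            if eval(src, {"__builtins__": {}}, {"x": x}) != expected:
                return False
        except Exception:
            return False  # most candidates are not even valid expressions
    return True


def search(cases, max_len=5):
    """Return the first candidate string that passes all cases, else None."""
    for length in range(1, max_len + 1):
        for chars in itertools.product(ALPHABET, repeat=length):
            src = "".join(chars)
            if passes(src, cases):
                return src
    return None
```

Even here the candidate count grows as 5 to the power of the length, and whatever comes back is only guaranteed to satisfy the listed cases, which is the point about needing sufficiently well made tests.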

by lofaszvanitt

0 subcomment

Lipstick on a pig.

by jasonlotito

0 subcomment

I feel like a lot of people just forget you can put this stuff in pre-commit hooks. This forces the AI to deal with issues. You don't have to hope and pray it remembers your "Pretty please, check your work" markdown file.
A pre-commit hook has been wonderful. Sure, you can add instructions, but pre-commit hooks are where you want to put the guards.
