My approach has been more pragmatic than theoretical: I break work into sequential stages (plan, design, code) with verification gates. Each gate has deterministic checks (compile, lint, etc.) and an agentic reviewer for qualitative assessment.
Collectively, this looks like a distributed system. The artifacts reflect the shared state.
The author's point about external validation converting misinterpretations into detectable failures is exactly what I've found empirically. You can't make the agent reliable on its own, but you can make the protocol reliable by checking at every boundary.
The deterministic gates provide a hard floor of guarantees. The agentic gates provide soft probabilistic assertions.
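A minimal sketch of that stage/gate loop (all names here are illustrative stand-ins, not my actual tooling):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Gate:
    name: str
    deterministic_checks: list  # hard floor: compile, lint, tests
    agentic_review: Callable    # soft probabilistic assertion

def run_pipeline(stages, gates, max_retries=3):
    state = {}  # shared state, reflected in the artifacts
    for stage, gate in zip(stages, gates):
        for _ in range(max_retries):
            artifact = stage(state)
            hard_ok = all(check(artifact) for check in gate.deterministic_checks)
            soft_ok = gate.agentic_review(artifact)  # may be wrong; a probe, not a proof
            if hard_ok and soft_ok:
                state[gate.name] = artifact
                break
        else:
            raise RuntimeError(f"gate '{gate.name}' never passed")
    return state
```

The point is that a misbehaving stage can't silently advance: it either passes both kinds of checks or the boundary fails loudly.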
I wrote up the data and the framework I use: https://michael.roth.rocks/research/trust-topology/
The post acts like agents are a highly complex but well-specified deterministic function. Perhaps, under certain temperature limits, this is approximately true ... but that's a serious restriction and glossed over.
For instance, perhaps the most striking constraint of FLP is that it is a result about deterministic consensus ... the post glosses over this:
> establishes a fundamental impossibility result dictating consensus in any asynchronous distributed system (yes! that includes us).
No, not any asynchronous distributed system; it might not include us. For instance, Ben-Or (1983, https://dl.acm.org/doi/10.1145/800221.806707) sidesteps the FLP adversary by essentially saying "if you're stuck, flip a coin". There's significant work studying randomized consensus (and yes, multi-agent systems are randomized consensus algorithms): https://www.sciencedirect.com/science/article/abs/pii/S01966...
Now, in Ben-Or, the coins have to be independent sources of randomness, and that's obviously not true in the multi-agent case.
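For anyone who hasn't seen it, the coin-flip step can be caricatured like this (a single decision rule with simplified thresholds; the real protocol has two message phases per round, so treat this purely as illustration):

```python
import random

def ben_or_step(votes, n, f, rng=random.Random()):
    """One caricatured decision step of randomized binary consensus.
    votes: the 0/1 values heard from other processes this round;
    n: total processes, f: faults tolerated."""
    counts = {0: votes.count(0), 1: votes.count(1)}
    majority = max(counts, key=counts.get)
    if counts[majority] > (n + f) / 2:
        return majority, True        # overwhelming majority: decide
    if counts[majority] > n / 2:
        return majority, False       # adopt the majority, keep going
    return rng.randint(0, 1), False  # stuck: flip a coin and try again
```

The coin breaks the symmetry the FLP adversary exploits, so termination holds with probability 1 rather than deterministically; and as noted, that argument needs the coins to be independent.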
But the language in this post argues that these results apply while missing possibly the most fundamental fact about agents: they are probability distributions; inherently, they are stochastic creatures.
Difficult to take seriously without a more rigorous justification.
However, AI agents don't share these problems in the classical sense. Building agents is about context attention, relevance, and information density inside a single ordered buffer. The distributed part is creating an orchestrator that manages these things. At noetive.io we currently work on the context relevance part with our contextual broker Semantik.
What isn't solved there is semantic idempotency. Even if a failed agent activity retries correctly at the infrastructure layer, the LLM re-invocation produces a different output. This is why the point about tests converting byzantine failures into crash failures is load-bearing: without external validation gates between activities, you've pushed retry logic onto Temporal but left the byzantine inconsistency problem unsolved. The practical implication is that the value of the test suite in an agentic pipeline scales superlinearly, not just as correctness assurance but as the mechanism that collapses the harder byzantine failure model back into the weaker crash-failure model that FLP assumes.
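Concretely, such a gate looks something like this (hypothetical `call_llm` and `validation_suite`; the retry policy lives in the infrastructure layer, not in this function):

```python
class ActivityFailed(Exception):
    """Crash-style failure: safe for the infrastructure layer to retry."""

def validated_activity(call_llm, validation_suite, prompt):
    # Re-invocation is not idempotent: each call can produce different output.
    output = call_llm(prompt)
    failures = [name for name, check in validation_suite if not check(output)]
    if failures:
        # A wrong-but-plausible (byzantine) output never crosses the activity
        # boundary; it is collapsed into a detectable crash failure instead.
        raise ActivityFailed(f"validation failed: {failures}")
    return output
```

From the orchestrator's perspective every failure now looks like a crash, which is exactly the failure model its retry machinery is built for.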
Good architecture, actor models, and collaboration patterns do not emerge magically from “more agents”.
Maybe what’s missing is the architect’s role.
I wrote an article on this if you're interested: https://x.com/siliconcow/status/2035373293893718117
This might be obvious to everyone, but it's a nice way for me to view it (sort of restating the non-waterfall (agile?) approach to specification discovery)
I.e., waterfall design without coding is too underspecified; hence the agile alternative of using code iteratively to find an exact specification
Agreed on the main claim that multi-agent software development is a distributed-systems problem; however, I think the distributed-consensus point is not the tightest current bottleneck in practice.
The article mentions the partially synchronous model (DLS) but doesn't develop it, and that's the usual escape hatch from FLP. In practical agentic workflows it already shows up as looped improvement cycles bounded by diminishing returns. Each iteration is effectively a round, and each agent's output in that round is a potentially faulty proposal that the next round refines. Painful in cost, yes, but manageable. If models continue to improve at current rates, I think it's reasonable to assume the number of cycles will decrease.
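That rounds framing is literal enough to write down (hypothetical `propose` and `score`; the epsilon cutoff plays the role of the bound a partially synchronous model grants you):

```python
def refine_until_converged(propose, score, budget=10, eps=0.01):
    """Run bounded improvement rounds, stopping on diminishing returns."""
    draft = propose(None)  # round 0: initial proposal
    best = score(draft)
    rounds = 0
    for rounds in range(1, budget):
        candidate = propose(draft)  # this round's possibly-faulty proposal
        s = score(candidate)
        if s - best < eps:          # diminishing returns: stop spending
            break
        draft, best = candidate, s
    return draft, best, rounds
```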
The more interesting objection is that "agent misinterprets prompt" isn't really byzantine. The 3f+1 bound assumes independent faults, but LLM agents share weights, training data, and priors. When a prompt is ambiguous they don't drift in random directions, they drift the same way together. That isn't majority vote failing loudly, it's consensus succeeding on a shared bias, which is arguably worse.
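A toy simulation makes that asymmetry concrete (correct answer is 1; `correlated=True` means all agents share a single error draw, a crude stand-in for shared weights and priors):

```python
import random

def majority_vote(answers):
    return max(set(answers), key=answers.count)

def error_rate(n_agents, p_err, correlated, trials=2000, seed=0):
    rng = random.Random(seed)
    wrong = 0
    for _ in range(trials):
        if correlated:
            # one shared draw: every agent drifts the same way together
            answers = [0 if rng.random() < p_err else 1] * n_agents
        else:
            # independent draws: the classical fault assumption
            answers = [0 if rng.random() < p_err else 1 for _ in range(n_agents)]
        if majority_vote(answers) != 1:
            wrong += 1
    return wrong / trials
```

With p_err = 0.3 and five agents, independent voting pushes the error rate well under the single-agent rate, while correlated voting stays near 0.3; worse, the correlated failures are unanimous, so the vote reports high confidence exactly when it's wrong.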
This is not true. In theory, if the agent is smart enough, it outthinks your ideas and builds the solution around itself so that it can escape.
It's crazy how good coding agents have become. Sometimes I barely even need to read the code because it's so reliable and I've developed a kind of sense for when I can trust it.
It boggles my mind how accurate it is when you give it the full necessary context. It's more accurate than any living being could possibly be. It's like it's pulling the optimal code directly from the fabric of the universe.
It's kind of scary to think that there might be AI as capable as this applied to things besides next token prediction... Such AI could probably exert an extreme degree of control over society and over individual minds.
I understand why people think we live in a simulation. It feels like the capability is there.