The problem is the industry's obsession with concatenating messages into a conversation stream. There is no reason to do it this way. Every time you run inference on the model, the client gets to compose the context however it wants; you can do more than just concatenate prompts and LLM outputs. (A drawback: prompt caching won't help much if most of the context window is composed dynamically.)
Coding CLIs, like web chat, work well because the agent can pull information into the session at will (read a file, web search). The pain point is that if you're appending messages to a stream, you're just slowly filling up the context.
The fix is to keep the message stream concept for informal communication with the prompter, but have an external, persistent message system that the agent can interact with (a bit like email). The agent can decide which messages they want to pull into the context, and which ones are no longer relevant.
The key is to give the agent not just the ability to pull things into context, but also remove from it. That gives you the eternal context needed for permanent, daemonized agents.
Let's say that you have two agents running concurrently: A & B. Agent A decides to push a message into the context of agent B. It does that, and the message ends up in B's message list, right at the bottom of the conversation.
The question is, will agent B register that a new message was inserted and will it act on it?
If you run this experiment, you will find that this architecture does not work very well. Messages that are recent but not the very latest have little effect in an interactive session. In other words, agent B will not respond with "and btw, this and that happened" unless perhaps instructed very rigidly, or unless there is some other instrumentation in place.
Your mileage may vary depending on the model.
A better architecture is pull-based: the agent has tools to query for any pending messages. That way, whatever needs to be communicated is immediately visible, because the pulled messages land right at the bottom of the context, where agents actually pay attention.
The agent loop in that case is slightly more rigid, in the sense that it needs to orchestrate and surface information, and there is certainly no one-size-fits-all solution here.
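A minimal sketch of the pull-based idea in Python (the `Mailbox` class and its method names are my own invention, not from any particular framework): other agents push into a per-agent inbox, and the owning agent drains it with a tool call at the top of each loop turn, so new messages always land at the bottom of the context.

```python
import time
from dataclasses import dataclass, field

@dataclass
class Message:
    sender: str
    body: str
    ts: float = field(default_factory=time.time)

class Mailbox:
    """Per-agent inbox: other agents push, the owning agent pulls."""

    def __init__(self) -> None:
        self._pending: list[Message] = []

    def push(self, sender: str, body: str) -> None:
        self._pending.append(Message(sender, body))

    def pull(self) -> list[Message]:
        """Tool the agent calls each loop turn: drain pending messages
        so they are appended at the bottom of the context, where the
        model actually attends to them."""
        msgs, self._pending = self._pending, []
        return msgs

# Agent B's loop checks the mailbox before each model call:
inbox = Mailbox()
inbox.push("agent-a", "btw, this and that happened")
for m in inbox.pull():
    print(f"[{m.sender}] {m.body}")  # goes to the end of B's context
```

The same `pull` call doubles as the "remove from context" lever: messages the agent has drained and deemed irrelevant simply never get re-appended.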
I hope this helps. We've learned this the hard way.
This means:
- less and less "man-in-the-loop"
- less and less interaction between LLMs and humans
- more and more automation
- more and more decision-making autonomy for agents
- more and more risk (i.e., LLMs' responsibility)
- less and less human responsibility
Problem:
Tasks that require continuous iteration and shared decision-making with humans have two options:
- either they stall until human input
- or they decide autonomously at our risk
Unfortunately, automation comes at a cost: RISK.
The system I’ve developed for this is open source and detailed at https://airut.org
I still sit and watch my terminals. It's the easiest way to catch problems.
Yes you can - durable objects do exactly what the "Ably pub/sub channel transport" diagram describes. And it's even easier with the cloudflare agents SDK. This article strawmans the capabilities of competing infra.
It works with multiple LLMs. The main downside is that since they go through the API, it gets expensive once the monthly quota runs out. (They claim to resell additional API usage at cost, but that doesn’t seem easy to verify.) I’ve switched to using Sonnet for most things but haven’t experimented with cheaper models yet.
It seems like the big price gap between what the API costs and what you can get via a subscription is really holding things back.
- The agent and all its state stays on a persistent server that saves state on restart
- Just stream the state directly to the client via websockets, or even the entire UI with something like liveview
OpenClaw has already proven this model and I don't see a great reason to try and solve the problem a different way.
https://developers.openai.com/api/docs/guides/websocket-mode
I have been building on it over the past month, keeping WebSocket sessions warm on workers and routing commands through NATS JetStream. This has made running sidecar threads alongside a main thread very simple, since the worker treats them the same way.
I'm kidding of course but feels like the time has come to look closely into Erlang ecosystem and OTP.
There's even an agentic framework for this: https://jido.run/blog/jido-2-0-is-here
If you think about it, OTP makes a lot of sense for always-on, reachable agents. Agents need to talk to external systems all the time: web services, databases, message queues, local tools.
More than a year ago, I had the idea of building a personal AI assistant connected to multiple services (https://github.com/konovalov-nk/synaptra/blob/main/docs/arch...). But I didn't want to build yet another over-engineered k8s setup just to get isolation and separation of concerns.
Over time, I realized OTP was much closer to the model I actually wanted.
Why?
Some services want to run locally: memory, low-latency text-to-speech, private data access. The agent can also run locally while delegating work across supervised processes. Things will fail, and that's fine — Erlang was built around exactly that assumption.
Once you look at agents this way, they indeed look less like chat sessions and more like long-lived, supervised, stateful processes.
In that sense, Erlang really was ahead of its time.
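For illustration, the "let it crash" control flow can be sketched in a few lines of Python. This is a toy stand-in for an OTP supervisor, not how OTP actually works: a real supervisor restarts whole processes with isolated heaps, while this just retries a function in the same address space.

```python
# Toy "let it crash" supervision: run a worker, restart it whenever
# it dies, and give up after too many restarts (OTP's max_restarts
# idea). The supervise() helper is hypothetical, written for this
# sketch only.
def supervise(worker, max_restarts: int = 3):
    restarts = 0
    while True:
        try:
            return worker()
        except Exception as exc:
            restarts += 1
            print(f"worker crashed ({exc!r}), restart #{restarts}")
            if restarts > max_restarts:
                raise  # escalate, as a supervisor would to its parent
```

The point of the pattern is that the worker carries no recovery logic at all; failure handling lives entirely in the layer above it.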
Once I hashed canonical input JSON, the cache hit rate on real traffic was higher than expected: mid-teens percent once a handful of workers were live. Curious if anyone here has tried cross-agent result sharing without bolting on a full pub/sub layer.
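For reference, the canonical-JSON hashing step can be sketched like this (assuming SHA-256 keys; the `cache_key` name is mine, not from the parent's system):

```python
import hashlib
import json

def cache_key(payload: dict) -> str:
    """Hash a canonical serialization: sorted keys and fixed
    separators mean semantically identical inputs, regardless of key
    order, map to the same cache entry."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Key order no longer matters:
print(cache_key({"model": "x", "prompt": "hi"}) ==
      cache_key({"prompt": "hi", "model": "x"}))  # True
```

With keys like this, cross-agent sharing can be as simple as a shared key-value store that every worker consults before calling the model.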
Even if I can string it together it's pretty fragile.
That said, I don't really want to solve this with a SaaS. I'm trying really hard to keep external reliance to a minimum (mostly just the LLM endpoint).
I vibe coded a message system where I still have all the chat windows open, but my agents run a command that finishes once a message meant for them comes along, and then they need to start it back up again themselves. I kept it semi-automatic like that because I'm still experimenting with whether this is what I want.
But they get plenty done without me this way.
I don't think it solves the other half of the problem that we've been working on, which is what happens if you were not the one initiating the work, and therefore can't "connect back into a session" since the session was triggered by the agent in the first place.
The only place I use async now is when I'm stepping away and there are a bunch of longer tasks on my plate. So I kick them off and review them whenever I log in next. However, I don't use this pattern all that much, and even then I'm not sure the context switching whenever I get back is really worth it.
Unless agents get more reliable on long-horizon tasks, it seems that async will have limited utility. But I can easily see this going into videos feeding the Twitter AI launch hype train.
As an aside, I've built and deployed a production system in which disconnecting & reconnecting from an in-progress LLM stream works and resumes from wherever the stream currently is, through a combination of redis/valkey & websockets - it's not all that hard, it turns out!
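A toy version of that resume logic, with a plain dict standing in for redis/valkey (the real system presumably uses redis lists, e.g. RPUSH on write and LRANGE on reconnect, plus a websocket transport; the function names here are mine):

```python
# stream_id -> ordered list of chunks received from the LLM so far
streams: dict[str, list[str]] = {}

def append_chunk(stream_id: str, chunk: str) -> None:
    """Server side: append each chunk as it arrives from the LLM."""
    streams.setdefault(stream_id, []).append(chunk)

def resume(stream_id: str, offset: int) -> tuple[list[str], int]:
    """Client side: reconnect with the offset of the last chunk seen
    and receive everything that arrived while disconnected, plus the
    new offset to use next time."""
    chunks = streams.get(stream_id, [])
    return chunks[offset:], len(chunks)

# A client that saw 2 chunks, dropped, and reconnects:
append_chunk("demo", "Hello")
append_chunk("demo", ", ")
append_chunk("demo", "world")  # arrived while the client was offline
missed, new_offset = resume("demo", 2)
print("".join(missed))  # world
```

Because the stream is just an append-only log keyed by stream id, any number of clients can attach or reattach at any offset.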
> So how are folks solving this?
$5 per month dedicated server, SSH, tmux.
Somebody had better standardize that, because otherwise we'll end up with agents sending rich payloads between themselves via Telegram.
Having long-lived requests, where you submit one, get back a request_id, and then poll for its status, is a 20-year-old solved problem.
Why is this such a difficult thing to do in practice for chat apps? Do we need ASI to solve this problem?
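For what it's worth, the whole pattern fits in a few lines (an in-memory dict instead of a real job store; `submit` and `poll` are hypothetical names for this sketch):

```python
import threading
import time
import uuid

# request_id -> {"status": "pending" | "done", "result": ...}
jobs: dict[str, dict] = {}

def submit(task) -> str:
    """Kick off a long-running task; return a request_id immediately."""
    request_id = uuid.uuid4().hex
    jobs[request_id] = {"status": "pending", "result": None}

    def run() -> None:
        jobs[request_id]["result"] = task()
        jobs[request_id]["status"] = "done"  # set result first, then flip

    threading.Thread(target=run, daemon=True).start()
    return request_id

def poll(request_id: str) -> dict:
    """Clients poll until the status flips to 'done'."""
    return jobs[request_id]

rid = submit(lambda: sum(range(1000)))
while poll(rid)["status"] != "done":
    time.sleep(0.01)
print(poll(rid)["result"])  # 499500
```

A production version would persist the jobs table and add expiry, but the contract a chat app needs is exactly this: a durable id you can come back to later.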