Here's my own screenshot of it rendering my blog - https://bsky.app/profile/simonwillison.net/post/3mdg2oo6bms2... - it handles the layout and CSS gradients really well, renders the SVG feed icon but fails to render a PNG image.
I thought "build a browser that renders HTML+CSS" was the perfect task for demonstrating a massively parallel agent setup because it couldn't be productively achieved in a few thousand lines of code by a single coding agent. Turns out I was wrong!
After three days, I have it working at around 20K LOC, of which ~14K is the browser engine itself plus X11 support, and the remaining ~6K is just Windows and macOS support.
Source code + CI built binaries are available here if you wanna try it out: https://github.com/embedding-shapes/one-agent-one-browser
I searched for "security" and "vuln" in both the article and this discussion thread, and found no matches.
I guess the code being in Rust helps, but to what extent can one just rely on the guarantees provided by the language?
(I know practically nothing about Rust.)
But we're very far from a browser here, so that's not that impressive. Writing a basic renderer is really not that hard, and matches the effort and low LoC from that experiment. This is similar to countless graphical toolkits that have been written since the 70s.
I know Servo has a "no AI contribution" policy, but I still would be more impressed by a Servo fork that gets missing APIs implemented by an AI, with WPT tests passing etc. It's a lot less marketable, I guess. Go add something like WebTransport, for instance; it's a recent API, so the spec should be properly written and there's a good test suite.
I know it's a little apples-and-oranges (you and the agent wouldn't produce the exact same thing), but I'm not asking because I'm interested in the man-hour savings. Rather, I want to get a perspective on what kind of expertise went into the guidance (without having to read all the guidance and be familiar with browser implementation myself). "How long this would have taken the author" seems like one possible proxy for "how much pre-existing experience went into this agent's guidance".
I wonder if you've looked into what it would take to implement accessibility while maintaining your no-Rust-dependencies rule. On Windows and macOS, it's straightforward enough to implement UI Automation and the Cocoa NSAccessibility protocols respectively. On Unix/X11, as I see it, your options are:
1. Implement AT-SPI with a new from-scratch D-Bus implementation.
2. Implement AT-SPI with one of the D-Bus C libraries (GLib's GDBus, libdbus, or sd-bus).
3. Use GTK, or maybe Qt.
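To give a sense of what option 1 entails: every D-Bus message starts with a fixed 12-byte header before the header-fields array and body. A minimal sketch of serializing that fixed header (a from-scratch implementation would still need the header-fields array, alignment rules, and socket handshake on top of this; layout per the D-Bus specification):

```rust
// Serialize the fixed 12-byte D-Bus message header.
// msg_type: 1 = METHOD_CALL, 2 = METHOD_RETURN, 3 = ERROR, 4 = SIGNAL.
fn dbus_fixed_header(msg_type: u8, body_len: u32, serial: u32) -> [u8; 12] {
    let mut h = [0u8; 12];
    h[0] = b'l'; // endianness flag: 'l' = little-endian, 'B' = big-endian
    h[1] = msg_type;
    h[2] = 0; // flags (e.g. NO_REPLY_EXPECTED); none set here
    h[3] = 1; // major protocol version
    h[4..8].copy_from_slice(&body_len.to_le_bytes()); // length of the body in bytes
    h[8..12].copy_from_slice(&serial.to_le_bytes()); // sender-assigned serial, must be nonzero
    h
}

fn main() {
    let h = dbus_fixed_header(1, 0, 1); // a METHOD_CALL with an empty body
    assert_eq!(h[0], b'l');
    assert_eq!(&h[8..12], &1u32.to_le_bytes());
    println!("{:?}", h);
}
```

It's doable, but the authentication handshake, marshalling of nested container types, and the AT-SPI object tree on top of it are where the real effort would go.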
This is how we should be thinking about AI safety!
My comment on the cursor post for context: https://news.ycombinator.com/item?id=46625491
That is Ladybird Browser if that was not already obvious.
- Clear code structure and good architecture (a modular approach reminiscent of Blitz but not as radical; a Blitz-lite).
- Very easy to follow the code and understand how the main render loop works:
- For Mac: main loop is at https://github.com/embedding-shapes/one-agent-one-browser/blob/master/src/platform/macos/windowed.rs#L74
- You can see clearly how UI events are passed to the App to handle.
- App::tick allows the app to handle internal events (Servoshell does something similar with `spin_event_loop` at https://github.com/servo/servo/blob/611f3ef1625f4972337c247521f3a1d65040bd56/components/servo/servo.rs#L176)
- If a redraw is needed, the main render logic is at https://github.com/embedding-shapes/one-agent-one-browser/blob/master/src/platform/macos/windowed.rs#L221 and calls into `render` of App, which computes a display list (layout) and then translates it into commands for the generic painter, which internally turns those into platform-specific graphics operations.
- It's interesting how the painter for Mac uses Cocoa for graphics; very different from Servo, which uses WebRender, or Blitz, which (in some paths) uses Vello (itself built on wgpu). I'd say using Cocoa like that might be closer to what React Native does (an expert to confirm this, please?). Btw, this kind of platform-specific binding is a strength of AI coding (and a real pain to do by hand).
- Nice modularity between the platform and browser app parts, achieved with the App and Painter traits.
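A minimal sketch of what that App/Painter split might look like (trait and method names are assumed from the description above, not copied from the repo): the platform layer owns the event loop and supplies a backend-specific Painter, while the browser core only emits abstract paint commands.

```rust
// Hypothetical sketch of the two-trait split described above.
trait Painter {
    fn fill_rect(&mut self, x: f32, y: f32, w: f32, h: f32, rgba: [u8; 4]);
    fn draw_text(&mut self, x: f32, y: f32, text: &str);
}

trait App {
    fn tick(&mut self) -> bool; // handle internal events; true => redraw needed
    fn render(&mut self, painter: &mut dyn Painter); // layout -> display list -> paint
}

// A platform backend would issue Cocoa/X11/GDI calls; this stub just logs.
struct LogPainter { ops: Vec<String> }
impl Painter for LogPainter {
    fn fill_rect(&mut self, x: f32, y: f32, w: f32, h: f32, _rgba: [u8; 4]) {
        self.ops.push(format!("rect {x} {y} {w} {h}"));
    }
    fn draw_text(&mut self, x: f32, y: f32, text: &str) {
        self.ops.push(format!("text {x} {y} {text:?}"));
    }
}

struct Browser;
impl App for Browser {
    fn tick(&mut self) -> bool { true }
    fn render(&mut self, p: &mut dyn Painter) {
        p.fill_rect(0.0, 0.0, 800.0, 600.0, [255, 255, 255, 255]);
        p.draw_text(8.0, 20.0, "Hello");
    }
}

fn main() {
    let mut app = Browser;
    let mut painter = LogPainter { ops: vec![] };
    if app.tick() {
        app.render(&mut painter);
    }
    assert_eq!(painter.ops.len(), 2);
    println!("{}", painter.ops.join("\n"));
}
```

The nice property of this shape is that the browser core never touches a platform API, so adding a new platform is "just" a new Painter plus an event loop.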
How to improve it further? I'd say try to map how the architecture corresponds to Web standards, such as https://html.spec.whatwg.org/multipage/webappapis.html#event...
Wouldn't have to be precise and comprehensive, but for example parts of App::tick could be documented as an initial attempt to implement a part of the web event-loop and `render` as an attempt at implementing the update-the-rendering task.
You could also split the web engine part from the app embedding it in a similar way to the current split between platform and app.
Far superior to, and more cost-effective than, the attempt at scaling autonomous agent coding pursued by Fastrender. It shows how the important part isn't how many agents you can run in parallel, but rather how good an idea the human overseeing the project has (or rather: develops).
>without using any 3rd party libraries
It seems to be easier for coding agents to implement things from scratch than to use libraries.
this is a feature
It's great to see him make this. I didn't know that he had a blog, but it looks good to me. Bookmarked now.
I feel like although Cursor burned $5 million, we saw the result, and now we have the Embedding Shapes takeaway:
If one person with one agent can produce equal or better results than "hundreds of agents for weeks", then the question "Can we scale autonomous coding by throwing more agents at a problem?" probably has a more pessimistic answer than some expected.
Effectively, to me this feels like it answers the question of what happens if we have thousands of AI agents building a complex project autonomously with no human. That idea seems dead now. Humans in the loop will mean much higher productivity and a better end result.
I feel like the lure behind the Cursor project was to find out whether it's able to replace humans completely in an extremely large project, and right now the answer's no (and I have a feeling [bias?] that the answer's gonna stay that way).
Emsh, I have a question though: can you tell me about your background, if possible? Have you been involved in browser development or any related endeavours, or was this a first for you? From what I can tell from having talked with you, I do feel like the answer's yes, that you have worked in the browser space, but I'm still curious to know for sure.
A question coming to my mind: how much of a difference would there be between 1 expert human + 1 agent, 1 non-expert (say, a junior dev) + 1 agent, and 1 complete non-expert (say, a normal, less techie person) + 1 agent?
What are you guys' predictions on it?
How would the economics of becoming an "expert" in a field, versus becoming a jack of all trades (junior dev), fare with this new technology/toy that we've got?
How much productivity gain could there be going from non-expert -> junior dev, and likewise from junior -> senior dev, in this particular context?
[0] Cursor Is Lying To Developers… : https://www.youtube.com/watch?v=U7s_CaI93Mo
>one agent
>one browser
>one million nvidia gpu