Here's my own screenshot of it rendering my blog - https://bsky.app/profile/simonwillison.net/post/3mdg2oo6bms2... - it handles the layout and CSS gradients really well, renders the SVG feed icon but fails to render a PNG image.
I thought "build a browser that renders HTML+CSS" was the perfect task for demonstrating a massively parallel agent setup because it couldn't be productively achieved in a few thousand lines of code by a single coding agent. Turns out I was wrong!
After three days, I have it working at around 20K LOC, of which ~14K is the browser engine itself plus X11 support, and the remaining ~6K is just Windows and macOS support.
Source code + CI built binaries are available here if you wanna try it out: https://github.com/embedding-shapes/one-agent-one-browser
I searched for "security" and "vuln" in both the article and this discussion thread, and found no matches.
I guess the code being in Rust helps, but to what extent can one just rely on the guarantees provided by the language?
(I know practically nothing about Rust.)
But we're very far from a browser here, so that's not that impressive. Writing a basic renderer is really not that hard, and matches the effort and low LoC from that experiment. This is similar to countless graphical toolkits that have been written since the 70s.
I know Servo has a "no AI contribution" policy, but I still would be more impressed by a Servo fork that gets missing APIs implemented by an AI, with WPT tests passing etc. It's a lot less marketable, I guess. Go add something like WebTransport, for instance; it's a recent API, so the spec should be properly written and there's a good test suite.
I know it's a little apples-and-oranges (you and the agent wouldn't produce the exact same thing), but I'm not asking because I'm interested in the man-hour savings. Rather, I want to get a perspective on what kind of expertise went into the guidance (without having to read all the guidance and be familiar with browser implementation myself). "How long this would have taken the author" seems like one possible proxy for "how much pre-existing experience went into this agent's guidance".
I wonder if you've looked into what it would take to implement accessibility while maintaining your no-Rust-dependencies rule. On Windows and macOS, it's straightforward enough to implement UI Automation and the Cocoa NSAccessibility protocols respectively. On Unix/X11, as I see it, your options are:
1. Implement AT-SPI with a new from-scratch D-Bus implementation.
2. Implement AT-SPI with one of the D-Bus C libraries (GLib's GDBus, libdbus, or sd-bus).
3. Use GTK, or maybe Qt.
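To give a sense of what option 1 entails: every D-Bus message starts with a fixed 12-byte header before the header-fields array and body. A minimal sketch of serializing that fixed header (a from-scratch implementation would still need the header-fields array, alignment rules, and socket handshake on top of this; layout per the D-Bus specification):

```rust
// Serialize the fixed 12-byte D-Bus message header.
// msg_type: 1 = METHOD_CALL, 2 = METHOD_RETURN, 3 = ERROR, 4 = SIGNAL.
fn dbus_fixed_header(msg_type: u8, body_len: u32, serial: u32) -> [u8; 12] {
    let mut h = [0u8; 12];
    h[0] = b'l'; // endianness flag: 'l' = little-endian, 'B' = big-endian
    h[1] = msg_type;
    h[2] = 0; // flags (e.g. NO_REPLY_EXPECTED); none set here
    h[3] = 1; // major protocol version
    h[4..8].copy_from_slice(&body_len.to_le_bytes()); // length of the body in bytes
    h[8..12].copy_from_slice(&serial.to_le_bytes()); // sender-assigned serial, must be nonzero
    h
}

fn main() {
    let h = dbus_fixed_header(1, 0, 1); // a METHOD_CALL with an empty body
    assert_eq!(h[0], b'l');
    assert_eq!(&h[8..12], &1u32.to_le_bytes());
    println!("{:?}", h);
}
```

It's doable, but the authentication handshake, marshalling of nested container types, and the AT-SPI object tree on top of it are where the real effort would go.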
This is how we should be thinking about AI safety!
My comment on the cursor post for context: https://news.ycombinator.com/item?id=46625491
That is Ladybird Browser if that was not already obvious.
- Clear code structure and good architecture (a modular approach reminiscent of Blitz but not as radical; a Blitz-lite).
- Very easy to follow the code and understand how the main render loop works:
- For Mac: main loop is at https://github.com/embedding-shapes/one-agent-one-browser/blob/master/src/platform/macos/windowed.rs#L74
- You can see clearly how UI events are passed to the App to handle.
- App::tick allows the app to handle internal events (Servoshell does something similar with `spin_event_loop` at https://github.com/servo/servo/blob/611f3ef1625f4972337c247521f3a1d65040bd56/components/servo/servo.rs#L176)
- If a redraw is needed, the main render logic is at https://github.com/embedding-shapes/one-agent-one-browser/blob/master/src/platform/macos/windowed.rs#L221 and calls into `render` of App, which computes a display list (layout) and then translates it into commands for the generic painter, which internally turns those into platform-specific graphics operations.
- It's interesting how the painter for Mac uses Cocoa for graphics; very different from Servo, which uses WebRender, or Blitz, which (in some paths) uses Vello (itself built on wgpu). I'd say using Cocoa like that might be closer to what React Native does (an expert to confirm this, please?). Btw, this kind of platform-specific binding is a strength of AI coding (and a real pain to do by hand).
- Nice modularity between the platform and browser app parts, achieved with the App and Painter traits.
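A minimal sketch of what that App/Painter split might look like (trait and method names are assumed from the description above, not copied from the repo): the platform layer owns the event loop and supplies a backend-specific Painter, while the browser core only emits abstract paint commands.

```rust
// Hypothetical sketch of the two-trait split described above.
trait Painter {
    fn fill_rect(&mut self, x: f32, y: f32, w: f32, h: f32, rgba: [u8; 4]);
    fn draw_text(&mut self, x: f32, y: f32, text: &str);
}

trait App {
    fn tick(&mut self) -> bool; // handle internal events; true => redraw needed
    fn render(&mut self, painter: &mut dyn Painter); // layout -> display list -> paint
}

// A platform backend would issue Cocoa/X11/GDI calls; this stub just logs.
struct LogPainter { ops: Vec<String> }
impl Painter for LogPainter {
    fn fill_rect(&mut self, x: f32, y: f32, w: f32, h: f32, _rgba: [u8; 4]) {
        self.ops.push(format!("rect {x} {y} {w} {h}"));
    }
    fn draw_text(&mut self, x: f32, y: f32, text: &str) {
        self.ops.push(format!("text {x} {y} {text:?}"));
    }
}

struct Browser;
impl App for Browser {
    fn tick(&mut self) -> bool { true }
    fn render(&mut self, p: &mut dyn Painter) {
        p.fill_rect(0.0, 0.0, 800.0, 600.0, [255, 255, 255, 255]);
        p.draw_text(8.0, 20.0, "Hello");
    }
}

fn main() {
    let mut app = Browser;
    let mut painter = LogPainter { ops: vec![] };
    if app.tick() {
        app.render(&mut painter);
    }
    assert_eq!(painter.ops.len(), 2);
    println!("{}", painter.ops.join("\n"));
}
```

The nice property of this shape is that the browser core never touches a platform API, so adding a new platform is "just" a new Painter plus an event loop.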
How to improve it further? I'd say try to map how the architecture corresponds to Web standards, such as https://html.spec.whatwg.org/multipage/webappapis.html#event...
Wouldn't have to be precise and comprehensive, but for example parts of App::tick could be documented as an initial attempt to implement a part of the web event-loop and `render` as an attempt at implementing the update-the-rendering task.
You could also split the web engine part from the app embedding it in a similar way to the current split between platform and app.
Far superior to, and more cost-effective than, the attempt at scaling autonomous agent coding pursued by Fastrender. It shows how the important part isn't how many agents you can run in parallel, but rather how good an idea the human overseeing the project has (or rather: develops).
>without using any 3rd party libraries
It seems to be easier for coding agents to implement things from scratch than to use libraries.
this is a feature
It's great to see him make this. I didn't know that he had a blog, but it looks good to me. Bookmarked now.
I feel like although Cursor burned $5 million, we saw the result, and now we have the Embedding Shapes takeaway:
If one person with one agent can produce equal or better results than "hundreds of agents for weeks", then the question "Can we scale autonomous coding by throwing more agents at a problem?" probably has a more pessimistic answer than some expected.
Effectively, to me this feels like it answers the question of what happens if we have thousands of AI agents building a complex project autonomously with no human. That idea seems dead now. Humans in the loop will mean much higher productivity and a better end result.
I feel like the lure behind the Cursor project was to find out whether it's able to replace humans completely in an extremely large project, and right now the answer's no (and I have a feeling [bias?] that the answer's gonna stay that way).
Emsh, I have a question though: can you tell me about your background, if possible? Have you been involved in browser development or any related endeavours, or was this a first for you? From what I can tell from having talked with you, I do feel like the answer's yes, that you have worked in the browser space, but I'm still curious to know for sure.
A question coming to my mind: how much of a difference would there be between 1 expert human + 1 agent, 1 non-expert (say, a junior dev) + 1 agent, and 1 complete non-expert (say, a normal, less techie person) + 1 agent?
What are you guys' predictions on it?
How would the economics of becoming an "expert" in a field, versus becoming a jack of all trades (junior dev), fare with this new technology/toy that we've got?
How much productivity gain could there be going from non-expert -> junior dev, and likewise from junior -> senior dev, in this particular context?
[0] Cursor Is Lying To Developers… : https://www.youtube.com/watch?v=U7s_CaI93Mo
>one agent
>one browser
>one million nvidia gpu