- In this case the "agent" definition they are using is the one from the https://github.com/openai/openai-agents-python Python library, which they are running in the browser via Pyodide and WASM.
That library defines an agent as a system prompt and optional tools - notable because many other common agent definitions have the tools as required, not optional.
That explains why their "hello world" demo just runs a single prompt: https://github.com/mozilla-ai/wasm-agents-blueprint/blob/mai...
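The "system prompt plus optional tools" definition can be boiled down to a few lines. This is a minimal stdlib sketch of that definition for illustration only, not the actual openai-agents-python API:

```python
# Minimal sketch of the definition under discussion: an "agent" is just
# instructions (a system prompt) plus an *optional* list of tools.
# Illustration only -- not the openai-agents-python API.
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Agent:
    name: str
    instructions: str  # the system prompt
    tools: List[Callable] = field(default_factory=list)  # optional, defaults to none

# With no tools, "running" the agent degenerates to a single
# prompt/response round trip -- exactly the hello-world demo linked above.
hello = Agent(name="Assistant", instructions="You are a helpful assistant.")
```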
- We looked at Pyodide and WASM, along with other options like Firecracker, for our need: multi-step tasks that require running LLM-generated code locally (via Ollama etc.) with some form of isolation, rather than executing it directly on our dev machines. We figured it would be too much work given the various external libraries we would have to install. The idea was to have a powerful remote LLM generate code for general-purpose tasks, like video editing via ffmpeg or graph generation via JS + Chromium, and execute it locally with all dependencies installed before execution.
We built CodeRunner (https://github.com/BandarLabs/coderunner) on top of Apple Containers recently and have been using it for some time. This works fine but still needs some improvement to handle very arbitrary prompts.
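The core execute-generated-code step can be sketched with the stdlib. Note that CodeRunner itself uses Apple Containers for real isolation; a plain subprocess with a timeout, as shown here, is NOT a security boundary, just the bare shape of the idea:

```python
# Bare-bones sketch of "run LLM-generated code locally". A subprocess
# with a timeout is NOT real isolation (CodeRunner uses Apple Containers
# for that); this only shows the execution step's basic shape.
import subprocess
import sys

def run_generated(code: str, timeout: float = 10.0) -> str:
    """Run LLM-generated Python in a separate interpreter process."""
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True, text=True, timeout=timeout,
    )
    if result.returncode != 0:
        raise RuntimeError(result.stderr)
    return result.stdout

print(run_generated("print(2 + 2)"))  # returns "4\n"
```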
- This is trying to use the word "agent" to sound cool, but it doesn't make a case for why this is particularly about agents rather than just basic AI usage.
> The agent code is nothing more than a Python script that relies on the openai-agents-python library to run an AI agent backed by an LLM served via an OpenAI-compatible API.
The openai-agents-python library is useful for writing agents, but it's possible to use it to write code that isn't very agentic. None of the examples here are very agentic.
by meander_water
1 subcomment
- When I saw the title, I thought this was about running models in the browser. IMO that's way more interesting, and you can do it with transformers.js and the ONNX runtime. You don't even need a GPU.
https://huggingface.co/spaces/webml-community/llama-3.2-webg...
- It seems the only code that runs in the browser here is the code that talks to LLMs on servers.
Why would you need WASM for this?
- I have a demo that runs llama3-{1,3,8}B in the browser on CPU.
It could be integrated with this in the future to be fully local:
https://galqiwi.github.io/aqlm-rs
- I recently wrote some JavaScript to automate clicking coupons. The website checks for non-human clicks using event.isTrusted. Firefox let me bypass this by rewriting the JS to replace s/isTrusted/true, while Chrome's Manifest V3 doesn't allow it. Anyway, Firefox might be the future of agents, due to its extensibility.
- Mildly interesting article. I mean, you can already run a ton of libraries that talk to an inference backend. The only difference here is that the client-side code is in Python, which by itself doesn't make creating agents any simpler; I would argue that it complicates things a ton.
Also, connecting a model to a bunch of tools and dropping it into some kind of workflow is maybe 5% of the actual work. The rest is spent on observability, background tasks, queueing systems, multi-channel support for agents, user experience, etc., etc., etc.
Nobody talks about that part, because most of the content out there is just chasing trends - without much real-world experience running these systems or putting them in front of actual customers with real needs.
by sandGorgon
0 subcomments
- I build an open-source mobile browser. We create AI agents (that run in the background) in the mobile browser, and built an extension framework on top so you can create these agents by publishing an extension.
We hook into the Android WorkManager framework and do some quirky things with tab handling to make this work. It's harder to do this on mobile than on desktop.
A bunch of people are trying to do interesting things, like an automatic Labubu purchase agent (on Popmart) :D
Lots of purchase-related agents.
Pull requests welcome! https://github.com/wootzapp/wootz-browser/pull/334
- The frustrating thing about this is the limitation of using a browser. Agents should be long-running processes that exist external to a browser. The idea of using wasm is clever, but it feels like the entire browser environment needs to evolve because we're no longer dealing with just web pages. I think we are looking at a true evolution of the web now if this is the way it's going to go
- Can you bypass the CORS issue with a browser extension? I seem to recall CORS doesn't apply to extensions, or at least to the parts that aren't injected into webpages.
by dncornholio
0 subcomments
- Putting Python code in a string in an HTML file is a big no-no for me. We should be past this. It looks like going 20 years back in time.
- No mention of WebGPU...
- Having to disable CORS restrictions is a bit meh. I understand why, but still.
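One alternative to disabling CORS globally (a sketch, not what the article does): run a tiny local proxy that answers the browser's preflight request itself and forwards everything else to the OpenAI-compatible endpoint. The upstream address below is an assumption (a local Ollama server):

```python
# Hypothetical sketch: a minimal local CORS proxy so a browser page can
# call an OpenAI-compatible endpoint without disabling browser security.
# UPSTREAM is an assumption (a local Ollama server).
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

UPSTREAM = "http://localhost:11434"  # assumed local OpenAI-compatible server

class CorsProxy(BaseHTTPRequestHandler):
    def _cors(self):
        self.send_header("Access-Control-Allow-Origin", "*")
        self.send_header("Access-Control-Allow-Headers", "Authorization, Content-Type")
        self.send_header("Access-Control-Allow-Methods", "GET, POST, OPTIONS")

    def do_OPTIONS(self):
        # Answer the browser's CORS preflight locally; never hits upstream.
        self.send_response(204)
        self._cors()
        self.end_headers()

    def do_POST(self):
        # Forward the request body upstream, then relay the response
        # back with CORS headers attached.
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        req = urllib.request.Request(
            UPSTREAM + self.path, data=body,
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            data = resp.read()
        self.send_response(200)
        self._cors()
        self.end_headers()
        self.wfile.write(data)

def serve(port: int = 0) -> HTTPServer:
    """Start the proxy on 127.0.0.1; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), CorsProxy)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

The browser page then points at `http://127.0.0.1:<port>` instead of the upstream server, and no browser flags need to change.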
by ultrathinkeng
0 subcomments
- hmm
- I'd like to offer a less skeptical view on this, contrary to what I've read here so far. LLMs that act (a.k.a. agents) bring a whole lot of new security and privacy issues. If we were already heading toward a privacy dystopia (with trackers, centralized services, etc.), agents could take that to a whole new level.
That's why I can only cheer when I see a different path where agents are run locally (by the way, Hugging Face has already published a couple of spaces demonstrating that). As a plus, because they're small, their environmental footprint will also be smaller (although, admittedly, I can also see the Jevons Paradox possibly happening here too).
- I guess we're at the stage where every permutation of "AI agents" and X (where X is some technology or space) must be tried and posted on HN.