One thing I'm wondering: is anyone doing this at scale? The issue I see is that with complex workflows, which take several dozen steps and have complex control flow, the probability of reaching the end falls off pretty hard: if each step has a 0.95 chance of completing successfully, after not very many steps the overall probability of success is small. These use cases are high value because writing a traditional scraper is a huge pain, but we just don't seem to be there yet.
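To put a rough number on it (the 0.95 per-step figure is just an illustrative assumption, not something measured):

```typescript
// Overall success probability of a workflow with n independent steps,
// each succeeding with probability p (illustrative numbers only).
const overallSuccess = (p: number, steps: number): number => Math.pow(p, steps);

for (const steps of [5, 10, 20, 30, 50]) {
  console.log(`${steps} steps: ${(overallSuccess(0.95, steps) * 100).toFixed(1)}% chance of finishing`);
}
// 5 -> 77.4%, 10 -> 59.9%, 20 -> 35.8%, 30 -> 21.5%, 50 -> 7.7%
```

At 30 steps you're already down to roughly a one-in-five chance of the run finishing cleanly.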
The other side of the coin is simple workflows, but those tend to be the ones where writing a scraper is pretty trivial. It did work when I told it to search for a product at a local store, but the run cost $1.05, so doing it at any scale quickly becomes a little bit silly.
So I guess my question is: who is having luck using these tools, and what are you using them for?
One route I had some success with is writing a DSL for scraping, having the LLM generate that code, and then interpreting it and having the LLM edit it when it gets stuck - roughly like the sketch below. But then there's the "stuck detection" part, which is hard, etc.
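Here's a minimal sketch of the idea, assuming Playwright as the executor; the step shapes and names are made up for illustration and aren't my actual DSL:

```typescript
import { chromium } from 'playwright';

// Hypothetical DSL: the LLM emits a flat list of steps like this instead of raw code.
type Step =
  | { op: 'goto'; url: string }
  | { op: 'click'; selector: string }
  | { op: 'fill'; selector: string; value: string }
  | { op: 'extract'; selector: string; as: string };

async function runScript(steps: Step[]) {
  const browser = await chromium.launch();
  const page = await browser.newPage();
  const results: Record<string, string | null> = {};

  for (let i = 0; i < steps.length; i++) {
    const step = steps[i];
    try {
      if (step.op === 'goto') await page.goto(step.url);
      else if (step.op === 'click') await page.click(step.selector, { timeout: 5000 });
      else if (step.op === 'fill') await page.fill(step.selector, step.value, { timeout: 5000 });
      else results[step.as] = await page.textContent(step.selector, { timeout: 5000 });
    } catch (err) {
      // Crude "stuck detection": a timeout here is the signal to hand the failing
      // step and the current page state back to the LLM and ask for a patched script.
      await browser.close();
      throw new Error(`stuck at step ${i} (${step.op}): ${err}`);
    }
  }

  await browser.close();
  return results;
}
```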
We recently gave the model access to a file system so that it never forgets what it's supposed to do - we already have a ton of users very happy with the recent reliability updates!
We also have a beta project, workflow-use, which is basically the "cache a workflow" idea mentioned in the comments here: https://github.com/browser-use/workflow-use
Let us know what you guys think - we are shipping hard and fast!
I've been working on a Chrome extension with a side panel - think of it like the side panel copilot in VSCode, Cursor, or Windsurf. Currently it automates workflows, but those are hard-coded, and I've started working on more generalized automation using langchain. Looking at your code is helpful because, in only a few hundred lines of code, I can recreate a huge portion of Playwright's capabilities in a Chrome extension side panel, so I should be able to port it to the extension. That is, I'm creating tools like mouse click, type, mouse move, open tab, navigate, wait for element, etc. (one such tool is sketched below).
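A minimal sketch of one of those tools in an MV3 extension (assumes the "scripting" and "tabs" permissions; it does a selector-based click rather than dispatching real mouse events):

```typescript
// Click tool: runs a small function inside the page of the given tab.
async function clickTool(tabId: number, selector: string): Promise<boolean> {
  const [injection] = await chrome.scripting.executeScript({
    target: { tabId },
    // Executed in the page context: find the element and click it.
    func: (sel: string) => {
      const el = document.querySelector<HTMLElement>(sel);
      if (!el) return false;
      el.click();
      return true;
    },
    args: [selector],
  });
  return injection.result === true;
}

// Example: click a submit button on the active tab.
async function demo() {
  const [tab] = await chrome.tabs.query({ active: true, currentWindow: true });
  if (tab?.id) {
    const ok = await clickTool(tab.id, 'button[type="submit"]');
    console.log(ok ? 'clicked' : 'element not found');
  }
}
```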
Looking at your code, I'm thinking about pulling out anything that isn't coupled to Node while mapping all the Playwright capabilities to their equivalents in a Chrome extension. It's busy work.
If I do that, why would I prefer using .baml over the equivalent langchain? What's the difference? Am I comparing apples to oranges? I'm not worried about using langgraph, because I should be able to get most of the functionality with xstate v5 [0] plus serialized, portable JSON state graphs, so I can store custom graphs on a remote server that can be queried by API (rough sketch below).
That is my question. I don't see langchain in the dependencies, which is cool, but why .baml? Also, what am I missing going down this thought path?
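For the xstate piece, here's roughly what I have in mind - the machine config is plain JSON (no inline functions), so it can live on a server and be fetched by API; the state and event names are made up for illustration:

```typescript
import { createMachine, createActor } from 'xstate';

// Plain-JSON machine config (no inline actions), so JSON.stringify(workflowConfig)
// is what would be stored remotely and rehydrated on the client.
const workflowConfig = {
  id: 'scrapeProduct',
  initial: 'navigate',
  states: {
    navigate: { on: { PAGE_LOADED: 'search' } },
    search: { on: { RESULTS_FOUND: 'extract', NOTHING_FOUND: 'failed' } },
    extract: { on: { DONE: 'succeeded' } },
    succeeded: { type: 'final' as const },
    failed: { type: 'final' as const },
  },
};

const machine = createMachine(workflowConfig);
const actor = createActor(machine).start();

actor.send({ type: 'PAGE_LOADED' });
actor.send({ type: 'RESULTS_FOUND' });
console.log(actor.getSnapshot().value); // "extract"
```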
[0] https://chatgpt.com/share/685dfc60-106c-8004-bbd0-1ba3a33aba...