One thing I'm wondering: is anyone doing this at scale? The issue I see is that with complex workflows, which take several dozen steps and have complex control flow, the probability of reaching the end falls off pretty hard: if each step has a 0.95 chance of completing successfully, after not very many steps the overall probability of success is small. These use cases are high value because writing a traditional scraper is a huge pain, but we just don't seem to be there yet.
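To put a rough number on it (the 0.95 per-step figure is just an illustrative assumption, not something measured):

```typescript
// Overall success probability of a workflow with n independent steps,
// each succeeding with probability p (illustrative numbers only).
const overallSuccess = (p: number, steps: number): number => Math.pow(p, steps);

for (const steps of [5, 10, 20, 30, 50]) {
  console.log(`${steps} steps: ${(overallSuccess(0.95, steps) * 100).toFixed(1)}% chance of finishing`);
}
// 5 -> 77.4%, 10 -> 59.9%, 20 -> 35.8%, 30 -> 21.5%, 50 -> 7.7%
```

At 30 steps you're already down to roughly a one-in-five chance of the run finishing cleanly.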
The other side of the coin is simple workflows, but those tend to be the ones where writing a scraper is pretty trivial. It did work when I told it to search for a product at a local store, but the run cost $1.05, so doing it at any scale quickly becomes a little bit silly.
So I guess my question is: who is having luck using these tools, and what are you using them for?
One route I had some success with is writing a DSL for scraping, having the LLM generate that code, and then interpreting it and having the LLM edit it when it gets stuck - roughly like the sketch below. But then there's the "stuck detection" part, which is hard, etc.
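Here's a minimal sketch of the idea, assuming Playwright as the executor; the step shapes and names are made up for illustration and aren't my actual DSL:

```typescript
import { chromium } from 'playwright';

// Hypothetical DSL: the LLM emits a flat list of steps like this instead of raw code.
type Step =
  | { op: 'goto'; url: string }
  | { op: 'click'; selector: string }
  | { op: 'fill'; selector: string; value: string }
  | { op: 'extract'; selector: string; as: string };

async function runScript(steps: Step[]) {
  const browser = await chromium.launch();
  const page = await browser.newPage();
  const results: Record<string, string | null> = {};

  for (let i = 0; i < steps.length; i++) {
    const step = steps[i];
    try {
      if (step.op === 'goto') await page.goto(step.url);
      else if (step.op === 'click') await page.click(step.selector, { timeout: 5000 });
      else if (step.op === 'fill') await page.fill(step.selector, step.value, { timeout: 5000 });
      else results[step.as] = await page.textContent(step.selector, { timeout: 5000 });
    } catch (err) {
      // Crude "stuck detection": a timeout here is the signal to hand the failing
      // step and the current page state back to the LLM and ask for a patched script.
      await browser.close();
      throw new Error(`stuck at step ${i} (${step.op}): ${err}`);
    }
  }

  await browser.close();
  return results;
}
```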
We recently gave the model access to a file system so that it never forgets what it's supposed to do - we already have a ton of users very happy with the recent reliability updates!
We also have a beta project, workflow-use, which is basically the "cache a workflow" idea mentioned in the comments here: https://github.com/browser-use/workflow-use
Let us know what you guys think - we are shipping hard and fast!
I've been working on a Chrome extension with a side panel - think of it like the side panel copilot in VSCode, Cursor, or Windsurf. Currently it automates workflows, but those are hard-coded, and I've started working on more generalized automation using langchain. Looking at your code is helpful because, in only a few hundred lines of code, I can recreate a huge portion of Playwright's capabilities in a Chrome extension side panel, so I should be able to port it to the extension. That is, I'm creating tools like mouse click, type, mouse move, open tab, navigate, wait for element, etc. (one such tool is sketched below).
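A minimal sketch of one of those tools in an MV3 extension (assumes the "scripting" and "tabs" permissions; it does a selector-based click rather than dispatching real mouse events):

```typescript
// Click tool: runs a small function inside the page of the given tab.
async function clickTool(tabId: number, selector: string): Promise<boolean> {
  const [injection] = await chrome.scripting.executeScript({
    target: { tabId },
    // Executed in the page context: find the element and click it.
    func: (sel: string) => {
      const el = document.querySelector<HTMLElement>(sel);
      if (!el) return false;
      el.click();
      return true;
    },
    args: [selector],
  });
  return injection.result === true;
}

// Example: click a submit button on the active tab.
async function demo() {
  const [tab] = await chrome.tabs.query({ active: true, currentWindow: true });
  if (tab?.id) {
    const ok = await clickTool(tab.id, 'button[type="submit"]');
    console.log(ok ? 'clicked' : 'element not found');
  }
}
```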
Looking at your code, I'm thinking about pulling out anything that isn't coupled to Node while mapping all the Playwright capabilities to their equivalents in a Chrome extension. It's busy work.
If I do that, why would I prefer using .baml over the equivalent langchain? What's the difference? Am I comparing apples to oranges? I'm not worried about using langgraph, because I should be able to get most of the functionality with xstate v5 [0] plus serialized, portable JSON state graphs, so I can store custom graphs on a remote server that can be queried by API (rough sketch below).
That is my question. I don't see langchain in the dependencies, which is cool, but why .baml? Also, what am I missing going down this thought path?
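For the xstate piece, here's roughly what I have in mind - the machine config is plain JSON (no inline functions), so it can live on a server and be fetched by API; the state and event names are made up for illustration:

```typescript
import { createMachine, createActor } from 'xstate';

// Plain-JSON machine config (no inline actions), so JSON.stringify(workflowConfig)
// is what would be stored remotely and rehydrated on the client.
const workflowConfig = {
  id: 'scrapeProduct',
  initial: 'navigate',
  states: {
    navigate: { on: { PAGE_LOADED: 'search' } },
    search: { on: { RESULTS_FOUND: 'extract', NOTHING_FOUND: 'failed' } },
    extract: { on: { DONE: 'succeeded' } },
    succeeded: { type: 'final' as const },
    failed: { type: 'final' as const },
  },
};

const machine = createMachine(workflowConfig);
const actor = createActor(machine).start();

actor.send({ type: 'PAGE_LOADED' });
actor.send({ type: 'RESULTS_FOUND' });
console.log(actor.getSnapshot().value); // "extract"
```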
[0] https://chatgpt.com/share/685dfc60-106c-8004-bbd0-1ba3a33aba...