PA bench: Evaluating web agents on real world personal assistant workflows
35 points by shahules
by mrorigo
1 subcomments
I just don't get why you would want an agent to use the browser to do these mundane things (check email, work with a calendar, etc.) when you can simply give it a few tools and save maybe six gazillion tokens per task?
by abhijithneil
1 subcomments
Is there a way to automate computer use with multiple computer-use agents from different providers, plus some sort of routing setup so the best course of action can be chosen without hitting failures (e.g., permission issues in OpenAI could be rerouted to Gemini)?
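A minimal sketch of what such a routing setup could look like, assuming each provider's agent is wrapped in a plain function. The names here (`Provider`, `ProviderError`, `route`, and the two stand-in agents) are hypothetical and not any vendor's real API; the stand-ins just simulate a permission failure and a success.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


class ProviderError(Exception):
    """Hypothetical error a wrapped agent raises when it cannot act."""


@dataclass
class Provider:
    name: str
    run: Callable[[str], str]  # takes a task description, returns a result


def route(task: str, providers: Sequence[Provider]) -> tuple[str, str]:
    """Try each provider in order and fall through to the next on failure."""
    errors = []
    for provider in providers:
        try:
            return provider.name, provider.run(task)
        except ProviderError as exc:
            # Record the failure and move on to the next provider.
            errors.append(f"{provider.name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


# Stand-in agents for illustration only (assumptions, not real clients).
def flaky_agent(task: str) -> str:
    raise ProviderError("permission denied")


def backup_agent(task: str) -> str:
    return f"done: {task}"


if __name__ == "__main__":
    chain = [Provider("primary", flaky_agent), Provider("backup", backup_agent)]
    print(route("check calendar", chain))
```

This only covers ordered fallback. Choosing the "best" provider up front would need some scoring signal per task, which is left out here.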
by AIorNot
0 subcomment
Well, if these guys' computer action model works as they intended (a ground-up, video-trained model)