This isn’t about making scripts smarter or replacing Playwright/Selenium. The problem I’m exploring is reliability: how to make agent-driven browser execution fail deterministically and explainably instead of half-working when layouts change.
Concretely, the agent doesn’t just “click and hope”. Each step is gated by explicit post-conditions, similar to how tests assert outcomes:
## Python Code Example

----
ready = runtime.assert_(
    all_of(
        url_contains("checkout"),  # post-condition: the URL reflects the checkout page
        exists("role=button"),     # post-condition: an actionable button is present
    ),
    "checkout_ready",
    required=True,                 # required gate: failure stops the run
)
----
If the condition isn’t met, the run stops with artifacts instead of drifting forward. Vision models are optional fallbacks, not the primary control signal.
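For context, the failure path looks roughly like this. A minimal sketch, assuming a `runtime` object with `check`, `screenshot`, and `dom_snapshot` helpers (illustrative names, not the actual API):

----
# Hedged sketch with illustrative names: a failed required gate captures
# artifacts and halts the run instead of letting later steps execute on a
# bad state. check/screenshot/dom_snapshot are assumptions, not the real API.
import json
import pathlib
import time

def gate(runtime, condition, name: str, artifact_dir: str = "artifacts") -> None:
    if runtime.check(condition):  # assumed boolean evaluation of the post-condition
        return
    out = pathlib.Path(artifact_dir) / f"{name}-{int(time.time())}"
    out.mkdir(parents=True, exist_ok=True)
    (out / "screenshot.png").write_bytes(runtime.screenshot())  # assumed helper
    (out / "dom.html").write_text(runtime.dom_snapshot())       # assumed helper
    (out / "step.json").write_text(json.dumps({"step": name, "condition": repr(condition)}))
    raise RuntimeError(f"Post-condition '{name}' failed; artifacts written to {out}")
----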
Happy to answer questions about the design tradeoffs or where this approach falls short.
We've been building agent-based automation and the reliability problem is brutal. An agent can be 95% accurate on each step, but chain ten steps together and you're at 60% success rate. That's not usable.
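The arithmetic behind that compounding, as a quick sanity check:

----
# Per-step accuracy compounds multiplicatively across a chain of steps.
per_step_accuracy = 0.95
steps = 10
print(per_step_accuracy ** steps)  # ~0.599, i.e. roughly 60% end-to-end success
----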
Curious about the failure modes though. What happens when the verification itself is wrong? Like, the cart shows as updated on screen, but the verification layer checks a stale element?
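One way to reduce that particular risk (a sketch under assumptions; `get_cart_count` is a hypothetical accessor, not part of the project): verify against a fresh read of page state taken right before asserting, with a short settle window, rather than a value captured earlier in the step.

----
# Hedged sketch: re-read live state on every attempt instead of trusting a
# cached handle that may predate the UI update.
import time

def verify_cart_updated(get_cart_count, expected: int, timeout_s: float = 5.0) -> bool:
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if get_cart_count() == expected:  # fresh read each attempt
            return True
        time.sleep(0.25)                  # give the page time to settle
    return False
----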
* When you "run a WASM pass", how is that generated? Do you use an agent to do the pruning step, or is it deterministic?
* Where do the "deterministic overrides" come from? I assume they are generated by the verifier agent?
What I find most compelling about this approach is the explicit verification layer. Too many browser automation projects fail silently or drift into unexpected states. The Jest-style assertions create a clear contract: either the step definitively succeeded or it didn't, with artifacts for debugging.
This reminds me of property-based testing - instead of hoping the agent "gets it right," you're encoding what success actually looks like.
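To make that analogy concrete, a hedged sketch (illustrative types, not the project's code): the post-condition is a plain predicate over observable page state, stated independently of how the agent got there, much like a property in property-based testing.

----
# Illustrative only: success is a predicate over observed state, not a script step.
from dataclasses import dataclass

@dataclass
class PageState:
    url: str
    cart_count: int

def checkout_ready(state: PageState) -> bool:
    # The property must hold after the step regardless of which path the agent took.
    return "checkout" in state.url and state.cart_count > 0
----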
I think using a logical diff for pass/fail checking is clever, though I wonder about failure modes that could confuse it, such as verifying highly dynamic webpages that change their content even without active user interaction.
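One mitigation for the dynamic-content case (hypothetical illustration, not the project's implementation): filter out fields that are expected to change on every render before computing the diff.

----
# Hypothetical: a "logical diff" that ignores volatile fields so content that
# changes without user interaction (timestamps, ads, session ids) does not
# flip a pass into a fail.
VOLATILE_KEYS = {"timestamp", "session_id", "ad_slot"}

def logical_diff(before: dict, after: dict) -> dict:
    changed = {}
    for key in before.keys() | after.keys():
        if key in VOLATILE_KEYS:
            continue  # expected to differ on every render; not a real change
        if before.get(key) != after.get(key):
            changed[key] = (before.get(key), after.get(key))
    return changed
----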
What exactly is importance ranking? Does the verification layer still exist without this ranking?
1. Planner (write a failing test or tests)
2. Executor (generate a solution)
3. Verifier (until the tests no longer fail)
4. Repeat
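A minimal sketch of that loop, assuming hypothetical `planner`, `executor`, and `verifier` objects (none of these names come from the project):

----
# Illustrative only: plan checks first, then iterate generate -> verify until
# the checks pass or the iteration budget runs out.
def run_loop(task, planner, executor, verifier, max_iterations: int = 5):
    checks = planner.write_failing_checks(task)                # 1. Planner
    for _ in range(max_iterations):
        candidate = executor.generate_solution(task, checks)   # 2. Executor
        failures = verifier.run(checks, candidate)             # 3. Verifier
        if not failures:
            return candidate                                   # checks now pass
        task = executor.incorporate_feedback(task, failures)   # 4. Repeat with feedback
    raise RuntimeError("Checks still failing after max_iterations")
----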