by dataviz1000
28 subcomments
- I use Playwright to intercept all requests and responses and have Claude Code navigate to a website like YouTube and click and interact with all the elements and inputs while recording all the requests and responses associated with each interaction. Then it creates a detailed strongly typed API to interact with any website using the underlying API.
Yes, I know it likely breaks everybody's terms of service, but at the same time I'm not loading gigabytes of ads, images, and markup to accomplish things.
If anyone is interested I can take some time and publish it this week.
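A minimal sketch of that record-then-summarize loop, assuming Playwright's Python bindings (`summarize_endpoints` and `record_session` are made-up names; the typed-API generation step is not shown):

```python
from collections import defaultdict
from urllib.parse import urlparse

def summarize_endpoints(records):
    """Collapse (method, url, status) triples into an endpoint -> statuses map."""
    api = defaultdict(set)
    for method, url, status in records:
        parsed = urlparse(url)
        api[f"{method} {parsed.netloc}{parsed.path}"].add(status)
    return {endpoint: sorted(statuses) for endpoint, statuses in api.items()}

def record_session(url):
    """Drive a page and log every request/response pair (needs playwright installed)."""
    from playwright.sync_api import sync_playwright
    records = []
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        # Log method, URL, and status for every response the page triggers.
        page.on("response", lambda r: records.append(
            (r.request.method, r.url, r.status)))
        page.goto(url)
        browser.close()
    return summarize_endpoints(records)
```

The interactive clicking, and turning the summary into a strongly typed client, would sit on top of this.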
by paulirish
5 subcomments
- The DevTools MCP project just recently landed a standalone CLI: https://github.com/ChromeDevTools/chrome-devtools-mcp/blob/m...
Great news to all of us keenly aware of MCP's wild token costs. ;)
The CLI hasn't been announced yet (sorry guys!), but it is shipping in the latest v0.20.0 release. (Disclaimer: I used to work on the DevTools team. And I still do, too)
- Someone already made a great agent skill for this, which I'm using daily, and it's been very cool!
https://github.com/pasky/chrome-cdp-skill
For example, I use codex to manage a local music library, and it was able to use the skill to open a YT Music tab in my browser, search for each album, and get the URL to pass to yt-dlp.
Do note that it currently only works for Chrome, so you have to edit the script to point to a different Chromium-based browser's binary (e.g. I use Helium), but it's simple enough.
by mmaunder
16 subcomments
- Google is so far behind on agentic CLI coding. Gemini CLI is awful. So bad, in fact, that it's clear none of their team use it. Also, MCP is very obviously dead, as any of us doing heavy agentic coding know. Why permanently sacrifice that chunk of your context window when you can just use CLI tools, which are faster, more flexible, and already trained into many models? Playwright with headless Chromium or headed Chrome is what anyone serious is using, and we get all the dev and inspection tools already. And it works perfectly. This only has appeal to those starting out and confused into thinking this is the way. The answer is almost never MCP.
- Been using this one for a while, mostly with codex on opencode. It's more reliable and token-efficient than other DevTools protocol MCPs I've tried.
Favourite unexpected use case for me was telling gemini to use it as a SVG editing repl, where it was able to produce some fantastic looking custom icons for me after 3-4 generate/refresh/screenshot iterations.
Also works very nicely with electron apps, both reverse engineering and extending.
- How does this compare with playwright CLI?
https://github.com/microsoft/playwright-cli
- We tested this — the default take_snapshot path (Accessibility.getFullAXTree) is safe. It filters display:none elements because they're excluded from the accessibility tree.
But evaluate_script is the escape hatch. If an agent runs document.body.textContent instead of using the AX tree, hidden injections in display:none divs show up in the output. innerText is safe (respects CSS visibility), textContent is not (returns all text nodes regardless of styling).
The gap: the agent decides which extraction method to use, not the user. When the AX tree doesn't return enough text, a plausible next step is evaluate_script with textContent — which is even shown as an example in the docs.
Also worth noting: opacity:0 and font-size:0 bypass even the safe defaults. The AX tree includes those because the elements are technically 'rendered' and accessible to screen readers. display:none is just the most common hiding technique, not the only one.
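To make the gap concrete, here's a hedged stdlib-only sketch of the "safe extraction" idea: drop text inside inline `display:none` subtrees before handing page text to an agent. A real defense would need computed styles; as noted above, opacity:0 and font-size:0 slip straight through this, and the toy parser only looks at inline style attributes:

```python
from html.parser import HTMLParser

class VisibleText(HTMLParser):
    """Collect text outside of inline display:none subtrees."""
    def __init__(self):
        super().__init__()
        self.hidden_depth = 0   # >0 while inside a hidden subtree
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        style = dict(attrs).get("style", "") or ""
        # Once hidden, every nested start tag deepens the hidden region.
        # (Void tags like <br> are not special-cased in this toy.)
        if self.hidden_depth or "display:none" in style.replace(" ", ""):
            self.hidden_depth += 1

    def handle_endtag(self, tag):
        if self.hidden_depth:
            self.hidden_depth -= 1

    def handle_data(self, data):
        if not self.hidden_depth and data.strip():
            self.chunks.append(data.strip())

def visible_text(html):
    parser = VisibleText()
    parser.feed(html)
    return " ".join(parser.chunks)
```

This is roughly what `innerText` gives you for free and `textContent` does not, which is why which extraction path the agent picks matters.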
by zxspectrumk48
0 subcomments
- I found this one working amazingly well (same idea - connect to existing session): https://github.com/remorses/playwriter
- I've been using TideWave[1] for the last few months and it has this built-in. It started off as an Elixir/LiveView thing but now they support popular JavaScript frameworks and RoR as well. For those who like this, check it out. It even takes it further and has access to the runtime of your app (not just the browser).
The agent basically is living inside your running app with access to databases, endpoints etc. It's awesome.
1. https://tidewave.ai/
by LauraMedia
1 subcomment
- Is this really the state of AI in 2026?
It takes over your entire browser to center a div... and then fails to do so?
by tonyhschu
3 subcomments
- Very cool. I do something like this but with Playwright. It used to be a real token hog though, and got expensive fast. So much so that I built a wrapper to dump results to disk first then let the agent query instead. https://uisnap.dev/
Will check this out to see if they’ve solved the token burn problem.
- I’ve been experimenting with a similar approach using Playwright, and the biggest takeaway for me was how much “hidden API” most modern websites actually have.
Once you start mapping interactions → network calls, a lot of UI complexity just disappears. It almost feels like the browser becomes a reverse-engineering tool for undocumented APIs.
That said, I do think there’s a tradeoff people don’t talk about enough:
- Sites change frequently, so these inferred APIs can be brittle
- Auth/session handling gets messy fast
- And of course, the ToS / ethical side is a gray area
Still, for personal automation or internal tooling, it’s insanely powerful. Way more efficient than driving full browser sessions for everything.
Curious how others are handling stability — are you just regenerating these mappings periodically, or building some abstraction layer on top?
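One answer to the stability question: wrap each inferred endpoint in a thin adapter that validates the response shape, so drift fails loudly instead of silently. A sketch, with all names illustrative:

```python
class EndpointDrift(Exception):
    """The inferred API no longer returns the shape we mapped."""

class Adapter:
    def __init__(self, name, fetch, required_keys):
        self.name = name
        self.fetch = fetch                  # callable returning a parsed dict
        self.required_keys = required_keys  # shape inferred from the last mapping run

    def call(self, **params):
        data = self.fetch(**params)
        missing = [k for k in self.required_keys if k not in data]
        if missing:
            # The site changed underneath us: regenerate the mapping
            # instead of passing silent garbage downstream.
            raise EndpointDrift(f"{self.name}: missing {missing}")
        return data
```

Catching `EndpointDrift` is then a natural trigger for re-running the interaction-to-network mapping.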
by jasonjmcghee
0 subcomments
- I had fun playing with it + WebMCP this weekend, but I think that, similarly to how claude code / codex + MCP rely on SKILL.md, websites might too.
We could put them in a dedicated tag:
<script type="text/skill+markdown">
---
name: ...
description ...
---
...
</script>
One tag per skill you want on the page; optionally mark one as the default, meaning it "should be read in full to properly use the page". And then add some JavaScript functions to wrap it / simplify the required tokens.
Made a repo and a website if anyone is interested: https://webagentskills.dev/
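An agent-side loader could pull those blocks back out with a few lines. This sketch assumes the `text/skill+markdown` type proposed above (not any standard):

```python
from html.parser import HTMLParser

class SkillExtractor(HTMLParser):
    """Collect the bodies of <script type="text/skill+markdown"> tags."""
    def __init__(self):
        super().__init__()
        self.in_skill = False
        self.skills = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "text/skill+markdown":
            self.in_skill = True
            self.skills.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_skill = False

    def handle_data(self, data):
        # html.parser delivers raw script content here until </script>.
        if self.in_skill:
            self.skills[-1] += data

def extract_skills(html):
    parser = SkillExtractor()
    parser.feed(html)
    return [s.strip() for s in parser.skills]
```

Each extracted body is then ordinary SKILL.md-style frontmatter plus markdown for the agent to read.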
by danielraffel
2 subcomments
- I asked Claude to use this with the new scheduled tasks /loop skill to update my Oscar picks site every five minutes during tonight’s awards show. It simply visited the Oscars' realtime feed via Chrome DevTools, and updated my picks and pushed to gh pages. It even handled the tie correctly.
https://danielraffel.me/2026/03/16/my-oscar-2026-picks/
I know I could just use claude --chrome, but I’m used to this excellent MCP server.
by silverwind
0 subcomments
- I found Firefox with https://github.com/padenot/firefox-devtools-mcp to work better than the default Chrome MCP; it seems much faster.
by NiekvdMaas
0 subcomments
- Also works nicely together with agent-browser (https://github.com/vercel-labs/agent-browser) using --auto-connect
- i wish more people knew or cared about web standards vs proprietary protocols. the webdriver bidi protocol took the good parts of cdp and made it a w3c standard, but no one knows about it. some of the people who do know about it, find one thing they don't like and give up. let's not keep giving megacorporations outsized influence and control over the web and the tools we use with it. let's celebrate standards and make them awesome.
- I've been using the DevTools MCP for months now, but it's extremely token heavy. Is there an alternative that provides the same amount of detail when it comes to reading back network requests?
- Lots of MCP hate, and some love, in the comments.
80% of MCPs are thin wrappers over APIs. Yes, they stink.
A well-written remote OAuth MCP need not stink. There are tons of advantages, starting with strong security baked in.
I like Cloudflare Code Mode as an MCP pattern. Two tools, search and execute.
1M Opus 4.6 also reduces the penalties of MCP’s context approach. Along with tool search etc.
- Great to see the standalone CLI shipping alongside this! There's been a lot of talk today about MCP 'context bloat,' but providing a direct bridge to active DevTools sessions is something a standard headless CLI can't replicate easily. The ability to select an element in the Elements panel and immediately 'delegate' the fix to an agent is exactly the kind of hybrid workflow that makes DevTools so powerful.
- I built something in this space, bb-browser (https://github.com/epiral/bb-browser). Same CDP connection, but the approach is honestly kind of cheating.
Instead of giving agents browser primitives like snapshot, click, fill, I wrapped websites into CLI commands. It connects via CDP to a managed Chrome where you're already logged in, then runs small JS functions that call the site's own internal APIs. No headless browser, no stolen cookies, no API keys. Your browser is already the best place for fetch to happen. It has all the cookies, sessions, auth state. Traditional crawlers spend so much effort on login flows, CSRF tokens, CAPTCHAs, anti-bot detection... all of that just disappears when you fetch from inside the browser itself. Frontend engineers would probably hate me for this because it's really hard to defend against.
So instead of snapshotting the DOM (easily 50K+ tokens), finding the element, clicking, snapshotting again, parsing... you just run
bb-browser site twitter/feed
and get structured JSON back.
Here's the thing I keep thinking about though. Operating websites through raw CDP is a genuinely hard problem. A model needs to understand page structure, find the right elements, handle dynamic loading, deal with SPAs. That takes a SOTA model. But calling a CLI command? Any model can do that. So the SOTA model only needs to run once, to write the adapter. After that, even a small open-source model runs "bb-browser site reddit/hot" just fine.
And not everyone even needs to write adapters themselves. I created a community repo, bb-sites (https://github.com/epiral/bb-sites), where people freely contribute adapters for different websites. So in a sense, someone with just an open-source model can already feel the real impact of agents in their daily workflow. Agents shouldn't be a privilege only for people who can access SOTA models and afford the token costs.
There's a guide command baked in so if you do want to add a new site, you can tell your agent "turn this website into a CLI" and it reverse-engineers the site's APIs and writes the adapter.
v0.8.x dropped the Chrome extension entirely. Pure CDP, managed Chrome instance. "npm install -g bb-browser" and it works.
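The adapter-registry pattern described above can be sketched in a few lines. Everything here is a stand-in: the real adapters run small JS functions against the site's internal APIs over CDP, while this stub just returns the same structured shape:

```python
import json

ADAPTERS = {}

def adapter(name):
    """Register a 'site/command' verb in the CLI-style registry."""
    def register(fn):
        ADAPTERS[name] = fn
        return fn
    return register

@adapter("example/hot")
def example_hot(limit=3):
    # A real adapter would call the site's own internal API from inside
    # the browser session; this one fakes the resulting JSON.
    return [{"rank": i, "title": f"post {i}"} for i in range(1, limit + 1)]

def run(command, **kwargs):
    """Roughly what a `bb-browser site example/hot`-style call would print."""
    return json.dumps(ADAPTERS[command](**kwargs))
```

The point of the pattern is exactly what the comment says: a strong model writes `example_hot` once, and any model can invoke `run("example/hot")` afterwards.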
by raw_anon_1111
2 subcomments
- I don’t do any serious web development and haven’t for 25 years aside from recently vibe coding internal web admin portals for back end cloud + app dev projects. But I did recently have to implement a web crawler for a customer’s site for a RAG project using Chromium + Playwright in a Docker container deployed to Lambda.
I ran the Docker container locally for testing. Could a web developer test using Claude + Chromium in a Docker container without using their real Chrome instance?
- I made a websocket proxy + chrome extension to give control of the DOM to agents for my middleware app: https://github.com/RALaBarge/browserbox
The thing I'm working on at the moment is improving agentic tool-usage success rates for my research, and I use this as a proxy to access everything with the cookies I allow in the session.
- For something like Chrome DevTools MCP with authenticated browser sessions, the specific risk is credentials in the browser context + any SEND capability reachable from the same entry points. If a page can inject a prompt that triggers a tool call, and that call path can also reach outbound network I/O, you have an exfiltration vector without needing shell access at all.
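One partial mitigation is an outbound host allowlist on the tool side, so a prompt-injected instruction can't point a SEND capability at an attacker domain. A sketch with illustrative hosts (note this does nothing about exfiltration to an allowed host, e.g. via a comment field on a trusted site):

```python
from urllib.parse import urlparse

# Illustrative allowlist; a real deployment would configure this per tool.
ALLOWED_HOSTS = {"localhost", "api.example.com"}

def outbound_allowed(url):
    """Gate tool-initiated requests by destination host."""
    host = urlparse(url).hostname or ""
    return host in ALLOWED_HOSTS
```

The check has to sit in the tool layer, not the prompt, since the whole premise of the attack is that the page controls what the model asks for.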
- I suggest using https://github.com/simonw/rodney instead
by speedgoose
0 subcomments
- Interesting. MCP APIs can be useful for humans too.
Chrome's dev tools already had an API [1], but perhaps the new MCP one is more user friendly, as one main requirement of MCP APIs is to be understood and used correctly by current gen AI agents.
[1]: https://chromedevtools.github.io/devtools-protocol/
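For reference, the raw protocol under both is small: each CDP command is a JSON object with `id`, `method`, and `params`, sent over a WebSocket to the browser. Building a frame takes no library at all (actually sending it would, e.g. a WebSocket client package):

```python
import json
from itertools import count

_ids = count(1)  # CDP requires a unique id per command on the socket

def cdp_command(method, **params):
    """Serialize one Chrome DevTools Protocol command frame."""
    return json.dumps({"id": next(_ids), "method": method, "params": params})
```

The MCP layer is essentially an agent-friendly vocabulary on top of frames like `cdp_command("Page.navigate", url=...)`.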
- I can't make it run under WSL with Claude Code. Has anyone succeeded?
- I wrote an AI agent that does Chrome testing. Yes, Chrome MCP does work: https://github.com/netdur/hugind/tree/main/agent/chrome_test...
- One tip for the illegal scrapers or automators out there: CasperJS and PhantomJS still work very well against anti-bot detection. These are very old libs, no longer maintained, but I can even scrape and authenticate at my banks.
by teaearlgraycold
0 subcomments
- I love how in their demo video where they center an element it ends up off-center.
- Note that this is a mega token guzzler in case you’re paying for your own tokens!
by bartek_gdn
0 subcomments
- My approach is a thin cli wrapper instead.
https://news.ycombinator.com/item?id=47207790
by tomcasaburi
0 subcomments
- imo a much better setup is using playwright-cli + some skill.md files for profiling (for example, I have a skill using aidenybai/react-scan for frontend react profiling). token efficient, fast and more customizable/upgradable based on your workflow. vercel-labs/agent-browser is also a good alternative.
by oldeucryptoboi
1 subcomment
- I tell Claude to use playwright so I don't even need to do the setup myself.
by pritesh1908
0 subcomments
- I have been using Playwright for a fairly long time now. Do checkout
- chrome-cli with the remote debugging port has been working fine this entire time.
by wuxiaoxia88
0 subcomments
- so good browser automation extensions. i like it
- Now that there's widespread direct connectivity between agents and browser sessions, are CAPTCHAs even relevant anymore?
- For context extraction, Lightpanda is a really great option. Much faster than Chrome, and it comes with a built-in MCP server.
However, it will not fill forms, etc. But it can be combined with agent-browser to get the best of both worlds: https://swival.dev/pages/web-browsing.html
- Connecting a remote VPS to a local Chrome session is usually a headache. It gets complicated when your Claw setup is on the server but the browser session stays on your own machine. I ended up using Proxybase’s relay [0] to bridge the gap, and it actually solved the connection issues for me.
[0] https://relay.proxybase.xyz
- Was already eye-rolling about the headline. Then I realized it's from Chrome.
Hoping for some good stories from open claw users that permanently run debug sessions.
by jerrygoyal
1 subcomment
- It's from 2025. The post should have a year tag.