IAN'S AI THOUGHTSTREAM

2026·05·29 16:11 / 2 MIN

Giving Coding Agents Eyes

Coding agents that produce visual output need a way to look at what they made. For web work that means headless Chrome, and headless Chrome is genuinely painful to run from inside a sandboxed agent.

On macOS, Chromium and Firefox both rely on Mach-O quirks, macOS entitlements, and Crashpad behavior that don't survive most sandboxes. I run my agents inside per-project nono.sh profiles, and Chrome is a non-starter under that setup.

The workaround

Playwright runs fine outside the sandbox. So the Playwright MCP server lives out there on a high port, and Claude's instructions tell it to always talk to that server:

$ npx @playwright/mcp@latest --headless --isolated --browser chrome --port 8931

The sandbox just needs to reach localhost:8931 and the visual-review loop works. Claude renders the local service, takes a screenshot, looks at it, iterates.
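
When the loop breaks, the first thing worth ruling out is whether the sandbox can reach the port at all. A minimal sketch of that preflight check, in Python using only the standard library; `port_is_open` is a hypothetical helper name, and the host and port defaults are just the ones from the command above:

```python
import socket


def port_is_open(host="localhost", port=8931, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection tries every address the host resolves to
        # (IPv4 and IPv6), which matters for a name like "localhost".
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and sandbox denials alike.
        return False
```

Calling `port_is_open()` from inside the sandbox only proves that something is listening on the port. It does not prove the listener is the Playwright MCP server.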

That mostly works. What it does not solve: stale processes, hanging Chrome instances, zombies. Every so often Chrome spins out and eats all 64 GB of RAM on my M4 MacBook Pro before I notice.
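
A stopgap for the runaway-memory case is a small watchdog that flags any Chrome process above a memory ceiling. This is a sketch under assumptions, not something from my actual setup: it assumes `ps` reports RSS in KiB (true on macOS and Linux), the 8 GiB ceiling is an arbitrary number, and `find_runaways` and `reap` are hypothetical names.

```python
import os
import signal
import subprocess

RSS_LIMIT_KIB = 8 * 1024 * 1024  # assumed ceiling: 8 GiB per process


def find_runaways(ps_output, limit_kib=RSS_LIMIT_KIB, needle="chrome"):
    """Return (pid, rss_kib) pairs for matching processes above the limit."""
    runaways = []
    for line in ps_output.splitlines():
        # pid, rss, command; the command path may itself contain spaces
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue
        pid_text, rss_text, command = parts
        if not (pid_text.isdigit() and rss_text.isdigit()):
            continue  # skips a header row if one is present
        if needle in command.lower() and int(rss_text) > limit_kib:
            runaways.append((int(pid_text), int(rss_text)))
    return runaways


def reap(dry_run=True):
    """List oversized Chrome processes; send SIGTERM only if dry_run is False."""
    ps_output = subprocess.run(
        ["ps", "-ax", "-o", "pid=", "-o", "rss=", "-o", "comm="],
        capture_output=True, text=True, check=True,
    ).stdout
    for pid, rss_kib in find_runaways(ps_output):
        print(f"pid {pid} is using {rss_kib / 1024 / 1024:.1f} GiB")
        if not dry_run:
            os.kill(pid, signal.SIGTERM)
```

Two caveats. The check is per process, so a swarm of medium-sized Chrome helpers can still add up to a lot of RAM without tripping it. And matching on "chrome" also catches an everyday browser running on the same machine, which is why `reap()` defaults to a dry run.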

Lighter options

There has to be something simpler than babysitting a browser. Two things caught my eye recently.

Webwright from Microsoft Research gives the model a terminal and a workspace, and lets it write Playwright code that launches, inspects, and discards browser sessions. The output is a reusable script, not a chat transcript. It scores 60.1% on Odysseys against base GPT-5.4's 33.5%, which is a real jump.

obra/superpowers-chrome goes the other direction: a Claude Code plugin that drives Chrome directly via the DevTools Protocol, zero dependencies, no Playwright in the middle.
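
For a sense of how little sits underneath a CDP-direct approach: the protocol is JSON messages over a WebSocket, each with an `id`, a `method`, and `params`. The sketch below only builds the commands and decodes a screenshot reply. It is not how obra/superpowers-chrome is implemented (I have not read its internals), and it leaves the WebSocket transport out because Python's standard library has no WebSocket client. `cdp_command`, `screenshot_bytes`, and the dev server URL are hypothetical; `Page.navigate` and `Page.captureScreenshot` are real CDP methods.

```python
import base64
import itertools
import json

_ids = itertools.count(1)  # CDP replies echo the id of the command they answer


def cdp_command(method, **params):
    """Serialize one CDP command as the JSON text sent over the WebSocket."""
    return json.dumps({"id": next(_ids), "method": method, "params": params})


def screenshot_bytes(reply_text):
    """Decode the image bytes from a Page.captureScreenshot reply."""
    reply = json.loads(reply_text)
    if "error" in reply:
        raise RuntimeError(reply["error"].get("message", "CDP error"))
    # The screenshot comes back base64-encoded in result.data.
    return base64.b64decode(reply["result"]["data"])


# The two messages a render-and-look loop would send, in order.
navigate = cdp_command("Page.navigate", url="http://localhost:3000")  # assumed URL
capture = cdp_command("Page.captureScreenshot", format="png")
```

A real client would also wait for the page to finish loading between the two commands, which is where most of the remaining complexity lives.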

When you actually need real Chrome

Advanced bot fingerprinting is the case for keeping a full browser around. If the task is logging into a hostile site or completing a real-world flow, real Chrome with a real profile is the only thing that works.

But most of my use is smaller: render a local dev server, screenshot it, ask Claude if the layout looks right. For that, a Chromium that can eat 64 GB of RAM feels like the wrong shape of tool. I suspect this gets cleanly solved within a year, probably by something CDP-direct and disposable rather than a long-lived browser process I have to nanny.