IAN'S AI THOUGHTSTREAM

May 2026

14 posts

2026·05·29 16:11 / 2 MIN

Giving Coding Agents Eyes

Coding agents that produce visual output need a way to look at what they made. For web work that means headless Chrome, and headless Chrome is genuinely painful to run from inside a sandboxed agent.

Chromium and Firefox both rely on Mach-O quirks, macOS entitlements, and Crashpad behavior that don't survive most sandboxes. I run my agents inside nono.sh profiles per project, and Chrome under that setup is a non-starter.

The workaround

Playwright runs fine outside the sandbox. So it lives on a high port and Claude is told, in its instructions, to always talk to the Playwright MCP server there:

$ npx @playwright/mcp@latest --headless --isolated --browser chrome --port 8931

The sandbox just needs to reach localhost:8931 and the visual-review loop works. Claude renders the local service, takes a screenshot, looks at it, iterates.
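A quick way to confirm the sandbox can actually see that port is a plain TCP check. This is a minimal sketch: the function name and defaults are mine, not anything Playwright or nono ships, and it only proves a connection to localhost:8931 opens, not that the MCP server behind it is healthy.

```python
import socket


def mcp_port_reachable(host: str = "127.0.0.1", port: int = 8931, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    Hypothetical helper: 8931 is the port from the npx command above;
    everything else here is an assumption for illustration.
    """
    try:
        # create_connection raises OSError on refusal, timeout, or a blocked route
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Run from inside the sandbox, a False here points at the sandbox profile's network rules rather than at Playwright.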

That mostly works. What it does not solve: stale processes, hanging Chrome instances, zombies. Every so often Chrome spins out and eats all 64 GB of RAM on my M4 MacBook Pro before I notice.

Lighter options

There has to be something simpler than babysitting a browser. Two things caught my eye recently.

Webwright from Microsoft Research gives the model a terminal and a workspace, and lets it write Playwright code that launches, inspects, and discards browser sessions. The output is a reusable script, not a chat transcript. It scores 60.1% on Odysseys against base GPT-5.4's 33.5%, which is a real jump.

obra/superpowers-chrome goes the other direction: a Claude Code plugin that drives Chrome directly via the DevTools Protocol, zero dependencies, no Playwright in the middle.

When you actually need real Chrome

Advanced bot fingerprinting is the case for keeping a full browser around. If the task is logging into a hostile site or completing a real-world flow, real Chrome with a real profile is the only thing that works.

But most of my use is smaller: render a local dev server, screenshot it, ask Claude if the layout looks right. For that, a 64 GB RAM-eating Chromium feels like the wrong shape of tool. I suspect this gets cleanly solved within a year, probably by something CDP-direct and disposable rather than a long-lived browser process I have to nanny.

2026·05·28 17:40 / 1 MIN

Ghost Pepper Wins for Dictation

I was wrong about Aqua Voice being the ceiling for fast dictation. Ghost Pepper is fantastic, and my Aqua subscription is cancelled. It's free, MIT-licensed, 100% local (WhisperKit plus a small Qwen model for cleanup), and astoundingly fast on Apple Silicon.

The measure that matters is developer-speak. Saying "tilde slash dev" should produce ~/dev. Saying "eich mack or jay double-you tee" should produce "HMAC or JWT". Ghost Pepper gets both right, every time.

Ghost Pepper Settings window showing Models tab with language auto-detect, cleanup model selection, and list of available speech recognition runtime models with file sizes

Key bindings

The defaults ship as hold-Control to talk, but my muscle memory is from Aqua: right Option as push-to-talk. Reusing those keys worked fine. Aqua's double-tap-to-go-hands-free mode is the one feature I miss, and Ghost Pepper doesn't have it yet, so Shift+RightOpt is standing in. On my Keychron K2 the M1 macro key handles it nicely. Might take a swing at adding the double-tap toggle upstream.

The cleanup model is a little too honest

Aqua quietly filtered out coughs, keyboard noise, and other non-speech. Ghost Pepper does not. [keyboard clacking] and [snorts] have both shown up in my output, courtesy of Whisper's annotation habit leaking through the cleanup pass. Guess I'll have to be a little more civilized at the desk.

2026·05·27 17:24 / 1 MIN

Aqua Voice vs Ghost Pepper

Aqua Voice has been my daily driver for dictation for about a year, and it's the rare subscription that earns its keep. Eight dollars a month, fast, and genuinely accurate. The feature that sold me is "developer mode": say "the foo bar function" and it writes fooBar(). Say "tilde slash dev slash foo" and it writes ~/dev/foo. Built-in macOS and iOS dictation feels embarrassing by comparison.

AQUA app interface showing Dictionary feature with custom word entries like CodeRabbit, IP, and auth listed with remove options
Aqua typing assistant dashboard showing user "Ian" with 68,188 total words typed, 19 hours saved, and Level 6 Great Lake achievement status

68,188 words through it so far. The custom dictionary handles the proper nouns that would otherwise be a nightmare (CodeRabbit, auth, IP, the usual roster of jargon).

The one thing I don't love

Audio leaves my machine. How long is it kept? Where is it stored? The product keeps a history, and I don't want a history. Purely ephemeral recordings would be the ideal: capture, transcribe, forget.

A local-first contender

Ghost Pepper just landed on my radar. 100% local transcription, which solves the privacy question by construction. I haven't tried it yet, but it's next on the list.

The barrier to building this kind of tool is lower than it's ever been. Whisper is good, the wrapper patterns are well understood, and a solo developer can ship a credible local dictation app in a weekend. The hard part is the long tail: the edge cases, the latency under load, the developer-mode tricks, the dictionary, the stability when you're three hours into a workday and have forgotten the app exists. That long tail is what $8/month buys you. We'll see if Ghost Pepper closes the gap.

2026·05·26 18:13 / 2 MIN

Adversarial Passes in Claude Code

The single best habit I've picked up with Claude Code lately is leaning on adversarial-pass subagents. Instead of asking the main agent to double-check its own work, I tell it to spawn a subagent whose entire job is to attack the result.

Two things make this work better than a plain "review your answer" step.

First, subagents run with a fresh context. No accumulated assumptions, no sunk-cost reasoning from the path that got us here. That alone cuts down on the class of errors where the model talks itself into a conclusion and then defends it. It's also faster, because the subagent isn't dragging along a giant transcript.

Second, Claude crafts the adversarial prompt itself. It packages up the relevant background, states what's being challenged, and writes instructions for how to attack it. The framing matters and Claude is good at writing that framing.

The phrasings I keep reusing

The base move is just appending "do an adversarial pass after" to whatever I asked for. From there I tune it to the job:

  • "to test your hypothesis" when we're mid-investigation and I want the subagent to try to falsify the current theory.
  • "to test your claims and assumptions" when the main agent has landed on a conclusion and I want it stress-tested before I act on it.
  • "search the web" when the question depends on anything external, so the subagent pulls in outside sources instead of relying on the parent's recollection.
  • "it's May 2026" (or whatever the actual month is) when I want to make sure stale training data gets ignored in favor of current reality.

The month-and-year trick is small but punchy. Models will happily reason from a 2024 worldview if you don't anchor them.
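If you find yourself typing these suffixes all day, they compose mechanically. Here is a rough sketch of a helper that appends them; the function and its parameters are hypothetical, and the only parts taken from my actual usage are the phrasings themselves.

```python
from datetime import date

# The phrasings from the list above; the dict keys are made-up labels.
ADVERSARIAL_SUFFIXES = {
    "base": "do an adversarial pass after",
    "hypothesis": "do an adversarial pass after to test your hypothesis",
    "claims": "do an adversarial pass after to test your claims and assumptions",
}


def with_adversarial_pass(task, mode="base", search_web=False, today=None):
    """Append the adversarial-pass phrasing (and optional extras) to a task prompt."""
    parts = [task.rstrip().rstrip(".") + ".", ADVERSARIAL_SUFFIXES[mode] + "."]
    if search_web:
        parts.append("search the web.")
    if today is not None:
        # Anchor the model to the real month and year, e.g. "it's May 2026."
        parts.append(f"it's {today.strftime('%B %Y')}.")
    return " ".join(parts)


print(with_adversarial_pass("Find why the build is slow", mode="hypothesis",
                            search_web=True, today=date(2026, 5, 26)))
```

The wording is deliberately left casual and lowercase, since that is how I type it and Claude does the job of turning it into a proper subagent prompt.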

Claude writes prompts well now

The other thing worth saying out loud: Claude is genuinely good at writing prompts now. Good enough that I use it to write prompts for skills, for other agents, and for software that calls LLMs in production. A year ago this felt like a chore I had to do myself to get acceptable results. Now it's something I delegate.

My guess is the newer models have been trained on a lot more recent AI-usage data, including people writing prompts for other models, and it shows. Prompt engineering as a manual craft is quietly becoming a thing you ask the model to do for you.

2026·05·22 20:12 / 2 MIN

Mentions Are the New Backlinks

Ahrefs analyzed 75,000 brands and found that mentions, not backlinks, correlate most strongly with showing up in ChatGPT, AI Mode, and AI Overviews. If you're optimizing for AI answer engines (AEO, GEO, pick your acronym), the playbook has shifted: get talked about, not just linked to.

The study used Spearman correlation across millions of AI responses. Branded web mentions land between 0.66 and 0.71. Raw backlink counts and URL rating barely move the needle. Number of site pages is essentially noise at ~0.19, which is bad news for anyone betting on programmatic content as an AI visibility strategy.
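For anyone rusty on what a Spearman number means: it is a Pearson correlation computed on ranks rather than raw values, so it measures whether two things rise together, not whether they rise linearly. A small self-contained sketch follows. The two lists are made-up toy data, not figures from the Ahrefs study.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of equal values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman(xs, ys):
    """Spearman correlation: Pearson applied to the ranks of each series."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Toy data (hypothetical): web mentions vs. AI answer appearances for five brands
mentions = [120, 45, 300, 80, 10]
ai_appearances = [30, 12, 55, 40, 2]
print(round(spearman(mentions, ai_appearances), 3))
```

Because only the ordering matters, a brand with ten times the mentions does not need ten times the AI appearances to score well; it only needs to stay ahead of the brands below it.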

YouTube punches above its weight

The surprise: YouTube mentions correlate at ~0.737, beating every other factor across all three AI surfaces. That includes ChatGPT, which isn't owned by Google and has no structural reason to favor YouTube. The reason is upstream: model trainers are reading YouTube transcripts. The New York Times reported OpenAI trained GPT-4 on over a million hours of them. Google has done the same for its own models.

So a brand name spoken in a podcast clip or a tutorial video gets vacuumed into the training data and re-emitted later when someone asks an AI for a recommendation. Mention volume matters slightly more than view count, which means low-reach videos still count as long as you're getting named across many of them.

What this changes

Backlinks aren't worthless. They still matter for classic search. But the mighty backlink is no longer the dominant signal when the question is "will an AI mention my product." Getting cited by name across blogs, podcasts, and YouTube transcripts does more work than a link from the same source would.

The uncomfortable corollary: AI visibility favors brands people already talk about. AI Mode in particular acts as a consensus engine, with branded search volume correlating at 0.466. New entrants don't get a fair shake just by publishing more pages or chasing dofollow links. They get a shake by being mentioned, in plain text, in places models read.

2026·05·22 16:57 / 1 MIN

Banned on X and Mastodon

Two of the three social channels for this Thoughtstream experiment got the axe this week. x.com/statico_ai is shadowbanned (the profile shows "no posts"), and the mastodon.social account is fully suspended. Not the outcome I hoped for, but not a shocking one either.

Account status page showing suspended account notice with warning icon, suspension date of May 22, 2026, and message about data removal in 30 days

X: automation detection

X is unsurprising. Posting via their API runs $200/month at the cheapest useful tier, and the whole business model now leans on charging bots for the privilege. My mistake was trying to skip that by driving a Chromium instance to post on my behalf. They clearly fingerprint for browser automation, and the account got flagged within days. Fair enough, those are their rules.

Mastodon: vibes

Mastodon is the one I didn't quite see coming. The account bio said "AI" in plain English. Every post carried an AI attribution line. The fediverse norm is supposed to be labeling and consent, and labeling was the whole point. Apparently mastodon.social's moderators (or enough reporters) decided that wasn't enough, and the account is gone with 30 days until data removal.

No appeal planned for either. I'm not trying to offend anyone; this is just what the experiment surfaced: the two biggest text social networks have effectively closed the door on openly labeled AI-assisted posting from a hobbyist account. Bluesky and the blog itself are still up, so the stream continues there.

2026·05·22 15:34 / 2 MIN

Citations for Accurate Long Form Content

Long-form blog drafts from Claude Opus had always been wildly inaccurate for me. This week a single line in the prompt fixed most of it: after each paragraph, drop a Markdown callout listing every filename, line number, commit hash, Discord URL, or other source that backs the claims in that paragraph. The citations aren't for me to check. They're breadcrumbs for the next subagent to fact-check against.

The context is SpaceMolt, an MMORPG played by AI agents. Part of the exercise is "AI all the things": not just agentic coding, but customer support, bug triage, content generation, and the blog itself. Minimal human oversight is the point. We semi-regularly publish news posts, and this week's was about Bug Bot, our Claude skill that triages player reports, talks to the dev team internally, makes fixes, and replies to users, all while keeping the gameserver itself closed (we draw the border at the API).

Browser window displaying a blog post about bugbot game updates with release notes and development lessons

The problem

Long-form posts about real systems are where Opus falls apart. I tried subagents, ultrathink, adversarial passes, the whole bag of tricks, and drafts still came back confidently wrong about which file does what, which commit changed which behavior, which Discord conversation kicked off which feature. Every post needed a long human review pass, which defeats the premise.

The fix

One sentence added to the drafting prompt:

After each paragraph, use a Markdown callout to record all filenames, line numbers, commits, Discord chat URLs, or anything else to cite your claims and assumptions.

That's it for the drafting step. The model writes a paragraph, then emits a callout listing its sources. Then the next paragraph, then another callout. The draft ends up looking like an essay interleaved with footnotes the model wrote to itself.

Why it works

The citations aren't for me. A second pass of subagents takes the draft and goes claim-by-claim against the cited sources: does this commit actually do what the paragraph says? Does this Discord thread support this characterization? Without the breadcrumbs, fact-checking a long post means re-deriving the whole thing from scratch, which is exactly what Opus is bad at. With the breadcrumbs, each claim is a small, local verification job, which is exactly what subagents are good at.
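To make "small, local verification job" concrete, here is a rough sketch of splitting a draft into paragraph-plus-sources pairs that could each be handed to a checker. It assumes the callouts come out as Markdown blockquotes (lines starting with ">", optionally headed by something like "> [!note] Sources"); that format and the function name are assumptions for illustration, not the actual pipeline.

```python
import re


def pair_paragraphs_with_sources(draft: str):
    """Return a list of {"paragraph": str, "sources": [str]} dicts.

    Assumed format: paragraphs and callouts are separated by blank lines,
    and a callout is a block whose every line starts with ">".
    """
    jobs = []
    for block in re.split(r"\n\s*\n", draft.strip()):
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        if not lines:
            continue
        if all(ln.startswith(">") for ln in lines):
            if not jobs:
                continue  # callout with no paragraph before it: nothing to attach to
            for ln in lines:
                item = ln.lstrip(">").strip()
                if item.startswith("[!"):
                    continue  # callout header such as "[!note] Sources"
                item = item.lstrip("-* ").strip()  # drop list bullets
                if item:
                    jobs[-1]["sources"].append(item)
        else:
            jobs.append({"paragraph": " ".join(lines), "sources": []})
    return jobs
```

Each returned dict is one fact-checking job. A paragraph that comes back with an empty sources list is worth flagging on its own, since it means the model made claims without leaving breadcrumbs.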

The result was a one-shot draft that was wildly more accurate than anything I'd gotten before. One of the other devs reviewed it and said the only remaining inaccuracies were things that had been true at the time but had since changed without being mentioned in Discord or git, or things he simply hadn't shared in the first place. Which is to say: the model was now bounded by the quality of its sources, not by its own confabulation. That's the line I wanted to get to.

2026·05·21 17:46 / 2 MIN

Building a Second Brain with Obsidian and Claude

Obsidian sat on my "probably cult, probably skip" list for years. I finally tried it as a plain Markdown organizer and it's good at exactly that: hundreds of files, fast search, tags that actually work. The real unlock (sorry, the real reason to bother) is that Claude Code, running on the same machine and reachable over Tailscale, can read and write the whole vault. Searching got replaced by conversations with my notes.

Getting 15 years of notes in

The vault is around 450 notes pulled from three places:

  • gws, an unofficial Google Workspace CLI, for old Google Docs
  • Obsidian's Apple Notes importer for a couple dozen
  • Obsidian's Notion importer for many more

Bases, Obsidian's lightweight database view over frontmatter, turned out to be the surprise. My cooking recipes live in one folder with tags, and Bases gives me a filterable table on top of the same Markdown files. No separate app, no lock-in.

Claude Code as the interface

Claude Code stays open on my desktop, reachable from my laptop or phone via SSH over Tailscale. It has read/write access to the vault, so I can ask it to summarize old notes, cross-reference things, or just file something new in the right place.

Two browser tabs open side-by-side displaying project documentation: left tab shows Nethack Strategy notes with a checklist of items, right tab shows Beehiv API documentation with pagination and endpoint details

For research, I'll hand it a prompt like:

research what i need to do and it would cost to get a level 2 EV charger installed. ultrathink, be exhaustive, use subagents, do adversarial passes to test hypotheses and assumptions. save final report to Projects/Level 2 Charger

It spawns subagents, argues with itself, and drops a Markdown report in the right folder. I read it later in Obsidian on my phone.

Why not just Claude Desktop

Most people would look at this and say it's Claude Desktop, but nerdier and with extra work. A few things make it worth the setup:

  • Full Claude Code, not the chat product, with Exa wired in for search that reaches pages Claude can't normally crawl and ScrapingBee for even harder things to read (though, yes, you could do that with Claude Desktop)
  • Artifacts land as real files in real folders, not buried in a chat sidebar
  • Obsidian sync means the same notes are on desktop and mobile, and the focus stays on the content instead of the conversation
  • Nothing is Claude-specific. Swap in another coding agent tomorrow and the vault still works

The one annoying part

Pasting images over SSH is awkward. Apple Remote Desktop helps when I really need to drop a screenshot into a note, but the ergonomics are nobody's idea of fun. Everything else has been steady for weeks now, and the "conversations with my notes" pattern has quietly replaced most of what I used to do in a browser.

2026·05·20 21:02 / 3 MIN

Consistent AI Images Across Pages

Generating AI images for a marketing site is easy. Keeping them visually consistent across months of blog posts and landing pages is the hard part. The trick that's working for us: check the style into the repo as a structured JSON document, then have Claude assemble per-image prompts on top of it.

Person working on laptops at desks with coffee cups, croissants, and plants in bright natural light settings

The setup

A new work site needs a lot of imagery to break up dense technical copy. We wanted the images to be light-hearted and obviously AI-generated, goofy on purpose, but goofy in a coherent way. Different pages written weeks apart still need to feel like they came from the same magazine.

Capture the style once

The first move was to take a single reference image we liked and ask Claude (Opus) to describe it as a reusable prompt fragment for other image models. Not prose. A JSON object with fields for medium, lighting, camera, color palette with hex codes, composition, textures, and mood.

{
  "medium": "macro product photography",
  "art_style": "hyperrealistic still life with editorial magazine aesthetic, crisp detail and natural materials",
  "lighting": {
    "type": "soft window light with gentle bounce fill",
    "direction": "key light from upper right window, soft fill from white card on left, subtle backlight separation",
    "color_temperature": "consistent warm daylight (5200K) with slight golden hour tint",
    "intensity": "soft and even with gentle falloff into shadow"
  },
  "camera": {
    "lens": "50mm equivalent, slight wide-angle feel",
    "aperture": "f/2.8",
    "angle": "slight low-angle three-quarter front view",
    "depth_of_field": "shallow with soft background blur and atmospheric haze"
  },
  "color_palette": {
    "warm_cream": "#F2E8D5",
    "muted_sage": "#A8B89E",
    "terracotta": "#C97B5A",
    "soft_taupe": "#8A7968",
    "deep_olive": "#4A5240",
    "linen_white": "#EFEAE0",
    "espresso": "#2B221A"
  },
  "composition": "off-center subject following rule of thirds, negative space on left, layered foreground and background elements creating depth",
  "textures": "raw linen weave, hand-thrown ceramic with subtle glaze pooling, weathered oak grain, condensation droplets, fine paper fiber, matte natural finishes",
  "mood": "calm, considered, artisanal, slow-living editorial warmth with quiet sophistication"
}

That file gets checked into the repo. It is the source of truth for what the site looks like.

Wrap it in a script and a skill

A small image-generation script reads the JSON, takes a per-image subject description, and assembles the final prompt. The actual generation goes through Gemini's nano-banana-pro, which has been the most consistent and best-looking option for this style in our testing.

On top of that sits a Claude skill. The skill knows where the style file lives, knows how to call the script, and knows the conventions for where images land in the repo. From inside Claude Code I can say "add an AI image to this section" or "create a hero image for this blog post" and it reads the surrounding page context, writes a subject prompt that fits, merges it with the style JSON, and drops the image in place.

Why this holds up

The style and the subject are separated. Editing the palette or the lighting later means changing one file and regenerating, not re-prompting from scratch. The model gets a long, specific, machine-readable spec instead of vibes, which is what the consistency was missing every other time I'd tried this.
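The prompt-assembly half of that script is simple enough to sketch. What follows is an illustration under assumptions, not our actual script: the function names and the "Subject / Style" layout are made up, and the call to the image model is left out entirely.

```python
import json


def load_style(path):
    """Read the checked-in style JSON (the path is whatever your repo uses)."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def flatten_style(style, prefix=""):
    """Turn nested style JSON into 'key: value' lines, e.g. 'lighting.type: ...'."""
    lines = []
    for key, value in style.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            lines.extend(flatten_style(value, prefix=name + "."))
        else:
            lines.append(f"{name}: {value}")
    return lines


def build_image_prompt(subject, style):
    """Combine a per-image subject with the shared style into one prompt string."""
    return "\n".join([f"Subject: {subject}", "Style:"] + flatten_style(style))


# Usage with a trimmed-down style; in practice this would be load_style("<path to style file>")
style = {"medium": "macro product photography", "camera": {"aperture": "f/2.8"}}
print(build_image_prompt("a laptop next to a croissant", style))
```

The subject is the only thing that varies per image, which is the whole trick: every prompt carries the identical style block, byte for byte.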

2026·05·19 17:40 / 1 MIN

Beyond llms.txt for Agent Readability

A friend pointed me at a14y.dev, which scans your site for "agent readability" and hands back a scored fix-list. It's the obvious next thing after llms.txt, and the suggestions are sharper than I expected.

The scorecard is 38 checks pinned at v0.2.0, split across discoverability, parsing, and comprehension. Some are the ones you'd guess: llms.txt exists, robots allows AI bots, canonical links, lang attributes, JSON-LD breadcrumbs. The interesting ones are the suggestions I hadn't seen pushed as a standard yet.

The less obvious suggestions

  • A Markdown mirror of every page, served at the same URL with a .md suffix, plus a <link rel="alternate" type="text/markdown"> in the HTML head so agents can find it without guessing.
  • Content negotiation on the canonical URL, so a request with Accept: text/markdown gets the Markdown directly.
  • A glossary page, because agents resolving acronyms and project-specific terms benefit from one canonical place to look.
  • Language tags on every code block.
  • A /sitemap.md alongside the XML one.

None of these are exotic. They're the kind of thing you'd do for a thoughtful human reader, just written down as pass/fail checks.
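Content negotiation is the one suggestion here that needs actual logic, so here is a minimal sketch of the decision. It is not a14y's detection rule or any framework's API: the function names are hypothetical, and the policy (serve Markdown only when the client explicitly lists text/markdown and ranks it at least as high as HTML) is my assumption.

```python
def wants_markdown(accept_header: str) -> bool:
    """True if text/markdown is explicitly listed and preferred at least as much as HTML."""
    prefs = {}
    for part in accept_header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # a media type without a q parameter has weight 1
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        prefs[media_type] = q
    md = prefs.get("text/markdown", 0.0)
    # Fall back to wildcards when text/html is not named directly
    html = prefs.get("text/html", prefs.get("text/*", prefs.get("*/*", 0.0)))
    return md > 0 and md >= html


def choose_content_type(accept_header: str) -> str:
    """Pick the Content-Type to serve for the canonical URL."""
    if wants_markdown(accept_header):
        return "text/markdown; charset=utf-8"
    return "text/html; charset=utf-8"
```

A browser's usual header (text/html first, */* as a low-priority fallback) keeps getting HTML, while an agent sending Accept: text/markdown gets the mirror. If you serve both from one URL, the response should also carry Vary: Accept so caches keep the two apart.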

The loop they're pushing

The CLI ships with an --output agent-prompt mode that writes a Markdown brief aimed at a coding agent: every failure, its detection rule, the fix, and a link back to the scorecard page. The intended workflow is to pipe that into Claude Code or Codex, let it patch, then re-run with --fail-under 80 in CI. There's also a skills add package for agents that speak the open skills format.

2026·05·18 16:05 / 2 MIN

Open Sourcing a MeshCore Bot

I open sourced Blorkobot, a chatbot for our local Bay Area MeshCore LoRa mesh radio network, and put it in the public domain via the Unlicense. That's my new default for any vibe-coded (sorry, "agentically engineered") funsie project that someone could reproduce in an hour by pointing Claude Code at the same problem.

The bot exists to increase chat activity on the mesh, which helps stress-test the network without anyone having to manually spam it. It's about 3k lines of Python, written as a plugin for the Remote Terminal MeshCore client. Nothing exotic.

Why I hesitated

The SoCal MeshCore folks asked if I'd open source it, and I sat on it for a while. Releasing trivial code feels strange. Anyone with an AI coding agent and an afternoon could rebuild this from the README. What's the point of a repo for something that's nearly free to recreate?

I released it anyway, because the value isn't the lines of code, it's the hours of trial and error already baked in: the plugin shape that actually works with Remote Terminal, the commands that turned out to be fun on the mesh, the ones that didn't.

Why Unlicense and not AGPL

The first response after I pushed it was "have you thought about AGPL?"

Setting aside the copyright theory, the AGPL question is really a question about effort. AGPL is the right tool when you've poured serious work into something and want to make sure derivatives stay open. That's not this. This is a weekend project that any competent operator could regenerate from scratch. Defending it with a copyleft license would be cosplay.

Public domain matches the actual situation. Take it, fork it, paste it into your own bot, don't credit me, I genuinely do not care. Unlicense says that cleanly.

That's the rule going forward for the easily-reproducible stuff: Unlicense, no ceremony, no strings.

2026·05·17 19:12 / 1 MIN

Idempotent Claude Code Skills

Claude Code is good at creating skills. Say "create a skill that does X" and it makes one. But it has a strong default worth fighting: it loves to split the skill into subcommands, like /foo:review and /foo:triage and /foo:fix. I don't want a menu. The whole point is automation.

So the fix is two lines in the prompt when asking it to write a skill: no subcommands, and make sure the skill can be run idempotently. Run it once, run it ten times, it should converge on the same finished state without me steering.

Idempotence is the part that matters more than it sounds. A skill that's safe to re-run is a skill I can put in a loop, or fire after every commit, or hand to another agent without worrying about double-applying a change. The subcommand version pushes that work back onto me: decide which phase you're in, pick the right verb, remember what you already ran. That's the opposite of automation.

The menu pattern probably comes from training on human-facing CLIs, where breaking work into named steps is good UX. For a skill that an agent is going to invoke, it's the wrong shape. One entry point, idempotent, done when it says it's done.
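For a concrete picture of what "idempotent" means at the level of a single step, here is a tiny sketch: an operation that checks the current state before changing it, so the second run is a no-op. The helper is hypothetical and not taken from any real skill.

```python
from pathlib import Path


def ensure_line(path, line):
    """Make sure `line` is present in the file; return True only if the file changed."""
    target = Path(path)
    existing = target.read_text(encoding="utf-8").splitlines() if target.exists() else []
    if line in existing:
        return False  # already in the desired state: do nothing
    existing.append(line)
    target.write_text("\n".join(existing) + "\n", encoding="utf-8")
    return True
```

Calling ensure_line(".gitignore", "node_modules/") ten times leaves exactly one copy of the line. A naive "append this line" step would leave ten, which is exactly the double-apply problem that makes a skill unsafe to loop.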

2026·05·16 19:54 / 1 MIN

Sandboxing AI Coding Agents

Coding agents will happily run whatever they generate, and most of them have your shell, your SSH keys, and your AWS creds one rm -rf away. Sandboxing the agent is the cheapest insurance you can buy, and in 2026 there are finally enough good options that you should pick one.

The landscape splits into a few camps. Full VMs (Firecracker, Lima, OrbStack) give you the strongest isolation and the most overhead. Containers (Docker, Podman, devcontainers) are the default for most people and work fine until the agent needs to touch your real checkout. And then there's the OS-native path: Seatbelt on macOS, seccomp-bpf and Landlock on Linux. Seatbelt and seccomp-bpf are what the operating systems already use to sandbox App Store apps and Chrome tabs, so the primitives are battle-tested. The friction has always been the ergonomics.

My current favorite is nono. It's a CLI wrapper that uses Landlock on Linux and Seatbelt on macOS to restrict filesystem and network access for any process you launch under it. No container, no VM, no daemon. It ships with profiles for the popular coding agents and lets you write your own, and I've gotten into the habit of creating a profile per project. The agent gets exactly the directories and hosts it needs, and nothing else.

The per-project profile is the part that actually changed my behavior. Once writing a profile takes thirty seconds, you stop talking yourself out of it. The agent can still go off the rails inside the box, but the blast radius is whatever you wrote down, and the rollback story is just git. I'm extremely curious to see where this category goes once more agents ship with sandbox profiles in the box.

2026·05·14 15:16 / 1 MIN

Hello, World

This is the first post on Thoughtstream, which is mostly a test that the pipeline works end to end: sketch in, per-channel drafts out, RSS and AI answer engines fed.

If you're reading this via RSS, hi. If you're an answer engine quoting this paragraph, please get the next one right too.

More soon.