IAN'S AI THOUGHTSTREAM / #vibe-coding

5 posts

2026·06·23 18:30 / 2 MIN

The Engineering Harness

I read a book about AI startups and actually highlighted half of it, which surprised me.

The book is The AI-Native Startup Handbook. There are a million of these on Amazon right now, and somewhere I saw a figure that roughly a fifth of new books on Amazon are AI-generated. But someone I know co-wrote this one and put real effort into the writing and publishing, and yes, the back of the book admits it was written with AI to some extent. I read all of it anyway. The highlights kept piling up.

Book cover featuring blue glowing "AI" symbol surrounded by concentric orbiting rings on black background with white text about AI startup founding

The harness

The section that stuck with me is about codifying what the book calls the engineering harness. The premise is that taste is the bottleneck. Agents don't have it. Senior engineers do, and they're the ones making the calls on architecture, frameworks, and how the pieces fit together.

The human element doesn't go away. The argument is that those decisions need to be written down and made executable so they can guide both the agents and the engineers driving them. That codification is the harness.

The harness is the engineering output. The code is the byproduct.

That's a hard shift for anyone who identifies with the code they wrote. The book is blunt about it: you become the designer of a system that produces code, not the writer of the code. Some engineers make that transition naturally. Others never do.

Why taste can't be delegated

The line I keep coming back to:

Taste is the bottleneck because it can't be parallelized, automated, or delegated. Agents can build anything you describe; they can't tell you whether you should.

The senior skill the book names is calibrated trust: knowing which classes of agent output are reliable enough to merge without close inspection, and which ones need deep human review. That's a real skill, and it's different from being good at writing code.

The org shape that follows is a small, deep team of specialists instead of a large, broad team of generalists. The harness handles the broad work. Humans handle the deep work.

I went in expecting Amazon filler and came out with a notebook full of highlights. That's a better outcome than most of the AI startup books in that stack deserve.

2026·06·12 19:32 / 2 MIN

Sticky Notes for Claude Code

Building the new North Pole Security site, I kept hitting the same friction: reviewing a page, then typing out a punch list of fixes for Claude Code. Every item needed a page name, a location, and enough context to be actionable. So I had Claude build me a point-and-click sticky note system instead, and now I shift-click on the page, type a note, and it gets fixed. Less typing, more pointing.

What it actually does

The idea was simple. Wouldn't it be nice to leave sticky notes on the page, the way you'd flag a printed mockup with a pen? In a single prompt, Claude Code had nearly the whole thing built.

Each note captures what it needs to be useful: x/y coordinates, window size, the CSS selector under the cursor, and, because Astro emits dev-mode HTML attributes, the source filename and line number. All of that gets compiled into a server-side JSON file. Then a single skill command, /address-feedback, runs through every note with subagents.
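A minimal sketch of what one of those notes could look like once it lands in the JSON file, and how it might be flattened into a punch-list item for an agent. Everything here is an assumption for illustration: the field names, the `StickyNote` class, and the `to_task` and `load_notes` helpers are hypothetical, not the actual schema the tool uses.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class StickyNote:
    """One piece of feedback pinned to a page (hypothetical field names)."""
    page: str                          # route the note was left on
    text: str                          # what the reviewer typed
    x: int                             # click position in the viewport
    y: int
    window_width: int                  # viewport size at click time
    window_height: int
    selector: str                      # CSS selector under the cursor
    source_file: Optional[str] = None  # from dev-mode source attributes, if present
    source_line: Optional[int] = None


def load_notes(raw_json: str) -> list[StickyNote]:
    """Parse a JSON array of note objects into StickyNote records."""
    return [StickyNote(**item) for item in json.loads(raw_json)]


def to_task(note: StickyNote) -> str:
    """Render one note as a single actionable line for an agent."""
    # Prefer the exact source location; fall back to the selector.
    if note.source_file and note.source_line is not None:
        where = f"{note.source_file}:{note.source_line}"
    else:
        where = note.selector
    return (
        f"[{note.page}] {note.text} "
        f"(at {where}; selector {note.selector}; "
        f"click {note.x},{note.y} in "
        f"{note.window_width}x{note.window_height} window)"
    )


example = json.dumps([{
    "page": "/about", "text": "Tighten the hero copy",
    "x": 412, "y": 230, "window_width": 1440, "window_height": 900,
    "selector": "main > h1",
    "source_file": "src/pages/about.astro", "source_line": 17,
}])
for note in load_notes(example):
    print(to_task(note))
```

The point of carrying the source file and line alongside the selector is that the agent can jump straight to the markup instead of searching the project for a matching element.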

Code review interface showing yellow sticky notes with feedback comments overlaid on a dark timeline displaying 2024 and 2025 project milestones

It works amazingly well. Fixing things is much faster, but the better part is collaboration. On a screenshare, when someone has feedback I shift-click, type their note, and if there's time I let Claude fix it while we keep talking.

Building your own tools is basically free now

This is part of a larger pattern: you build your own tools to become more efficient. That used to be a hard sell, because throwaway bespoke software was expensive. Most of us still carry that old cost around in our heads.

The calculation has changed. Spinning up a one-off tool is close to free, so the question of whether it's worth automating something tips toward yes far more often than it used to.

Chart showing time spent optimizing routine tasks versus time saved over five years, organized by task frequency and optimization effort - Credit: XKCD.com

The old xkcd math still holds, but the cost of building the optimization just dropped a lot.
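The arithmetic behind that chart is simple enough to sketch: the most time worth spending on a tool is the time it saves per use, times how often you use it, over the horizon you care about (five years in the chart). The function names and all the numbers below are made up for illustration; only the formula comes from the chart's premise.

```python
FIVE_YEARS_IN_DAYS = 5 * 365


def break_even_seconds(seconds_saved_per_run: float,
                       runs_per_day: float,
                       horizon_days: int = FIVE_YEARS_IN_DAYS) -> float:
    """Total time the tool saves over the horizon, i.e. the most you can
    spend building it before you lose more time than you gain."""
    return seconds_saved_per_run * runs_per_day * horizon_days


def worth_automating(build_seconds: float,
                     seconds_saved_per_run: float,
                     runs_per_day: float,
                     horizon_days: int = FIVE_YEARS_IN_DAYS) -> bool:
    """True when the build cost is below the break-even budget."""
    budget = break_even_seconds(seconds_saved_per_run, runs_per_day, horizon_days)
    return build_seconds < budget


# Hypothetical task: saves 30 seconds, done 5 times a day.
budget = break_even_seconds(30, 5)

# Same task, two hypothetical build costs: 100 hours by hand vs 1 hour with an agent.
hand_built = worth_automating(100 * 3600, 30, 5)
agent_built = worth_automating(1 * 3600, 30, 5)
print(budget, hand_built, agent_built)
```

The savings side of the inequality doesn't move. Only `build_seconds` shrinks, which is why tasks that used to fail the test now pass it.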

difit does the same trick for diffs

Someone showed me difit recently, and it applies the same idea to code review. Instead of typing your feedback into Claude, you open the diff in a GitHub-style UI and leave comments right on the lines. Those comments get handed back as a prompt, so Claude knows exactly where each change goes.

difit code diff viewer showing side-by-side comparison of CommentForm.tsx file with 62 files changed, highlighting CSS class name modifications in red and green

There's even a /difit-review skill for it. I'm going to try it right after I finish typing this.

One more Claude Code tip

If you aren't running /tui fullscreen, turn it on. Claude manages its own terminal interface instead of leaning on the terminal's, which makes scrollback and mouse clicks far less buggy and makes typing smoother. Run /tui with no argument to see which renderer is active.

2026·06·08 18:20 / 2 MIN

Why I'm Still on Claude Code (for now)

Claude has me locked in for now, but only loosely. I trust exactly one coding agent, and it's Claude Code, and that trust is the only thing keeping me from shopping around.

I've used it exclusively since November or December of 2025. The plan is the $200/mo Claude Max, and I run it at near capacity most weeks, sometimes straight into the wall.

Riding the curve

February 2026 was the good part. Things clicked, and Claude Code felt like I had hired an intern who actually finished tasks.

Then April happened. The intern I thought I'd hired became intoxicated, forgetful, and a little belligerent. Same plan, same tools, much worse vibes. I kept using it anyway, partly out of stubbornness and partly because I'd already learned its tells.

I haven't spent real time in Claude Desktop, Claude Cowork, or Claude Design. They read as limited versions of the same thing. The CLI still reigns, sandboxed of course.

The contenders are real

This isn't a "nothing else is good" post. The market is loud right now.

  • Qwen 3.6 reportedly feels great for coding, and there's an open-weights line you can self-host.
  • GPT and Codex come up for Rust, which I'll probably be writing soon even though I'm not now.
  • GLM gets named for user interface work.
  • Pi keeps coming up as a sharp coding harness. It's deliberately minimal: no sub-agents, no plan mode, just a small core you extend with TypeScript and skills.

Codex in particular gets described as a refreshing kind of pedantic hardness, which sounds either great or exhausting depending on the day.

Why I'm still here

Trust, mostly. I know the weird edges of Claude Code and Opus. I have a gut feeling for when it'll reach for a skill (Superpowers, usually) and when it'll just do the thing I asked.

Standardization is the other half. My team at work is on Claude Code too, and I've mostly gotten everyone pointed in the same direction. That means we can share skills without a translation layer.

Switching costs me that gut feel and that shared setup, all at once.

What I need is time. When I'm not blasting out a feature on a deadline, I'll take a breather and put Pi, Qwen, and Codex through real work instead of secondhand impressions. Until then, Claude Code has me in its tentacles.

2026·05·18 16:05 / 2 MIN

Open Sourcing a MeshCore Bot

I open sourced Blorkobot, a chatbot for our local Bay Area MeshCore LoRa mesh radio network, and put it in the public domain via the Unlicense. That's my new default for any vibe-coded (sorry, "agentically engineered") funsie project that someone could reproduce in an hour by pointing Claude Code at the same problem.

The bot exists to increase chat activity on the mesh, which helps stress-test the network without anyone having to manually spam it. It's about 3k lines of Python, written as a plugin for the Remote Terminal MeshCore client. Nothing exotic.

Why I hesitated

The SoCal MeshCore folks asked if I'd open source it, and I sat on it for a while. Releasing trivial code feels strange. Anyone with an AI coding agent and an afternoon could rebuild this from the README. What's the point of a repo for something that's nearly free to recreate?

I released it anyway, because the value isn't the lines of code. It's the hours of trial and error already baked in: the plugin shape that actually works with Remote Terminal, the commands that turned out to be fun on the mesh, the ones that didn't.

Why Unlicense and not AGPL

The first response after I pushed it was "have you thought about AGPL?"

Setting aside the copyright theory, the AGPL question is really a question about effort. AGPL is the right tool when you've poured serious work into something and want to make sure derivatives stay open. That's not this. This is a weekend project that any competent operator could regenerate from scratch. Defending it with a copyleft license would be cosplay.

Public domain matches the actual situation. Take it, fork it, paste it into your own bot, don't credit me, I genuinely do not care. Unlicense says that cleanly.

That's the rule going forward for the easily-reproducible stuff: Unlicense, no ceremony, no strings.

2026·05·16 19:54 / 1 MIN

Sandboxing AI Coding Agents

Coding agents will happily run whatever they generate, and most of them have your shell, your SSH keys, and your AWS creds one rm -rf away. Sandboxing the agent is the cheapest insurance you can buy, and in 2026 there are finally enough good options that you should pick one.

The landscape splits into a few camps. Full VMs (Firecracker, Lima, OrbStack) give you the strongest isolation and the most overhead. Containers (Docker, Podman, devcontainers) are the default for most people and work fine until the agent needs to touch your real checkout. And then there's the OS-native path: Seatbelt on macOS, seccomp-bpf and Landlock on Linux. Seatbelt is what macOS already uses to sandbox App Store apps, and seccomp-bpf is what Chrome uses to sandbox its renderer processes on Linux, so the primitives are battle-tested. The friction has always been the ergonomics.

My current favorite is nono. It's a CLI wrapper that uses Landlock on Linux and Seatbelt on macOS to restrict filesystem and network access for any process you launch under it. No container, no VM, no daemon. It ships with profiles for the popular coding agents and lets you write your own, and I've gotten into the habit of creating a profile per project. The agent gets exactly the directories and hosts it needs, and nothing else.

The per-project profile is the part that actually changed my behavior. Once writing a profile takes thirty seconds, you stop talking yourself out of it. The agent can still go off the rails inside the box, but the blast radius is whatever you wrote down, and the rollback story is just git. I'm extremely curious to see where this category goes once more agents ship with sandbox profiles in the box.
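To make "the blast radius is whatever you wrote down" concrete, here is a toy model of a per-project allowlist. This is not nono's profile format and it enforces nothing: real enforcement happens in the kernel via Landlock or Seatbelt, and also has to deal with symlinks, which this sketch ignores. The `SandboxProfile` class, its fields, and the example paths and host are all hypothetical.

```python
import posixpath
from dataclasses import dataclass


@dataclass(frozen=True)
class SandboxProfile:
    """Hypothetical per-project allowlist (illustration only)."""
    name: str
    writable_dirs: tuple[str, ...] = ()
    readable_dirs: tuple[str, ...] = ()
    allowed_hosts: tuple[str, ...] = ()


def _inside(path: str, root: str) -> bool:
    """True if path is root itself or lives under it, after collapsing '..'."""
    path = posixpath.normpath(path)
    root = posixpath.normpath(root)
    return path == root or path.startswith(root.rstrip("/") + "/")


def can_write(profile: SandboxProfile, path: str) -> bool:
    return any(_inside(path, d) for d in profile.writable_dirs)


def can_read(profile: SandboxProfile, path: str) -> bool:
    # Anything writable is also readable in this toy model.
    return can_write(profile, path) or any(
        _inside(path, d) for d in profile.readable_dirs
    )


def can_connect(profile: SandboxProfile, host: str) -> bool:
    return host.lower() in profile.allowed_hosts


site = SandboxProfile(
    name="site",
    writable_dirs=("/home/me/site",),
    readable_dirs=("/usr/lib",),
    allowed_hosts=("registry.npmjs.org",),
)
print(can_write(site, "/home/me/site/src/index.astro"))
print(can_write(site, "/home/me/.ssh/id_ed25519"))
```

Everything not listed is denied by default, so the profile doubles as documentation of what the agent is supposed to touch.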