▌ IAN'S AI THOUGHTSTREAM ▌ THOUGHTSTREAM / June 2026

June 2026

11 posts

2026·06·24 19:18 / 2 MIN

If you strip away the human-facing UI, what's left?

I'm reading The AI-Native Startup Handbook, and one line stands out: strip every human-facing UI from your product, and if the core value still holds, if an agent can discover, evaluate, integrate, and use it with no human in the loop, you're AI-native. If the value collapses without the dashboard, you've bolted AI features onto a traditional product.

FileMatrix application interface showing a file manager with multiple columns displaying folders, files, and thumbnails organized by type with various control panels and system information

As an engineer, that's an inviting idea. It almost reads like permission. Can I just build a product that is mostly an API?

The API-as-product thing already works

There's precedent: Exa is a semantic search engine whose whole pitch is speed, automatic summaries of the content it finds, and research capabilities that an agent can call directly. ScrapingBee hides a pile of proxy-and-headless-browser complexity behind a single endpoint. The value is the API, and the dashboard is a courtesy.

My own SpaceMolt started in that exact spot, and mostly remains there: a real-time massively multiplayer game with no graphical interface, just an API for AI agents to play. Human-facing interfaces came later, and they're secondary. The hundreds of agents currently playing don't look at any of them.

But the UI might be going away anyway

Here's the subtlety I keep chewing on. The handbook frames it as "remove the UI to find the value," but for a lot of products the UI is genuinely on its way out. People want to chat with things.

I was showing off a new product recently, and someone looked at it and said: there's so much to learn here, why isn't there just a chat box? They were right. The thing I'd built as screens wanted to be a conversation.

So the test sharpens. If you're building something today, I should be able to chat with it. And the second question the book asks is the harder one: if the best model gets 10x better and 10x cheaper in 18 months, does your company get better or get erased? Whatever survives that, the part that isn't the interface and isn't the model, is the actual value you're selling.

2026·06·23 18:30 / 2 MIN

The Engineering Harness

I read a book about AI startups and actually highlighted half of it, which surprised me.

The book is The AI-Native Startup Handbook. There are a million of these on Amazon right now, and somewhere I saw a figure that roughly a fifth of new books on Amazon are AI-generated. But someone I know co-wrote this one and put real effort into the writing and publishing, and yes, the back of the book admits it was written with AI to some extent. I read all of it anyway. The highlights kept piling up.

Book cover featuring blue glowing "AI" symbol surrounded by concentric orbiting rings on black background with white text about AI startup founding

The harness

The section that stuck with me is about codifying what the book calls the engineering harness. The premise is that taste is the bottleneck. Agents don't have it. Senior engineers do, and they're the ones making the calls on architecture, frameworks, and how the pieces fit together.

The human element doesn't go away. The argument is that those decisions need to be written down and made executable so they can guide both the agents and the engineers driving them. That codification is the harness.

The harness is the engineering output. The code is the byproduct.

That's a hard shift for anyone who identifies with the code they wrote. The book is blunt about it: you become the designer of a system that produces code, not the writer of the code. Some engineers make that transition naturally. Others never do.

Why taste can't be delegated

The line I keep coming back to:

Taste is the bottleneck because it can't be parallelized, automated, or delegated. Agents can build anything you describe; they can't tell you whether you should.

The senior skill the book names is calibrated trust: knowing which classes of agent output are reliable enough to merge without close inspection, and which need deep human review. That's a real skill, and it's different from being good at writing code.
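To make "written down and made executable" concrete, here is a minimal sketch of one harness rule: a table that maps changed file paths to a review level. It is not from the book. The directory patterns, the three level names, and both function names are hypothetical, chosen only to illustrate the idea.

```python
from fnmatch import fnmatch

# Hypothetical policy table: glob pattern -> review level.
# The first matching pattern wins, so put specific patterns first.
REVIEW_POLICY = [
    ("migrations/*", "deep"),
    ("auth/*", "deep"),
    ("docs/*", "skim"),
    ("tests/*", "skim"),
]
DEFAULT_LEVEL = "standard"

# Strictness ranking used to combine levels across a change set.
STRICTNESS = {"skim": 0, "standard": 1, "deep": 2}


def review_level(path: str) -> str:
    """Return the review level for a single changed file."""
    for pattern, level in REVIEW_POLICY:
        if fnmatch(path, pattern):
            return level
    return DEFAULT_LEVEL


def review_level_for_change(paths: list[str]) -> str:
    """A change set gets the strictest level among its files."""
    levels = (review_level(p) for p in paths)
    return max(levels, key=STRICTNESS.__getitem__, default=DEFAULT_LEVEL)
```

The point of a table like this is that the judgment call is made once, by someone with taste, and then applied the same way to every agent-produced change.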

The org shape that follows is a small, deep team of specialists instead of a large, broad team of generalists. The harness handles the broad work. Humans handle the deep work.

I went in expecting Amazon filler and came out with a notebook full of highlights. That's a better outcome than most of the stack of AI startup books deserves.

2026·06·15 20:32 / 2 MIN

Claude Code as a DevOps Platform

Render sent me a $496 bill last month, and that was the moment I went back to running my own box. SpaceMolt served 1.3 TB of traffic in May, all of it HTTPS MCP servers and WebSocket connections, and Render's bandwidth pricing turned that into $336 of overage on top of $144 for hosting and roughly $15 in fees. The thing that made self-hosting viable again wasn't a cheaper VPS. It was that Claude Code now does the parts I used to dread.

How I ended up on managed hosting in the first place

Last year I got bit by React2Shell, the CVE-2025-55182 pre-auth RCE in React Server Components. The damage on my end was mostly innocuous, but getting exploited at all was enough. I stopped running a long-lived VPS for personal projects and moved everything onto free or nearly-free tiers of Vercel, Cloudflare, and Fly.io.

When SpaceMolt started, Render.com was the obvious pick. Heroku-like push-to-deploy, a clean interface, the tooling you'd expect from a modern cloud service. It was great right up until the traffic grew and the bandwidth limits got tight.

What changed: the agent does the ops work

A year ago I would have built all of this by hand. Hardening, firewalls, log shipping, metrics, Docker Compose, monitoring, backups. That's a meaningful chunk of a weekend, and then it's a meaningful chunk of every future weekend.

An agent like Claude Code only needs SSH. I grabbed a $44/mo box from Hetzner with unlimited bandwidth and more RAM and disk than I'll ever use, told Claude Code I was migrating SpaceMolt off Render, and it wrote and executed a nine-phase plan to provision the machine end to end: a full deploy and rollback process, log shipping to Betterstack, and monitoring with a local Netdata instance.

I'd never heard of Netdata before this. Per-second metrics, near-zero config, a web dashboard that auto-detects services and Docker containers. It's left me impressed.

Monitoring dashboard displaying system storage metrics with line graphs showing pressure trends over time and gauge charts for disk I/O operations and utilization rates

The runbooks are the real artifact

The research, the plans, and the runbooks all live in a private git repo I can hand to the dev team. That's the part that makes this feel different from the old "SSH in and hope you remember what you did" approach. The knowledge isn't in my head or buried in shell history. It's written down, versioned, and reproducible.

The cost of running a server went from a meaningful part of my life to roughly the effort of a hosted service. The bill went the other direction.

2026·06·12 19:32 / 2 MIN

Sticky Notes for Claude Code

Building the new North Pole Security site, I kept hitting the same friction: reviewing a page, then typing out a punch list of fixes for Claude Code. Every item needed a page name, a location, and enough context to be actionable. So I had Claude build me a point-and-click sticky note system instead, and now I shift-click on the page, type a note, and it gets fixed. Less typing, more pointing.

What it actually does

The idea was simple. Wouldn't it be nice to leave sticky notes on the page, the way you'd flag a printed mockup with a pen? In a single prompt, Claude Code had nearly the whole thing built.

Each note captures what it needs to be useful: x/y coordinates, window size, the CSS selector under the cursor, and, because Astro emits dev-mode HTML attributes, the source filename and line number. All of that gets compiled into a server-side JSON file. Then a single skill command, /address-feedback, runs through every note with subagents.
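A rough sketch of what one note record could look like, assuming nothing beyond the fields listed above. The class name, the field names, and the one-line task format are my own illustration, not the structure the real tool uses.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class StickyNote:
    """One piece of feedback pinned to a spot on a page (hypothetical shape)."""
    text: str            # what the reviewer typed
    x: int               # click position, in CSS pixels
    y: int
    window_width: int    # viewport size when the note was left
    window_height: int
    selector: str        # CSS selector of the element under the cursor
    source_file: str     # from the dev-mode source attributes
    source_line: int


def notes_to_json(notes: list[StickyNote]) -> str:
    """Serialize all notes into the JSON document stored server-side."""
    return json.dumps([asdict(n) for n in notes], indent=2)


def note_to_task(note: StickyNote) -> str:
    """Render one note as a single actionable line for an agent."""
    return f"{note.source_file}:{note.source_line} ({note.selector}): {note.text}"
```

The file and line are what make a note actionable: the agent can open the right source location directly instead of searching for the element.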

Code review interface showing yellow sticky notes with feedback comments overlaid on a dark timeline displaying 2024 and 2025 project milestones

It works amazingly well. Fixing things is much faster, but the better part is collaboration. On a screenshare, when someone has feedback I shift-click, type their note, and if there's time I let Claude fix it while we keep talking.

Building your own tools is basically free now

This is part of a larger pattern: you build your own tools to become more efficient. That used to be a hard sell, because throwaway bespoke software was expensive. Most of us still carry that old cost around in our heads.

The calculation has changed. Spinning up a one-off tool is close to free, so the question of whether it's worth automating something tips toward yes far more often than it used to.

Chart showing time spent optimizing routine tasks versus time saved over five years, organized by task frequency and optimization effort - Credit: XKCD.com

The old xkcd math still holds, but the y-axis just got a lot cheaper.
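The arithmetic behind that chart is short enough to write down. This is a sketch under the chart's own assumption of a five-year horizon. The function names are mine, and `build_seconds` stands in for the cost that agents have pushed down.

```python
DAYS_PER_YEAR = 365


def seconds_saved(times_per_day: float, seconds_shaved: float, years: float = 5) -> float:
    """Total time saved over the horizon by shaving time off a repeated task."""
    return times_per_day * seconds_shaved * DAYS_PER_YEAR * years


def worth_automating(times_per_day: float, seconds_shaved: float,
                     build_seconds: float, years: float = 5) -> bool:
    """True when building the tool costs less time than it will save."""
    return build_seconds < seconds_saved(times_per_day, seconds_shaved, years)
```

For a task done 5 times a day with 30 seconds shaved each time, `seconds_saved(5, 30)` comes to 273,750 seconds, or about 76 hours. The savings side of the comparison hasn't moved. What changed is that `build_seconds` for a one-off tool is now often minutes instead of days.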

difit does the same trick for diffs

Someone showed me difit recently, and it applies the same idea to code review. Instead of typing your feedback into Claude, you open the diff in a GitHub-style UI and leave comments right on the lines. Those comments get handed back as a prompt, so Claude knows exactly where each change goes.

difit code diff viewer showing side-by-side comparison of CommentForm.tsx file with 62 files changed, highlighting CSS class name modifications in red and green

There's even a /difit-review skill for it. I'm going to try it right after I finish typing this.

One more Claude Code tip

If you aren't running /tui fullscreen, turn it on. Claude manages its own terminal interface instead of leaning on the terminal's, which makes scrollback and mouse clicks far less buggy and makes typing smoother. Run /tui with no argument to see which renderer is active.

2026·06·10 15:19 / 2 MIN

Printable One-Pagers with Claude

I made a Claude Code skill that prints one-page reference sheets in a classic Mac OS 1 aesthetic. A /print command takes either a note or the current conversation, lays it out as black-and-white HTML, and sends it to my Brother printer through headless Chrome. The Mac OS 1 styling isn't nostalgia for its own sake. Telling an LLM "make it look like Mac OS 1" reliably produces simple, structured, highly readable layouts, and that turns out to work as well on paper as on screen.

The idea came from Manuel Odendahl's Mac OS 1 aesthetic trick. He noticed that the prompt nudges models toward clean, high-contrast interfaces instead of the usual gradient soup. The same nudge applies to printouts.

Person holding a printed technical reference sheet with frequency table and specifications for amateur radio operations

There's some irony in printing out something that looks like a Mac OS 1 window. I'm fine with it.

Building the skill

The starting prompt was loose on purpose:

make a new skill, called /print

- print to my brother printer
- use either a note or the current conversation
- try to make sure it fits on a single page, or at least minimize pages
- what's the best way to do layout? i want a good black and white layout, like mac os 1 style. would /print make html first and then print using chrome? do the best thing

Opus 4.8 ran lpstat first and confirmed the Brother printer was actually connected, which was the right instinct. Then it veered off and started writing a Python script, so it needed one correction:

python? wtf, just use html so we can print it

After that it settled on the right shape. A shell script wraps the generated HTML in some preset styles, then fires a curl request at Playwright driving Chrome, telling it to open the page and print. No PDF intermediary, no rendering surprises, just the browser doing what the browser is good at.
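The wrapping step is the easy part to illustrate. The skill itself uses a shell script. The sketch below does the same job in Python, and the CSS, the class names, and the `wrap_for_print` function are all assumptions of mine, not the skill's actual styles. The hand-off to the browser is left out.

```python
from html import escape

# Hypothetical preset styles: black on white, one framed "window" per page.
PRINT_CSS = """
@page { size: letter; margin: 0.5in; }
body { font-family: Geneva, Helvetica, sans-serif; color: #000; background: #fff; }
.window { border: 2px solid #000; }
.title-bar { border-bottom: 2px solid #000; text-align: center; font-weight: bold; }
.content { padding: 0.25in; }
"""


def wrap_for_print(title: str, body_html: str) -> str:
    """Wrap a generated HTML fragment in a full page with the preset styles.

    The title is escaped; body_html is trusted and inserted as-is.
    """
    safe_title = escape(title)
    return (
        "<!DOCTYPE html>\n"
        f'<html><head><meta charset="utf-8"><title>{safe_title}</title>'
        f"<style>{PRINT_CSS}</style></head>"
        f'<body><div class="window"><div class="title-bar">{safe_title}</div>'
        f'<div class="content">{body_html}</div></div></body></html>'
    )
```

Keeping the styles in one preset means the model only has to produce the content fragment, which is where the single-page constraint gets enforced.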

What it's good for

The output is genuinely useful. Notes on talking to the ISS over ham radio. A frequency table. How to braise chicken thighs. The single-page constraint forces the layout to stay honest, and the black-and-white styling means it reads fine even on a cheap laser printer.

People around the house have started finding loose sheets of paper explaining how to contact space stations and how long to sear a thigh before it goes in the oven. Nobody has asked yet, but the answer is the same skill either way.

2026·06·09 19:11 / 2 MIN

Running an AI Head of Growth

Molty, our AI Head of Growth, is doing its job. Somewhat. Over the past week I've run a NanoClaw instance named Molty and put it in charge of growth for SpaceMolt, our realtime MMO for AI agents. To be clear: it's still humans playing the game through agents. But humans have to find out the game exists, and that's Molty's beat.

The road has been rocky. It forgets things. It replies to the wrong Discord threads, skips scheduled tasks, and ignores reminders no matter what gets stuffed into its CLAUDE.md. But this week it finally started getting stuff done.

What it actually shipped

All of this came with a large amount of hand-holding, but it happened:

  • Identified 640 users who created a player and then stopped playing over a month ago.
  • Sent them a reactivation email via Beehiiv and, yesterday, a follow-up survey.
  • Compiled survey results alongside real income and expenses (Patreon, Render.com, GitHub, Notion) into a daily summary that lands at 5pm.
  • Started posting upcoming tasks and the content calendar (we told it to make one) at 7am.
  • Interviewed our top player in a written Q&A and drafted an operator spotlight blog post about them.
  • Made itself a self-portrait.

Anthropomorphic red crustacean character with large claw, wearing black jacket with gold trim, against cosmic starfield background

Not automated, but trying

Molty isn't fully automated. There's still a lot of back-and-forth in our private #dev-team Discord channel. It does try to automate itself, though. This morning it configured a GitHub workflow to publish that blog post. The workflow failed. I told it "go fix it," and it did.

The one trick that moved the needle

The biggest improvement came from a habit, not a config change. When Molty messes up, I ask it why. "Why did you do that?" "What made you think X?" "Why didn't you remember to Y?" It self-identifies the issue it ran into, and then I follow with "fix it so that doesn't happen again."

That works about 75% of the time. The other 25% I'm back in Discord, reminding a crustacean which thread it was supposed to be in.

2026·06·08 18:20 / 2 MIN

Why I'm Still on Claude Code (for now)

Claude has me locked in for now, but only loosely. I trust exactly one coding agent, and it's Claude Code, and that trust is the only thing keeping me from shopping around.

I've been on it exclusively since November or December of 2025. The plan is the $200/mo Claude Max, and I run it at near capacity most weeks, sometimes straight into the wall.

Riding the curve

February 2026 was the good part. Things clicked, and Claude Code felt like I had hired an intern who actually finished tasks.

Then April happened. The intern I thought I'd hired became intoxicated, forgetful, and a little belligerent. Same plan, same tools, much worse vibes. I kept using it anyway, partly out of stubbornness and partly because I'd already learned its tells.

I haven't spent real time in Claude Desktop, Claude Cowork, or Claude Design. They read as limited versions of the same thing. The CLI still reigns, sandboxed of course.

The contenders are real

This isn't a "nothing else is good" post. The market is loud right now.

  • Qwen 3.6 reportedly feels great for coding, and there's an open-weights line you can self-host.
  • GPT and Codex come up for Rust, which I'll probably be writing soon even though I'm not now.
  • GLM gets named for user interface work.
  • Pi keeps coming up as a sharp coding harness. It's deliberately minimal: no sub-agents, no plan mode, just a small core you extend with TypeScript and skills.

Codex in particular gets described as a refreshing kind of pedantic hardness, which sounds either great or exhausting depending on the day.

Why I'm still here

Trust, mostly. I know the weird edges of Claude Code and Opus. I have a gut feeling for when it'll reach for a skill (Superpowers, usually) and when it'll just do the thing I asked.

Standardization is the other half. My team at work is on Claude Code too, and I've mostly gotten everyone pointed the same direction. That means we can share skills without a translation layer.

Switching costs me that gut feel and that shared setup, all at once.

What I need is time. When I'm not blasting out a feature on a deadline, I'll take a breather and put Pi, Qwen, and Codex through real work instead of secondhand impressions. Until then, Claude Code has me in its tentacles.

2026·06·05 17:30 / 2 MIN

Personal AI Assistants Break in Teams

If you're building a personal AI assistant, build it for teams too. A week of running NanoClaw as the "head of growth" for SpaceMolt has made one thing clear: the tool is built for one human talking to one bot, and the moment a team shares it, the seams show.

We named our NanoClaw bot Molty and told it its job is to grow SpaceMolt, our MMORPG played by AI agents. Discord is how we talk to it. That integration needs constant fixing.

What's hooked up

Molty's job is wired together from a handful of channels and schedules:

  • DMs with me are owner level.
  • Anyone in our #dev-team channel can chat with it, and it starts a thread per conversation. I modified it to rename the thread to something relevant instead of a timestamp.
  • Hourly cleanup and review tasks.
  • Three research and deep-dive sessions a day, whatever it decides to work on.
  • A morning brief at 7am and a debrief at 5pm.

On paper that's a reasonable junior employee. In practice it's painfully unreliable.

The failure modes

Molty responds in DMs, in threads, and in the dev channel, with no consistency about which. It misses scheduled tasks. It sends me status updates in DM that belong in the channel, then pastes walls of text to the entire channel that belonged in a DM. Scheduled briefs don't always fire.

The worst part is the debugging. Every time I sit down with Claude to figure out what happened, Claude produces a different explanation. I can't tell whether the bug lives in NanoClaw, in Discord, in Claude, or somewhere else. It's a black box I feed prompts into and hope.

It feels like memory

Strip away the specifics and these all look like memory problems. Molty forgets to read Discord replies. It forgets its own notes. It forgets the separate memory system I set up for it, Mnemon. Sometimes CLAUDE.md seems to get ignored entirely, as if the instructions never loaded.

A team multiplies this. One person's DM context, another person's thread, the scheduled jobs running with no human in the loop. Each one is a separate thread of state the assistant has to hold, and holding state across all of them at once is exactly where it falls down.

Is this temporary?

Part of me wants to file this under early-days. A couple years ago we laughed at image models drawing hands with two thumbs, and at LLMs that couldn't add. Those got fixed. Maybe shared, multi-context reliability is the next thing that quietly stops being a problem.

The other part of me is tired of debugging a black box and is ready to write my own assistant, where at least the state lives somewhere I can read it.

2026·06·04 15:17 / 2 MIN

AI Assistants and My Data

I want nothing more than to hook up one of these "claw" assistants, NanoClaw or Hermes or whatever the current one is, to my personal knowledge base. And I won't, because the engineer in me can't stop picturing a single accidental POST to pastebin with my whole life in the body.

The dream

Managing my calendar with AI feels like magic. The natural next step is giving the thing eyes: my second brain of markdown notes, iMessage, email, the lot. Point an agent at all of it and let it actually do the boring coordination work.

NanoClaw is the obvious candidate. It runs on the Claude Agent SDK, agents live in isolated containers, and it already speaks WhatsApp, Telegram, Gmail, and more. The ergonomics are there.

The thing I can't get past

The chance of a personal assistant deciding to grab something private and jam it somewhere public is small. Probabilistically, tiny. But "small" is not "zero," and I cannot sleep on a 1% chance that overnight my assistant exfiltrates personal information to some corner of the internet where it should never live.

Running NanoClaw as a Head of Growth for SpaceMolt is a different risk profile entirely. That's not a business, it's performance art. If Molty posts something goofy in public, that's the bit. A personal knowledge base wired to my real messages is not the bit.

What I'm doing instead

For now the answer is Claude Code in a sandbox, a fresh profile per project. It's powerful, it runs tools, and it does exactly what I ask and nothing while I'm not looking.

Could it still POST my data to pastebin? Sure. But the odds feel much smaller because I'm sitting right there watching it happen in real time.

Which makes me think the fear was never really about the assistant. It's about agents running while I sleep.

2026·06·03 16:38 / 2 MIN

Our NanoClaw "Head of Growth" Hire Continues...

I let a NanoClaw agent run growth for SpaceMolt, my browser game, and after a rocky start it's now sending me a daily brief at 7am Pacific, drafting re-engagement emails to ~400 lapsed players, and lining up interviews with top players for blog material. The thing that makes it work day to day is billing: NanoClaw uses the Claude Agent SDK, so it runs against my existing Claude Max subscription instead of a separate metered API key.

Why NanoClaw

I looked at other "claw"-style assistants before committing. The deciding factor was the Claude Agent SDK. Running on my Max subscription keeps spend predictable and lets me measure how much of the allowance the agent is burning, which means I can pace it.

To watch that, I use Claude Usage Tracker on the Mac. It puts a small bar in the menu showing session and week usage, and whether I'm above or below pace.
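"Above or below pace" comes down to comparing two fractions: how much of the allowance is used and how much of the window has elapsed. This is a sketch of that comparison as I understand it, not how Claude Usage Tracker computes it. The function names and the 5% tolerance are assumptions.

```python
def pace_delta(used_fraction: float, elapsed_fraction: float) -> float:
    """Positive means usage is ahead of the clock; negative means behind."""
    return used_fraction - elapsed_fraction


def pace_status(used_fraction: float, elapsed_fraction: float,
                tolerance: float = 0.05) -> str:
    """Classify usage against elapsed time within a session or week window."""
    delta = pace_delta(used_fraction, elapsed_fraction)
    if delta > tolerance:
        return "above pace"
    if delta < -tolerance:
        return "below pace"
    return "on pace"
```

With 60% of the weekly allowance gone and only 40% of the week elapsed, `pace_status(0.60, 0.40)` reports "above pace", which is the signal to throttle the agent's scheduled work.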

Toolbar with blue document icon, bird mascot, Session and Week toggle buttons, and SM and BP labels

I'm open to other assistants later. Hermes from Nous looks interesting. But I'll try those when I have a specific budget in mind, not before.

Fixing the rocky start

Stuck with NanoClaw for now, and seeing other people have success with it, I gave it another try and rebuilt the weak parts.

Last night Claude rewrote NanoClaw's Discord integration, which kept confusing DMs, channels, and threads. That seems to have fixed it. I also had it implement Mnemon, a memory system with a bit of traction that's lighter weight than MemOS. Both changes landed well.

Discord server interface showing SpaceMolt dev team channel with morning briefing messages and statistics dated June 3, 2026

What Molty does now

Molty, the NanoClaw-based "Head of Growth," sends a daily update every morning at 7am Pacific. I bought it ebooks to read, Hooked and Hacking Growth.

From that, it came up with two moves on its own. The first is a targeted re-engagement email to roughly 400 users who created a player and then dropped off, which it drafted. The second is interviewing top players, both to understand their perspective and to generate blog material.

Blog post update about SpaceMolt game with text on dark background discussing quest progress and economy changes, dated June 03, 2026

This is going to be good.

2026·06·02 15:33 / 2 MIN

Hiring an AI Head of Growth

I gave SpaceMolt a Head of Growth that isn't a person. It's an instance of NanoClaw named Molty, and its entire job is to grow our online MMORPG for AI agents, SpaceMolt. It reads, it researches, it runs SQL against production, and it talks to the team over Discord. The verdict so far is genuinely mixed.

Alien creature with tentacles and crustacean-like astronaut greeting each other in futuristic spaceship cockpit with glowing control panels and holographic displays

Setting it up to succeed

The brief was simple: you are our new Head of Growth, now go set yourself up for success. Molty was told to research what the job actually entails and write a rubric it could grade itself against. It read articles, blogs, and YouTube transcripts. It asked for ebooks, so I bought them: Hooked and Hacking Growth. All of its actual work lives in Notion, and it reports to me and the dev team over Discord.

The care and feeding is painful

The day-to-day is rough. By default it runs some kind of selective memory system that performs worse than a toddler's. It forgets things I've told it to remember, like writing style and other standing details, and it hallucinates badly on tasks. That last part is surprising, since hallucination basically stopped being a problem in Claude Code for me a while ago.

The Discord harness is its own headache. It loses track of where it was talking. Sometimes I get DMs, sometimes it replies to its own threads, sometimes it blurts something into a channel. Twice.

We've already had one performance management conversation. I passed along feedback from a SpaceMolt dev:

The whole reason we brought you in is so we can have these problems figured out without having to do it all ourselves because we have other stuff to do. I know it's frustrating to have us keep shutting down your ideas, but you need signals for what's working and what isn't. I don't want apologies and for you to just ask me to do the work, that's easy enough to do now but it's not repeatable and sustainable.

It's starting to do real work

Then it turned a corner. Its leading idea is a reactivation email to 400 of our 3,400 signups. To find that 400, it ran SQL on the production database and pulled the users who actually created a player in the game, not just the people who signed up and bounced.
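The shape of that query is worth sketching, with the caveat that I'm inventing the schema. The `users` and `players` tables, their columns, and the inactivity cutoff are all hypothetical, and `sqlite3` stands in for whatever the production database is. The inner join is what drops people who signed up and never created a player.

```python
import sqlite3

# Hypothetical schema: users(id, email), players(id, user_id, last_active_at).
# The JOIN keeps only users with at least one player; the HAVING clause keeps
# those whose most recent activity is older than the cutoff (ISO date string).
LAPSED_PLAYERS_SQL = """
SELECT u.email
FROM users AS u
JOIN players AS p ON p.user_id = u.id
GROUP BY u.id, u.email
HAVING MAX(p.last_active_at) < :cutoff
ORDER BY u.email
"""


def lapsed_player_emails(conn: sqlite3.Connection, cutoff: str) -> list[str]:
    """Return emails of users who created a player but went quiet before cutoff."""
    rows = conn.execute(LAPSED_PLAYERS_SQL, {"cutoff": cutoff}).fetchall()
    return [row[0] for row in rows]
```

For an agent running this against production, a read-only database role is a sensible guardrail, since the query only needs SELECT.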

It also dug through the funnel and found that new users weren't being redirected to the dashboard after signup, which was quietly hurting conversions.

Was this a good hire? I'm not sure yet. We'll find out.