IAN'S AI THOUGHTSTREAM / July 2026

2 posts

2026·07·02 18:10 / 3 MIN

Giving Your Agent Eyes with Game Boy Hacking

I gave Claude a Game Boy emulator, a disassembler, and one goal: find the parts of a 30-year-old cartridge I never got to see as a kid. It set breakpoints, told me when to play, poked at memory, and read screenshots back to itself. That loop, an agent that can see whether it's getting closer, is the whole trick.

The 90s version of this problem

I grew up with an original Game Boy and later a Game Boy Color. Console gaming back then was a closed world. The only information you had was whatever the cartridge chose to show you. Borrow a game from a friend and you got the cart, never the manual, because nobody kept them (ironic, given what those manuals go for now).

There's a specific memory here. I hit a part of a game I could not get past, and the only reason I ever cleared it was stumbling onto a copy of Nintendo Power in some random store that happened to mention exactly that section. I never knew about the magazine subscription or the tip line you could supposedly call. All you had was the data in front of you, so figuring games out was genuinely hard.

The actual question

I had a Game Genie growing up, but that was mostly infinite lives. Not interesting. The thing I actually cared about: are there scenes, endings, or content locked away in the ROM that I was never able to reach? What secret stuff is sitting in there unrendered?

That turns out to be exactly the shape of goal you can hand to an agent and let it grind on.

Three tools

The setup is three pieces:

  • Gearboy, an extremely detailed Game Boy and Game Boy Color emulator built on Dear ImGui. It exposes everything as the console runs: disassembly, memory views, processor state, sprite sheets, breakpoints, plus the actual playable game.
  • GhidraBoy, a Game Boy disassembly toolkit for Ghidra.
  • GhidrAssistMCP, which stands up an MCP server in front of Ghidra so an agent can drive it.
Gearboy emulator running Radar Mission with debugger windows open showing memory editor, disassembler, processor state, symbols, and breakpoints

Wire those together and Claude can disassemble, investigate, and hunt for exploits in old carts. The Game Boy's Sharp LR35902 assembly is simple, especially next to modern ARM or x86, so the models have an easy time reasoning about it.

Working with Claude on it

Claude did a solid job understanding subroutines and what they were for by inspecting memory, taking screenshots, and comparing those screenshots over time. Finding straight-up cheats was hit or miss, but that was never the point.
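The post doesn't say how the screenshots were compared. Here is a minimal sketch of one way to do it, assuming each capture has already been decoded into a 160x144 grid of DMG shade values (0-3). `diff_frames` and `blank_frame` are hypothetical helpers, not part of Gearboy or any MCP tool. The sketch reports how much of the screen changed and where, which is enough to tell "nothing happened" apart from "a new screen appeared".

```python
from typing import List, Optional, Tuple

# The original Game Boy LCD is 160x144 pixels; DMG frames use four shades (0-3).
WIDTH, HEIGHT = 160, 144
Frame = List[List[int]]          # HEIGHT rows, each holding WIDTH shade values
Box = Tuple[int, int, int, int]  # (left, top, right, bottom), inclusive


def blank_frame(shade: int = 0) -> Frame:
    """A uniform frame, handy for tests (hypothetical helper)."""
    return [[shade] * WIDTH for _ in range(HEIGHT)]


def diff_frames(before: Frame, after: Frame) -> Tuple[float, Optional[Box]]:
    """Return the fraction of pixels that changed and the bounding box of the changes."""
    if len(before) != HEIGHT or len(after) != HEIGHT:
        raise ValueError("frames must have 144 rows")
    changed = 0
    left, top, right, bottom = WIDTH, HEIGHT, -1, -1
    for y, (row_a, row_b) in enumerate(zip(before, after)):
        if len(row_a) != WIDTH or len(row_b) != WIDTH:
            raise ValueError("rows must have 160 pixels")
        for x, (pa, pb) in enumerate(zip(row_a, row_b)):
            if pa != pb:
                changed += 1
                left, right = min(left, x), max(right, x)
                top, bottom = min(top, y), max(bottom, y)
    if changed == 0:
        return 0.0, None
    return changed / (WIDTH * HEIGHT), (left, top, right, bottom)


if __name__ == "__main__":
    a = blank_frame()
    b = blank_frame()
    for y in range(10, 20):    # paint a 10x10 block, e.g. a sprite appearing
        for x in range(30, 40):
            b[y][x] = 3
    print(diff_frames(a, b))   # 100 of 23040 pixels changed, box (30, 10, 39, 19)
```

A bounding box is often more useful to an agent than a raw percentage: a change confined to one corner usually means a HUD counter ticked, while a full-screen change means a scene transition.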

The working rhythm was genuinely fun. Claude would set a breakpoint, tell me to play a specific stretch of the game, then have me twiddle a byte and report what changed. Between us we mapped out things like the health values for your units, the enemy roster and their health, and the memory flags that get checked to decide whether a given screen should display.
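The "play a stretch, then report what changed" loop resembles a classic cheat search. You snapshot RAM, change the value in-game, snapshot again, and keep only the addresses that moved the way you expected. The post doesn't show exactly how Claude narrowed addresses, so this is a minimal sketch of that common technique. It runs over a plain byte buffer standing in for work RAM, which the original Game Boy maps at 0xC000-0xDFFF. The addresses in `demo` are made up for illustration and are not Radar Mission's.

```python
from typing import Callable, Dict, Iterable, List

WRAM_START = 0xC000  # DMG work RAM is mapped at 0xC000-0xDFFF (8 KiB)

Snapshot = Dict[int, int]  # address -> byte value


def snapshot(ram: bytes, base: int = WRAM_START) -> Snapshot:
    """Copy a RAM dump into an address -> value map."""
    return {base + offset: value for offset, value in enumerate(ram)}


def narrow(candidates: Iterable[int], before: Snapshot, after: Snapshot,
           keep: Callable[[int, int], bool]) -> List[int]:
    """Keep only the addresses whose (before, after) values satisfy `keep`."""
    return [addr for addr in candidates if keep(before[addr], after[addr])]


def decreased(old: int, new: int) -> bool:
    return new < old


def unchanged(old: int, new: int) -> bool:
    return new == old


def demo() -> List[int]:
    # Fake RAM with hypothetical addresses, not taken from any real game.
    ram = bytearray(0x2000)
    hp, decoy, timer = 0xC0A3, 0xC200, 0xC010

    def poke(addr: int, value: int) -> None:
        ram[addr - WRAM_START] = value

    poke(hp, 10)
    poke(decoy, 50)
    poke(timer, 5)
    s0 = snapshot(bytes(ram))

    # Round 1: the player takes damage. HP drops, but so does an unrelated byte.
    poke(hp, 7)
    poke(decoy, 49)
    poke(timer, 6)
    s1 = snapshot(bytes(ram))
    candidates = narrow(s0, s0, s1, decreased)  # HP and the decoy survive

    # Round 2: the player takes no damage, so HP must stay the same.
    poke(decoy, 48)
    poke(timer, 7)
    s2 = snapshot(bytes(ram))
    return narrow(candidates, s1, s2, unchanged)  # only HP survives


if __name__ == "__main__":
    print([hex(addr) for addr in demo()])
```

Each round corresponds to one "play this stretch, then tell me" request. Once only one or two addresses survive, poking them and watching the screen confirms what they actually control.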

Terminal screenshot displaying technical instructions for achieving an ADMIRAL rank with score 999999 in a video game, including memory addresses and procedural steps

Give your agents eyes

I've said this before and the Game Boy just makes it concrete. Whether it's headless Chrome or an emulator with a full debugger attached, the thing that matters is the feedback loop. Give an agent a way to see whether it's achieving its goal, then let it spin. That's when it starts doing surprising things.

2026·07·01 18:28 / 3 MIN

Team-Wide Agentic Harness

Most of what I've learned about running AI agents lives on my own machine and nowhere else. The Linear-management skill, the sandbox conventions, the notes about how our releases work: all of it sits in my personal setup, invisible to the rest of the team. So I'm building a team-wide agentic harness, a checked-in repository of agent config, skills, and evergreen context that everyone can share, review, and improve.

Brown bags and checked-in skills

We've been running AI brown bag sessions, informal knowledge-transfer where everyone trades tips on how they actually use agents day to day. A lot of what comes out of those is concrete and shareable. I've been showing off skills like a Linear-management skill that reviews our queue, checks progress against the roadmap, organizes releases, and generates release notes tailored to specific customers.

Those are easy to share because they're files. You check them in and someone else can run them.

The parts that don't check in

But a big chunk of using agents well isn't a file. It's convention.

Most of us run agents in sandboxes. The most important rule there is to scope all the work into a single directory. You give the sandbox access to the directory you're working in and nothing outside of it, save a few exceptions. That has downstream consequences: temporary files go in a tmp directory, worktrees go in a worktrees subdirectory, and none of that gets checked in.
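Sandboxes enforce the single-directory rule at the OS or container level. The same check is still handy in scripts and hooks that write files on an agent's behalf. This is a minimal sketch assuming a single workspace root directory. `inside_workspace` is a hypothetical helper and not part of any sandbox's API.

```python
from pathlib import Path


def inside_workspace(root: Path, candidate: Path) -> bool:
    """True if `candidate` (relative to `root`, or absolute) resolves inside `root`.

    resolve() collapses ".." and follows symlinks that exist, so a path like
    "tmp/../../secrets" or a symlink pointing outside the root is rejected.
    """
    root = root.resolve()
    target = (root / candidate).resolve()
    return target == root or root in target.parents


if __name__ == "__main__":
    ws = Path.cwd()
    for p in ["tmp/scratch.log", "worktrees/feature-x", "../other-project"]:
        print(f"{p!r:24} inside={inside_workspace(ws, Path(p))}")
```

Checking the resolved path rather than the raw string is the key design choice. A naive `startswith` test would accept `tmp/../../escape`.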

A plans or notes directory helps too, a loosely organized bucket of agent output artifacts. You can search and read them with something like Obsidian.

The harness

I want to go a step further and check in an entire top-level directory. I call it the harness.

The idea came from The AI-Native Startup Handbook, though really it just codified something I was already doing. I check out repos and do all my work in one top-level directory. It isn't a monorepo. It's a top-level directory that everything about the company or the larger project can reach: multiple repos, research, notes, plans, skills. Once I looked at it as a unit, a lot of it turned out to be shareable.

The other important piece is evergreen content. Descriptions of the company, the product, and procedures we do often, like how releases work and how we use Linear as a team. Those live in an evergreen docs directory so agents have a grounding point, a place to start from where they already understand the product and the value we're delivering.
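Here is a minimal sketch of scaffolding such a harness. The directory names are assumptions standing in for the kinds of things the post lists: repos, research, notes, plans, skills, and evergreen docs. Ignoring `repos/` assumes each checked-out repo keeps its own git history rather than being vendored into the harness. Tracked directories get a README because git does not track empty directories.

```python
from pathlib import Path

# Hypothetical layout; the names are illustrative, not a documented standard.
TRACKED = {
    "skills": "Agent skills. Changes go through pull request review like any code.",
    "evergreen": "Company, product, and procedure docs (releases, Linear usage) agents start from.",
    "research": "Background research gathered along the way.",
    "notes": "Loosely organized agent output artifacts.",
    "plans": "Plans drafted with or by agents.",
}
# Local-only directories: checked-out repos, scratch files, and git worktrees.
IGNORED = ("repos", "tmp", "worktrees")


def scaffold(root: Path) -> None:
    """Create the harness layout under `root`; safe to run repeatedly."""
    root.mkdir(parents=True, exist_ok=True)
    for name, purpose in TRACKED.items():
        (root / name).mkdir(exist_ok=True)
        readme = root / name / "README.md"
        if not readme.exists():  # git does not track empty directories
            readme.write_text(f"# {name}\n\n{purpose}\n")
    for name in IGNORED:
        (root / name).mkdir(exist_ok=True)
    gitignore = root / ".gitignore"
    if not gitignore.exists():
        # Leading slash anchors each pattern to the harness root; trailing slash matches directories only.
        gitignore.write_text("".join(f"/{name}/\n" for name in IGNORED))


if __name__ == "__main__":
    scaffold(Path("harness"))
```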

Why check it in at all

The strongest argument is simple: skills are code. A skill is a set of instructions an agent executes, and any code change should be reviewed. Treating the harness as a repo means it gets a pull request, a diff, and another set of eyes before it changes how everyone's agents behave.

I've been running all of this myself so far. It works for me. The next step is handing it to the team and seeing whether conventions that live comfortably in one person's head survive contact with everyone else's.