Lessons from 9 months of solo AI automation: 10 patterns I'd give my past self

Canonical: https://ai-hack-lab.com/insights/arena-blueprint-launch-eve/

Context — I'm shipping Arena Blueprint on May 13, 2026 (JST). I'm posting from a new HN account, so my Show HN hit a submission limit and didn't reach the front page; I figured I'd post the technical write-up here first instead. This is the post I wanted to read 9 months ago, before I burned a quarter on dead ends.

TL;DR

  • I'm a 20-year-old undergrad in Japan. Started writing code 9 months ago by talking to Claude Code.
  • Built and ran an automation system in production: B2B proposal pipeline (Lancers/CrowdWorks), customer reply, content distribution, self-evolution loop.
  • 4 customer-facing failures along the way (duplicate sends, silent skips, dead schedulers, dedup race) — each became a transferable pattern.
  • Packaged as a Notion blueprint + MIT-licensed Skills bundle on GitHub.

This post is what I'd give my past self on day 1. Free. Long. Code-included.


The 10 patterns (one-line each)

| # | Pattern | What it solves |
| --- | --- | --- |
| 1 | Inbox Pattern | Dedup sends without a database — file-based, atomic, crash-safe |
| 2 | 10-layer Validator | Cheap regex first, LLM judge last — 10x cost reduction |
| 3 | Atomic State Writer | Race-free shared-state writes (rename trick) |
| 4 | Polling Watcher | Lock + dual triggers — daemons that don't quietly die |
| 5 | Sender Pattern Audit | Static analysis as a CI gate, not a vibe |
| 6 | Generator + Validator Split | Mandatory post-check layer on every output |
| 7 | Reflection Loop | Auto-collect failures → inject into next prompt |
| 8 | Devil's Advocate Council | Institutionalized dissent before unrecoverable decisions |
| 9 | Time-to-Detect Log | Measure how long until anomalies are noticed (and shrink it) |
| 10 | Predictions Registry | Force the agent to commit predictions you can verify later |

Below I dig into patterns 1 and 2 with code. The other 8 are in the Blueprint with their own failure logs.


Pattern 1: Inbox Pattern

The failure that birthed it

April 2026. The auto-send script for one client fired four times in a row. Same proposal. Same person. I noticed when the customer complained.

The cause: my "dedup" was an in-memory Set that reset every time the watcher restarted. The retry loop did exactly what it was told to do.
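Reconstructed, the broken version looked something like this (maybeSend and sendProposal are stand-ins, not the real code):

```js
// The broken version, reconstructed: dedup state lives in process memory
const sent = new Set(); // empty again after every restart

async function maybeSend(clientId) {
  if (sent.has(clientId)) return; // "dedup", until the watcher restarts
  await sendProposal(clientId);   // real sender, elided
  sent.add(clientId);
}
```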

The fix

Move the dedup state to disk. Append-only JSONL. Two ops:

```js
// inbox.js — minimal core
const fs = require('fs');
const path = require('path');
const crypto = require('crypto');

const INBOX = path.join(__dirname, '.state/inbox.jsonl');

// Stable 16-hex-char key derived from whatever identifies a send
function hashKey(parts) {
  return crypto.createHash('sha256').update(parts.join('|')).digest('hex').slice(0, 16);
}

// Read path: has this exact send already been committed to disk?
function has(parts) {
  const key = hashKey(parts);
  if (!fs.existsSync(INBOX)) return false;
  const lines = fs.readFileSync(INBOX, 'utf8').split('\n');
  return lines.some((l) => l && JSON.parse(l).key === key);
}

// Write path: append the record after the side-effect succeeds
function commit(parts, meta = {}) {
  const key = hashKey(parts);
  const record = { key, ts: new Date().toISOString(), ...meta };
  fs.mkdirSync(path.dirname(INBOX), { recursive: true });
  fs.appendFileSync(INBOX, JSON.stringify(record) + '\n');
}

module.exports = { has, commit, hashKey };
```

The rule

Before any side-effect that touches a customer, call inbox.has(parts). If true, exit. Otherwise call your sender, then inbox.commit(parts).

```js
const today = new Date().toISOString().slice(0, 10); // date-scoped key: at most one send per client per day

if (inbox.has(['proposal', clientId, today])) {
  console.log('skip — already sent');
  return;
}
await sender.sendProposal(clientId);
inbox.commit(['proposal', clientId, today], { channel: 'lancers' });
```

That's it. No DB. No Redis. Append-only file you can also cat to debug.

What I learned

The cheap version of dedup is a JSON file. Reach for the database when you've outgrown a JSONL file, not because you're afraid you'll outgrow one.


Pattern 2: 10-layer Validator

The failure that birthed it

I had a single Claude prompt asking "is this proposal text OK?" before each send. Around 80 sends per week. Cost: about $40/week (roughly $0.50 per send) just on validation. And it still let a typo-laden draft through because the model felt generous that day.

The structure

Order layers by cost. Cheap regex first, expensive judge last. If any layer rejects, stop.

```
Layer 0:  Typography     — full-width / half-width consistency
Layer 1:  Mine fields    — banned phrases (legal, brand)
Layer 2:  Placeholders   — unfilled {{client_name}} etc.
Layer 3:  Duplicate send — calls inbox.has()
Layer 4:  Brand guard    — required disclosure strings
Layer 5:  Sanity         — length, link count, encoding
Layer 6:  Reflection     — known-failure patterns from past incidents
Layer 7:  Platform rules — site-specific limits (Lancers 20-char title, etc.)
Layer 8:  Secrets leak   — emails / phone / API key shapes
Layer 9:  Sender audit   — static analysis of the caller (CI-time, not runtime)
```
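The runtime layers share one shape. Here's a minimal sketch with three of the ten, names matching the list above; the check signatures and the draft fields (text, clientId, date) are illustrative, not the Blueprint's API:

```js
// validate.js — minimal sketch of the layered gate (illustrative, not the Blueprint's API)
const inbox = require('./inbox');

// Each layer has one job and a name: check(draft) returns null to pass,
// or a reason string to reject. Ordered cheapest-first; the LLM judge
// would be the final, rarely-reached layer.
const layers = [
  { name: 'Placeholders',   check: (d) => (/\{\{\w+\}\}/.test(d.text) ? 'unfilled placeholder' : null) },
  { name: 'Duplicate send', check: (d) => (inbox.has(['proposal', d.clientId, d.date]) ? 'already sent' : null) },
  { name: 'Sanity',         check: (d) => (d.text.length > 4000 ? 'over length budget' : null) },
  // ... layers 4-9, then the Claude judge last
];

function validate(draft) {
  for (const layer of layers) {
    const reason = layer.check(draft);
    if (reason) return { ok: false, layer: layer.name, reason }; // stop at first rejection
  }
  return { ok: true };
}

module.exports = { validate };
```

Rejections come back as { layer, reason }, which is where legible log lines like "Layer 4: missing disclosure" come from.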

Layer 9 is the one I'd single out if I had to pick one. It runs at code-merge time: any new sender must call inbox.has(...) before its side-effect. If it doesn't, the audit fails the build.
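A naive version of that audit (grep-order rather than a real AST walk, with an assumed senders/ directory and a send* naming convention) would look something like this:

```js
// audit-senders.js — naive CI gate sketch (assumed senders/ layout; a real audit would walk the AST)
const fs = require('fs');
const path = require('path');

// Pass if the dedup check appears before the first send* call in the file
function auditFile(file) {
  const src = fs.readFileSync(file, 'utf8');
  const sendIdx = src.search(/\bsend\w*\(/); // side-effecting calls found by naming convention
  if (sendIdx === -1) return true;           // nothing to gate in this file
  const hasIdx = src.indexOf('inbox.has(');
  return hasIdx !== -1 && hasIdx < sendIdx;
}

const failures = fs.readdirSync('senders')
  .filter((f) => f.endsWith('.js'))
  .filter((f) => !auditFile(path.join('senders', f)));

if (failures.length > 0) {
  console.error('Sender audit failed:', failures.join(', '));
  process.exit(1); // non-zero exit fails the CI build
}
```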

Why this beats "ask Claude"

  • 9 of 10 problems are caught by cheap deterministic rules
  • The Claude judge runs on ~10% of cases
  • Validation cost went from $40/week to ~$4
  • The rejection reasons are legible — I can read "Layer 4: missing disclosure" and fix it, vs. "the model said no, somehow"

What I learned

Validators get worse when they're a vibe. They get better when each layer has one job and a name.


The pattern behind the patterns

The thing I keep noticing: most "AI agent" failures are not AI failures. They're plain systems failures with an AI in the loop. The four I shipped in production were:

  1. Dedup race (Pattern 1)
  2. PowerShell encoding error swallowed by silent-skip (now Pattern 7's Reflection Loop catches the pattern signature)
  3. Schedulers dying for weeks without me noticing (Pattern 4's dual-trigger lock)
  4. Make.com Iterator output shape mismatch eating 30 minutes of my evening (Pattern 9's Time-to-Detect Log shrank this kind of incident from days to minutes)

None of these needed a smarter model. They needed a file that remembers things, a layered check, and a watcher that doesn't trust itself.
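One way to read pattern 4's "lock + dual triggers": an atomic lockfile so duplicate instances exit, plus two independent triggers (say, cron and an OS scheduler) so a dead daemon is relaunched by whichever fires next. The lock half, sketched with illustrative paths — this is my minimal sketch, not the Blueprint's code:

```js
// watcher-lock.js — the lock half of pattern 4, sketched (paths illustrative)
const fs = require('fs');
const path = require('path');

const LOCK = path.join(__dirname, '.state/watcher.lock');

// Returns true if we now own the lock; false if a live watcher already does
function acquireLock() {
  fs.mkdirSync(path.dirname(LOCK), { recursive: true });
  try {
    fs.writeFileSync(LOCK, String(process.pid), { flag: 'wx' }); // 'wx': atomic create-or-fail
    return true;
  } catch {
    const pid = Number(fs.readFileSync(LOCK, 'utf8'));
    try {
      process.kill(pid, 0); // signal 0 probes: throws if that PID is gone
      return false;         // a live instance holds the lock; exit quietly
    } catch {
      fs.writeFileSync(LOCK, String(process.pid)); // stale lock: take over
      return true;
    }
  }
}

if (!acquireLock()) process.exit(0);
// ... polling loop goes here; both triggers run this same entry point
```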


What's in the Blueprint, what's free

The Blueprint (Notion) and the Skills Bundle (GitHub, MIT) are separate.

  • Skills Bundle on GitHub — free, MIT-licensed. This is the implementation: ~12 skills covering all 10 patterns, ready to drop into a Claude Code session.
  • Blueprint on Notion — paid. This is the writing: deep dives on each pattern, the 4 failure logs in full, the trade-offs I'd flag if you asked me in person.

Both work independently. The Skills bundle is genuinely useful without the Blueprint. The Blueprint is genuinely useful without the Skills bundle — it's a manual, not a wrapper around the code.

Three tiers: $39 Blueprint only, $99 Blueprint + Skills, $199 Blueprint + Skills + a 30-minute call where I look at your stack and tell you which 3 modules to start with.


Why I'm publishing this

I'm 20, a junior in college in Japan, and I want to graduate having shipped a real thing instead of having a clean GPA. This is part of that. The rest is at ai-hack-lab.com.

If any of the patterns above are useful to you and you build something — please tell me. I go by pikuto in most places. I read every reply.


Appendix: The Show HN

I posted this to Show HN on May 12: [link]. The comment thread there has a tighter version of the brief above, plus answers to a handful of stack and pricing questions.
