canonical_url: https://ai-hack-lab.com/insights/arena-blueprint-launch-eve/
Context — I'm shipping Arena Blueprint on May 13, 2026 (JST). I'm posting from a new HN account, so my Show HN hit a submission limit and didn't reach the front page; I figured I'd post the technical write-up here first instead. This is the post I wanted to read 9 months ago, before I burned a quarter on dead ends.
TL;DR
- I'm a 20-year-old undergrad in Japan. Started writing code 9 months ago by talking to Claude Code.
- Built and ran an automation system in production: B2B proposal pipeline (Lancers/CrowdWorks), customer reply, content distribution, self-evolution loop.
- 4 customer-facing failures along the way (duplicate sends, silent skips, dead schedulers, a payload-shape mismatch) — each became a transferable pattern.
- Packaged as a Notion blueprint + MIT-licensed Skills bundle on GitHub.
This post is what I'd give my past self on day 1. Free. Long. Code-included.
The 10 patterns (one-line each)
| # | Pattern | What it solves |
|---|---|---|
| 1 | Inbox Pattern | Dedup sends without a database — file-based, atomic, crash-safe |
| 2 | 10-layer Validator | Cheap regex first, LLM judge last — 10x cost reduction |
| 3 | Atomic State Writer | Race-free shared-state writes (rename trick) |
| 4 | Polling Watcher | Lock + dual triggers — daemons that don't quietly die |
| 5 | Sender Pattern Audit | Static analysis as a CI gate, not a vibe |
| 6 | Generator + Validator Split | Mandatory post-check layer on every output |
| 7 | Reflection Loop | Auto-collect failures → inject into next prompt |
| 8 | Devil's Advocate Council | Institutionalized dissent before unrecoverable decisions |
| 9 | Time-to-Detect Log | Measure how long until anomalies are noticed (and shrink it) |
| 10 | Predictions Registry | Force the agent to commit predictions you can verify later |
Below I dig into patterns 1 and 2 with code. The other 8 are in the Blueprint with their own failure logs.
Pattern 1: Inbox Pattern
The failure that birthed it
April 2026. The auto-send script for one client fired four times in a row. Same proposal. Same person. I noticed when the customer complained.
The cause: my "dedup" was an in-memory Set that reset every time the watcher restarted. The retry loop did exactly what it was told to do.
The fix
Move the dedup state to disk. Append-only JSONL. Two ops:
```js
// inbox.js — minimal core
const fs = require('fs');
const path = require('path');
const crypto = require('crypto');

const INBOX = path.join(__dirname, '.state/inbox.jsonl');

function hashKey(parts) {
  return crypto.createHash('sha256').update(parts.join('|')).digest('hex').slice(0, 16);
}

function has(parts) {
  const key = hashKey(parts);
  if (!fs.existsSync(INBOX)) return false;
  const lines = fs.readFileSync(INBOX, 'utf8').split('\n');
  return lines.some((l) => l && JSON.parse(l).key === key);
}

function commit(parts, meta = {}) {
  const key = hashKey(parts);
  const record = { key, ts: new Date().toISOString(), ...meta };
  fs.mkdirSync(path.dirname(INBOX), { recursive: true });
  fs.appendFileSync(INBOX, JSON.stringify(record) + '\n');
}

module.exports = { has, commit, hashKey };
```
The rule
Before any side-effect that touches a customer, call inbox.has(parts). If true, exit. Otherwise call your sender, then inbox.commit(parts).
```js
const inbox = require('./inbox');

if (inbox.has(['proposal', clientId, today])) {
  console.log('skip — already sent');
  return;
}
await sender.sendProposal(clientId);
inbox.commit(['proposal', clientId, today], { channel: 'lancers' });
```
That's it. No DB. No Redis. Append-only file you can also cat to debug.
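For reference, each committed record is one line of JSON. After the send above, the file would hold something like this (hash and timestamp are illustrative):

```json
{"key":"3f9a1c2b7d4e8a01","ts":"2026-04-18T09:12:33.000Z","channel":"lancers"}
```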
What I learned
The cheap version of dedup is a JSON file. Reach for the database when you've outgrown a JSONL file, not because you're afraid you'll outgrow one.
Pattern 2: 10-layer Validator
The failure that birthed it
I had a single Claude prompt asking "is this proposal text OK?" before each send. Around 80 sends per week. Cost: about $40/week just on validation. And it still let a typo-laden draft through because the model felt generous that day.
The structure
Order layers by cost. Cheap regex first, expensive judge last. If any layer rejects, stop.
Layer 0: Typography — full-width / half-width consistency
Layer 1: Mine fields — banned phrases (legal, brand)
Layer 2: Placeholders — unfilled {{client_name}} etc.
Layer 3: Duplicate send — calls inbox.has()
Layer 4: Brand guard — required disclosure strings
Layer 5: Sanity — length, link count, encoding
Layer 6: Reflection — known-failure patterns from past incidents
Layer 7: Platform rules — site-specific limits (Lancers 20-char title, etc.)
Layer 8: Secrets leak — emails / phone / API key shapes
Layer 9: Sender audit — static analysis of the caller (CI-time, not runtime)
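To make the ordering above concrete, here's a minimal sketch of how such a chain can be wired up. The file name, layer functions, and regexes are illustrative stand-ins, not the production rules, and the LLM judge is passed in as a plain async function rather than tied to any specific API:

```js
// validate.js: sketch of a cost-ordered validator chain (illustrative layers,
// not the production rules)
const inbox = require('./inbox');

// Each cheap layer returns null (pass) or a human-readable rejection reason.
const cheapLayers = [
  function placeholders(text) {
    // Layer 2-style check: any unfilled {{...}} template slot
    return /\{\{[^}]+\}\}/.test(text) ? 'Layer 2: unfilled placeholder' : null;
  },
  function duplicateSend(text, ctx) {
    // Layer 3-style check: reuses the Inbox Pattern from above
    return inbox.has(['proposal', ctx.clientId, ctx.today]) ? 'Layer 3: already sent today' : null;
  },
  function secretsLeak(text) {
    // Layer 8-style check: crude email-shape detector
    return /[\w.+-]+@[\w-]+\.[\w.-]+/.test(text) ? 'Layer 8: looks like a leaked email address' : null;
  },
];

async function validate(text, ctx, llmJudge) {
  // Deterministic layers run first; stop at the first rejection.
  for (const layer of cheapLayers) {
    const reason = layer(text, ctx);
    if (reason) return { ok: false, reason };
  }
  // The expensive judge only sees drafts that survived every cheap layer.
  const verdict = await llmJudge(text);
  return verdict.ok ? { ok: true } : { ok: false, reason: `judge: ${verdict.reason}` };
}

module.exports = { validate };
```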
Layer 9 is the one I'd pick if I could only explain one. It runs at code-merge: any new sender must call inbox.has(...) before the side-effect. If it doesn't, the audit fails the build.
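Here is one way such a gate could look. This is a sketch under assumptions (a senders/ directory, a send* naming convention, and a regex heuristic instead of a real AST pass), not the actual audit script:

```js
// audit-senders.js: sketch of a Layer 9-style CI gate (directory layout and
// the send-call heuristic are assumptions; a real audit could walk the AST)
const fs = require('fs');
const path = require('path');

const SENDER_DIR = path.join(__dirname, 'senders');
let failed = false;

for (const file of fs.readdirSync(SENDER_DIR)) {
  if (!file.endsWith('.js')) continue;
  const src = fs.readFileSync(path.join(SENDER_DIR, file), 'utf8');
  const callsSend = /\bsend[A-Z]\w*\(/.test(src); // e.g. sendProposal(...)
  const checksInbox = /\binbox\.has\(/.test(src);
  if (callsSend && !checksInbox) {
    console.error(`${file}: send call with no inbox.has() guard`);
    failed = true;
  }
}

process.exit(failed ? 1 : 0); // nonzero exit fails the build
```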
Why this beats "ask Claude"
- 9 of 10 problems are caught by cheap deterministic rules
- The Claude judge runs on ~10% of cases
- Validation cost went from $40/week to ~$4
- The rejection reasons are legible — I can read "Layer 4: missing disclosure" and fix it, vs. "the model said no, somehow"
What I learned
Validators get worse when they're a vibe. They get better when each layer has one job and a name.
The pattern behind the patterns
The thing I keep noticing: most "AI agent" failures are not AI failures. They're plain systems failures with an AI in the loop. The four I shipped in production were:
- Dedup race (Pattern 1)
- A PowerShell encoding error swallowed by a silent skip (Pattern 7's Reflection Loop now catches that failure signature)
- Schedulers dying for weeks without me noticing (fixed by Pattern 4's dual-trigger lock)
- Make.com Iterator output shape mismatch eating 30 minutes of my evening (Pattern 9's Time-to-Detect Log shrank this kind of incident from days to minutes)
None of these needed a smarter model. They needed a file that remembers things, a layered check, and a watcher that doesn't trust itself.
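To make that last point concrete, this is roughly what "a watcher that doesn't trust itself" means in code. It's my sketch of the dual-trigger idea behind Pattern 4, with assumed file names and threshold, not the Blueprint's implementation:

```js
// heartbeat.js: sketch of the dual-trigger idea (file name and the
// 10-minute staleness threshold are assumptions)
const fs = require('fs');
const path = require('path');

const HEARTBEAT = path.join(__dirname, '.state/heartbeat');

// Trigger 1: the watcher itself stamps a heartbeat file on every loop.
function beat() {
  fs.mkdirSync(path.dirname(HEARTBEAT), { recursive: true });
  fs.writeFileSync(HEARTBEAT, new Date().toISOString());
}

// Trigger 2: an independent scheduler (cron, Task Scheduler) calls this check.
// If the heartbeat is stale, the watcher is presumed dead and the caller alerts.
function isStale(maxAgeMs = 10 * 60 * 1000) {
  if (!fs.existsSync(HEARTBEAT)) return true;
  const last = new Date(fs.readFileSync(HEARTBEAT, 'utf8')).getTime();
  return Date.now() - last > maxAgeMs;
}

module.exports = { beat, isStale };
```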
What's in the Blueprint, what's free
The Blueprint (Notion) and the Skills Bundle (GitHub, MIT) are separate.
- Skills Bundle on GitHub — free, MIT-licensed. This is the implementation: ~12 skills covering all 10 patterns, ready to drop into a Claude Code session.
- Blueprint on Notion — paid. This is the writing: deep dives on each pattern, the 4 failure logs in full, the trade-offs I'd flag if you asked me in person.
Both work independently. The Skills bundle is genuinely useful without the Blueprint. The Blueprint is genuinely useful without the Skills bundle — it's a manual, not a wrapper around the code.
Three tiers: $39 Blueprint only, $99 Blueprint + Skills, $199 Blueprint + Skills + a 30-minute call where I look at your stack and tell you which 3 modules to start with.
Why I'm publishing this
I'm 20, a junior in college in Japan, and I want to graduate having shipped a real thing instead of having a clean GPA. This is part of that. The rest is at ai-hack-lab.com.
If any of the patterns above are useful to you and you build something — please tell me. I'm pikuto on most platforms. I read every reply.
Appendix: The Show HN
I posted this to Show HN on May 12: [link]. The comment thread there has a tighter version of this write-up, plus answers to a handful of stack and pricing questions.