How I built a tamper-proof AI for construction quotes after 30 years as a carpenter

#ai #architecture #career #showdev

I'm 49. I learned to code two years ago. Before that, I spent 30 years as a carpenter in Japan.
This is the story of why I built HORIZON SHIELD — and the architectural decision that made it actually useful.

The problem nobody was solving
Japan's residential renovation market is ¥7.35 trillion per year.
Contractors routinely overcharge by 15–20%. The weapon they use is a single Japanese term: 一式 (isshiki), meaning "lump sum." One line item. No breakdown. Inside that line item, markups of 200–300% are invisible.
Over 30 years on job sites, I watched this happen to thousands of families. They had no way to verify whether a quote was fair. Then ChatGPT arrived, and homeowners started asking it for second opinions.
They got different numbers every time.
A contractor will weaponize any inconsistency. "The AI said ¥800,000 last week and ¥1,200,000 this week — which is it?" Game over.

The architectural insight
The problem with using an LLM for cost estimation isn't intelligence. It's determinism.
LLMs are probabilistic by nature. Same input, different output. That's fine for creative writing. It's catastrophic when a contractor is looking for ammunition.
So I separated the concerns completely:
User input (natural language)
↓
LLM layer (parsing ONLY)
↓
JCCDB v1.2.1 (3,350 line items)
↓
SHA-256 hash of canonical input
↓
PDF report
The LLM touches zero numbers. Its only job is to parse the user's free-text description into a canonical structured format. All arithmetic happens in a versioned database.
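Here is a minimal Python sketch of that split. It is illustrative, not the production code. The item codes and unit prices in `PRICE_TABLE` are hypothetical placeholders rather than real JCCDB rows. `ParsedItem` is an assumed shape for what the LLM layer hands over: an item code and a quantity, nothing else.

```python
from dataclasses import dataclass
from decimal import Decimal

# Hypothetical stand-in for one versioned JCCDB snapshot:
# item code -> unit price in yen. Invented values, not real JCCDB data.
PRICE_TABLE: dict[str, Decimal] = {
    "FLOOR-001": Decimal("8500"),  # hypothetical flooring item, per m²
    "WALL-010": Decimal("3200"),   # hypothetical wall finish, per m²
}


@dataclass(frozen=True)
class ParsedItem:
    """All the LLM layer is allowed to produce: an item code and a quantity."""
    item_code: str
    quantity: Decimal


def estimate(items: list[ParsedItem]) -> Decimal:
    """Deterministic total: every price comes from the table, never from the LLM."""
    total = Decimal("0")
    for item in items:
        # Raises KeyError for codes that are not in the snapshot.
        unit_price = PRICE_TABLE[item.item_code]
        total += unit_price * item.quantity
    return total
```

For example, 12 m² of `FLOOR-001` plus 30 m² of `WALL-010` always totals ¥198,000 here. Every price comes from the table, and `Decimal` arithmetic on these values is exact.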
Every report carries a 12-character audit hash, a truncated SHA-256 digest of the canonical input. Same input → same hash → same answer, every time. A contractor cannot challenge the number by asking for a rerun.
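A hash only stays stable if the input is serialized the same way every time. One common approach is canonical JSON (sorted keys, fixed separators) fed to SHA-256 and truncated to 12 hex characters. This Python sketch uses only `json` and `hashlib`. The canonical format, field names, and truncation scheme are assumptions for illustration and should not be read as the service's documented format.

```python
import hashlib
import json


def canonical_json(payload: dict) -> str:
    # Sorted keys and fixed separators make the output independent of dict
    # insertion order and whitespace. ensure_ascii=False keeps Japanese text
    # as-is; it is encoded as UTF-8 before hashing.
    return json.dumps(payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False)


def audit_hash(payload: dict, length: int = 12) -> str:
    digest = hashlib.sha256(canonical_json(payload).encode("utf-8")).hexdigest()
    return digest[:length]  # truncated hex digest used as a short audit ID
```

`sort_keys` only normalizes object keys. List order still matters, so a real canonicalizer would also need to sort line items or treat their order as meaningful. Quantities are passed as strings here to sidestep float formatting differences.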

The War Price Coefficient
Material prices in Japan have been volatile since 2022. A static database goes stale in months.
I added a War Price Coefficient (WPC), currently ×1.0935, that adjusts base prices for supply-chain volatility. It's updated monthly using data from the Bank of Japan's Corporate Goods Price Index (CGPI).
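Applying the coefficient is one multiplication. For the result to be reproducible, though, the rounding rule has to be fixed as well. This sketch assumes half-up rounding to whole yen using Python's `decimal` module. The service's actual rounding rule and the way the coefficient is derived from CGPI data are not described here, so both are assumptions.

```python
from decimal import Decimal, ROUND_HALF_UP

WPC = Decimal("1.0935")  # current War Price Coefficient


def adjusted_price(base_price_yen: Decimal, wpc: Decimal = WPC) -> Decimal:
    # Decimal keeps the product exact; the rounding mode to whole yen is assumed.
    return (base_price_yen * wpc).quantize(Decimal("1"), rounding=ROUND_HALF_UP)
```

With ×1.0935, a ¥8,500 base price becomes ¥9,295 (8,500 × 1.0935 = 9,294.75, rounded half up).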
The hash includes the database snapshot version. So when the WPC updates:

Hash changes → price can change (new market conditions)
Hash matches → price is locked (same conditions, same answer)

This is what I called "version-aware idempotency" when I posted about it on HN.
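Version-aware idempotency follows naturally if the snapshot version and the coefficient are part of the hashed payload. Below is a self-contained Python sketch. The payload layout and the field names `db_version` and `wpc` are hypothetical.

```python
import hashlib
import json


def versioned_audit_hash(user_input: dict, db_version: str, wpc: str) -> str:
    # The snapshot version and coefficient are hashed together with the input,
    # so the hash identifies both the question and the market conditions.
    payload = {"input": user_input, "db_version": db_version, "wpc": wpc}
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]
```

Rerunning the same job against `v1.2.1` at ×1.0935 reproduces the same hash. After a coefficient update, the same job hashes differently, which marks a new price as a change in conditions rather than a contradiction.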

The open dataset
The underlying database — Japan Construction Cost Database (JCCDB) — is open.

3,350+ line items across 7 categories
4 contractor tiers: sole trader (25–35% margin) → major firm (35–45%)
CC-BY 4.0 — free to use, fork, cite
Peer-reviewed preprint on engrXiv (DOI: 10.31224/7007)
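To show how margin tiers turn one base cost into a plausible price band, here is a sketch for the two tiers named above. It assumes "margin" means gross margin as a share of the selling price, so price = cost / (1 - margin). If the dataset defines it as markup on cost instead, the formula becomes cost × (1 + margin). The two intermediate tiers are left out because their ranges aren't listed here.

```python
from decimal import Decimal, ROUND_HALF_UP

# Margin ranges for the two tiers named in the list; intermediate tiers omitted.
TIER_MARGINS: dict[str, tuple[Decimal, Decimal]] = {
    "sole_trader": (Decimal("0.25"), Decimal("0.35")),
    "major_firm": (Decimal("0.35"), Decimal("0.45")),
}


def price_band(cost_yen: Decimal, tier: str) -> tuple[Decimal, Decimal]:
    """Return the (low, high) selling price implied by the tier's margin range."""
    low_margin, high_margin = TIER_MARGINS[tier]

    def price_at(margin: Decimal) -> Decimal:
        # Assumed gross-margin definition: margin = (price - cost) / price.
        return (cost_yen / (1 - margin)).quantize(Decimal("1"), rounding=ROUND_HALF_UP)

    return price_at(low_margin), price_at(high_margin)
```

Under this interpretation, a ¥1,000,000 cost maps to about ¥1.33M to ¥1.54M for a sole trader and ¥1.54M to ¥1.82M for a major firm.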

The commercial service and the research dataset are intentionally separated. The data is CC-BY 4.0 forever. The API pays rent.
GitHub: ogasurfproject-jpg/japan-construction-cost-database

What I learned building this at 47
Framing a house at 16 was harder than learning to code at 47. Not because coding is easy — but because 30 years of domain knowledge is a massive shortcut.
I didn't need to understand the construction industry. I was the construction industry. I just needed to learn how to encode what I already knew.
The hardest part wasn't the SHA-256 hashing or the Cloudflare Workers architecture. It was deciding what not to put in the LLM.
Every time I was tempted to let the LLM "help" with a calculation, I asked: can a contractor use this inconsistency against a homeowner? If yes, the LLM doesn't touch it.
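That rule can also be enforced in code rather than by discipline alone. Here is a minimal sketch. It assumes the LLM returns one JSON object per line item and that only the hypothetical keys `item_code`, `quantity`, and `unit` are permitted. Anything else, such as a price or total the model volunteers, is rejected before it reaches the calculation step.

```python
REQUIRED_KEYS = {"item_code", "quantity"}
ALLOWED_KEYS = REQUIRED_KEYS | {"unit"}  # hypothetical schema for LLM output


class ParseRejected(ValueError):
    """Raised when LLM output tries to set fields the parser is not allowed to set."""


def validate_llm_item(item: dict, known_codes: set[str]) -> dict:
    """Accept an LLM-parsed line item only if it carries no prices or totals."""
    extra = set(item) - ALLOWED_KEYS
    if extra:
        # e.g. "unit_price" or "total": numbers like these belong to the database.
        raise ParseRejected(f"disallowed fields from LLM: {sorted(extra)}")
    missing = REQUIRED_KEYS - set(item)
    if missing:
        raise ParseRejected(f"missing fields: {sorted(missing)}")
    if item["item_code"] not in known_codes:
        raise ParseRejected(f"unknown item code: {item['item_code']!r}")
    return item
```

In this sketch, an LLM that helpfully adds `"unit_price": 9000` has its output rejected outright, so that number never reaches a report.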

Try it

Service: shield.the-horizons-innovation.com/index_en.html
Dataset: github.com/ogasurfproject-jpg/japan-construction-cost-database
Paper: engrXiv DOI 10.31224/7007
Pitch deck: shield.the-horizons-innovation.com/pitch.html

Happy to answer questions about the architecture, the dataset, or what it's like to ship your first product at 49.
— Toshi

DEV Community
