How I Built an Automated Korean Saju Content Pipeline with Claude Vision and Python
TL;DR: I needed to produce structured, channel-specific long-form content about Korean Saju (四柱推命) at scale. Instead of manually reading manse calendar screenshots and writing drafts by hand, I built a Python pipeline that ingests calendar images, extracts structured pillar data via Claude Vision, runs a multi-stage prompt chain, enforces channel formatting rules, and passes output through a QA gate before publishing. Here is exactly how it works.
1. The Problem
Korean Saju content — destiny analysis based on the Four Pillars of Birth (年柱·月柱·日柱·時柱) — is inherently data-dense. A single reading involves:
- Six data columns (year, month, day, hour, current major cycle, annual cycle) each carrying a Heavenly Stem (天干), Earthly Branch (地支), a Ten-God relationship (十星), a 12-Phase life stage (十二運星), and one or more auspicious/inauspicious markers (神殺).
- Derived structural judgments — chart strength (身强/身弱), dominant element distribution, repeating branch patterns (e.g., 午午午 triple resonance across day branch, hour branch, and annual branch).
- Channel-specific tone — a dev.to audience wants technical framing; a YouTube audience wants narrative; a Naver Blog audience wants bullet-point summaries with score tables.
Doing this manually meant: open the manse calendar app, screenshot six columns, re-read every glyph, write a 1500-word draft, reformat for the target channel, QA for factual consistency, then publish. For one subject, that takes roughly three hours. At any meaningful content volume, that is not viable.
The goal: reduce human time-on-task to prompt review + final approval, with the machine handling extraction, structuring, drafting, and format enforcement.
2. Pipeline Overview
[Manse Calendar App]
│
▼
[Screenshot Capture Layer] ← 4–6 PNG files per subject
│
▼
[Claude Vision Extraction] ← Structured JSON: pillars, ten-gods, 12-phase,神殺
│
▼
[Validation & Normalisation] ← Python: cross-check stem/branch consistency
│
▼
[Prompt Chain: Draft Generation] ← Multi-block long-form prompt
│
▼
[Channel Formatter] ← dev.to / YouTube script / Naver Blog
│
▼
[QA Gate] ← Hallucination checks, forbidden phrase scan
│
▼
[Output: Markdown / SRT / HTML]
Each stage is a discrete Python module. Stages are orchestrated by a simple `pipeline.py` that passes a shared `subject_context` dict through each step.
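A minimal sketch of that orchestration pattern, assuming each stage is a callable that takes and returns the shared context dict (stage names and context keys here are illustrative, not the actual project code):

```python
# Hypothetical orchestrator: each stage receives the shared
# subject_context dict, mutates or extends it, and returns it.
from typing import Callable

Stage = Callable[[dict], dict]

def run_pipeline(subject_context: dict, stages: list[Stage]) -> dict:
    """Pass a shared context dict through each stage in order."""
    for stage in stages:
        subject_context = stage(subject_context)
    return subject_context

# Stub stages standing in for the real extraction/validation modules:
def extract(ctx: dict) -> dict:
    ctx["pillars"] = {"day": {"stem": "壬", "branch": "午"}}
    return ctx

def validate(ctx: dict) -> dict:
    ctx["validated"] = "pillars" in ctx
    return ctx

result = run_pipeline({"subject": "demo"}, [extract, validate])
```

The advantage of this shape is that a failing stage (e.g., the validation halt described later) can simply raise, stopping the chain before any generative step runs.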
3. Screenshot Ingestion and Vision Extraction
The manse calendar app outputs visual data, not an API. That means the entry point is always image files. For a typical subject the capture set looks like this:
| File | Content |
|---|---|
| 01_input_confirm.png | Name, gender, solar date, birth hour status |
| 02_manse_calendar_complete.png | Full four-pillar grid with ten-gods, 12-phase, 神殺 badges |
| 03_total_luck_1.png – 06_total_luck_4.png | Long-form luck analysis text blocks |
The Vision extraction prompt is the most critical piece of the entire pipeline. A poorly specified extraction prompt produces plausible-sounding but wrong data — and because Saju analysis is deterministic (given a birth date and gender, every pillar value is fixed), any extraction error propagates as a factual error into every downstream output.
Extraction prompt design principles I learned the hard way:
Ask for exact glyph transcription, not interpretation. The prompt says: "Transcribe the Korean hangul label exactly as it appears in the image. If the image shows 정재, output 정재. Do not substitute 편재." This sounds obvious but early versions of the prompt were silently correcting what the model thought were inconsistencies — e.g., flipping a Direct Wealth (正財) label to Indirect Wealth (偏財) because the stem looked like a 偏財 stem to the model. That is catastrophic for downstream accuracy.
Extract position metadata. The output JSON includes not just the value but where it appears: `{"column": "hour", "layer": "ten_god_stem", "value": "편재"}`. This lets the validation layer catch cross-column inconsistencies.
Separate extraction from interpretation. The Vision call only extracts. It does not judge chart strength, does not infer relationships between pillars, and does not produce prose. That happens in the next stage.
Sample extraction output (simplified):
{
"subject": { "name": "Jang Won-young", "gender": "female", "solar_birth": "2004-08-31", "birth_hour": "unknown" },
"pillars": {
"year": { "stem": "甲", "branch": "申", "stem_ten_god": "식신", "branch_ten_god": "편인", "twelve_phase": "장생", "spirits": ["역마살", "공망"] },
"month": { "stem": "壬", "branch": "申", "stem_ten_god": "비견", "branch_ten_god": "편인", "twelve_phase": "장생", "spirits": ["월덕귀인", "역마살", "공망"] },
"day": { "stem": "壬", "branch": "午", "stem_ten_god": "비견", "branch_ten_god": "정재", "twelve_phase": "태", "spirits": ["원덕귀인"] },
"hour": { "stem": "丙", "branch": "午", "stem_ten_god": "편재", "branch_ten_god": "정재", "twelve_phase": "태", "spirits": ["월공", "양인살"] }
},
"major_cycle": { "stem": "庚", "branch": "午", "stem_ten_god": "편인", "branch_ten_god": "정재", "age": 18 },
"annual_cycle_2026": { "stem": "丙", "branch": "午", "stem_ten_god": "편재", "branch_ten_god": "정재" }
}
4. Validation and Normalisation
Before any prose generation touches this data, a Python validation pass runs three checks:
Check 1 — Stem/branch consistency. Given the day master (日主) stem, every ten-god label is mathematically deterministic. The validator recomputes the expected ten-god values from the stems and flags any mismatch between the computed value and the extracted label. If Vision extracted a label that contradicts the stem relationship, the pipeline halts and logs an `EXTRACTION_CONFLICT` error for human review.
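A sketch of that recomputation, using the standard five-element production and control cycles to derive a ten-god label from the day master and another stem. The hangul labels match the extraction JSON shown earlier; the actual project validator may structure this differently:

```python
# Standard stem metadata: element and yin/yang polarity per Heavenly Stem.
ELEMENT = {"甲": "wood", "乙": "wood", "丙": "fire", "丁": "fire",
           "戊": "earth", "己": "earth", "庚": "metal", "辛": "metal",
           "壬": "water", "癸": "water"}
YANG = {"甲", "丙", "戊", "庚", "壬"}
PRODUCES = {"wood": "fire", "fire": "earth", "earth": "metal",
            "metal": "water", "water": "wood"}
CONTROLS = {"wood": "earth", "earth": "water", "water": "fire",
            "fire": "metal", "metal": "wood"}

def ten_god(day_master: str, other: str) -> str:
    """Derive the ten-god label for `other` relative to the day master."""
    dm_el, ot_el = ELEMENT[day_master], ELEMENT[other]
    same_polarity = (day_master in YANG) == (other in YANG)
    if dm_el == ot_el:
        return "비견" if same_polarity else "겁재"
    if PRODUCES[dm_el] == ot_el:          # day master produces other
        return "식신" if same_polarity else "상관"
    if CONTROLS[dm_el] == ot_el:          # day master controls other
        return "편재" if same_polarity else "정재"
    if CONTROLS[ot_el] == dm_el:          # other controls day master
        return "편관" if same_polarity else "정관"
    return "편인" if same_polarity else "정인"  # other produces day master

def check_stem_consistency(day_master: str, stem: str, extracted: str) -> bool:
    """True when the Vision-extracted label matches the computed one."""
    return ten_god(day_master, stem) == extracted
```

Against the sample extraction, `ten_god("壬", "丙")` yields `편재` for the hour stem, agreeing with the JSON, so the check passes; a flipped `정재` label would halt the pipeline.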
Check 2 — Branch repetition detection. The validator scans all six branch slots (four pillars + major cycle branch + annual cycle branch) and flags any branch that appears three or more times. In this subject's case, 午 appears in day branch, hour branch, major cycle branch, and 2026 annual cycle branch — a four-way resonance that is structurally significant and needs to be called out explicitly in the draft prompt.
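Check 2 is a few lines with `collections.Counter`; a minimal sketch over the six branch slots from the sample extraction:

```python
from collections import Counter

def find_branch_resonance(branches: list[str], threshold: int = 3) -> dict[str, int]:
    """Flag any Earthly Branch appearing `threshold` or more times
    across the four pillars plus major-cycle and annual-cycle slots."""
    counts = Counter(branches)
    return {b: n for b, n in counts.items() if n >= threshold}

# This subject's six branch slots: year, month, day, hour, major cycle, 2026.
slots = ["申", "申", "午", "午", "午", "午"]
flags = find_branch_resonance(slots)  # {"午": 4} — the four-way resonance
```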
Check 3 — Forbidden phrase pre-check. A list of absolute-prohibition strings (["반드시", "확실히", "무조건", "절대", "100%"] and their English equivalents) is checked against any text fields extracted from the long-form image blocks. If found, those segments are flagged for tone adjustment before they enter the draft prompt context.
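Check 3 is a plain substring scan. A sketch, with the English equivalents shown here being assumptions (the article lists only the Korean strings):

```python
# Korean list from the pipeline spec; the English entries are
# illustrative stand-ins for the unspecified "English equivalents".
FORBIDDEN = ["반드시", "확실히", "무조건", "절대", "100%",
             "definitely", "certainly", "guaranteed"]

def scan_forbidden(text: str) -> list[str]:
    """Return every absolute-certainty phrase found in a text segment."""
    return [p for p in FORBIDDEN if p in text]

hits = scan_forbidden("2026년에는 무조건 성공합니다")  # -> ["무조건"]
```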
5. Prompt Strategy for Draft Generation
The draft generation stage uses a structured multi-block prompt. The key design decision is block-level responsibility: each named section of the output is generated with its own instruction context, rather than asking the model to produce everything in one pass.
Block map:
| Block | Instruction focus |
|---|---|
| Hook (3 lines) | Lead with the structurally most surprising fact from the extracted data |
| One-line thesis | Day master + dominant ten-god + 2026 annual cycle, no hedging |
| Mechanics section | Cite at least 3 ten-gods or 神殺 with positional context |
| Classical reference | One line from 滴天髓 / 子平真詮 / 窮通寶鑑 / 淵海子平 with source |
| Modern application | Map the trigger subject's known public activity domain to the annual cycle |
| Infographic block | [INFO_GRAPHIC] placeholder for the design pipeline to fill |
| Reversal paragraph | One counter-intuitive finding — a 神殺 or branch combination that cuts against the dominant narrative |
| 3-line summary | Compress the entire reading to three sentences |
| CTA | Injected by the formatter, not by the draft prompt |
The prompt explicitly instructs the model: "You are working from extracted structured data. Do not invent pillar values. Do not infer birth details not present in the extraction JSON. If a field is marked 'unknown' (e.g., birth hour), treat the hour pillar as provided and do not speculate about alternative hour values."
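The block-level responsibility idea can be sketched as a per-block prompt builder. The block names mirror the table above, but the instruction strings and function shape here are illustrative placeholders, not the production prompts:

```python
import json

# Hypothetical per-block instruction contexts (abbreviated).
BLOCKS = {
    "hook": "Lead with the most structurally surprising fact. 3 lines.",
    "thesis": "One line: day master + dominant ten-god + 2026 cycle. No hedging.",
    "mechanics": "Cite at least 3 ten-gods or 神殺 with positional context.",
}

def build_block_prompt(block: str, extraction: dict) -> str:
    """Assemble one block's prompt: shared guardrails + data + block task."""
    return (
        "You are working from extracted structured data. "
        "Do not invent pillar values. Do not infer birth details "
        "not present in the extraction JSON.\n\n"
        f"EXTRACTION JSON:\n{json.dumps(extraction, ensure_ascii=False)}\n\n"
        f"TASK ({block}): {BLOCKS[block]}"
    )

prompt = build_block_prompt("hook", {"pillars": {"day": {"stem": "壬"}}})
```

Each block's output is then concatenated in block-map order, which keeps a failure (or regeneration) local to one section instead of forcing a full-draft retry.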
6. Channel Formatter
The same structured draft gets reformatted by a channel-specific formatter module. Each formatter is a Python class with a `transform(draft: str, context: dict) -> str` method.
dev.to formatter rules:
- Output valid Markdown with YAML frontmatter
- Tags drawn from a controlled vocabulary (`bazi`, `kpop`, `saju`, `korea`)
- No code fences wrapping the entire document
- Hanja terms rendered inline with English gloss: `Indirect Wealth(偏財)`, `Goat Blade(羊刃殺)`
- All Korean hangul stripped from the final output; terminology expressed in English + hanja only
- Target word count: 1400–1600 words; formatter trims or expands the reversal and mechanics sections to hit range
- CTA block appended from a template string, not from the draft
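A minimal sketch of the formatter interface described above, with the dev.to rules reduced to frontmatter, tags, and the template-sourced CTA (rule details like trimming logic are omitted):

```python
class ChannelFormatter:
    """Base interface: every channel implements transform()."""
    def transform(self, draft: str, context: dict) -> str:
        raise NotImplementedError

class DevToFormatter(ChannelFormatter):
    TAGS = ["bazi", "kpop", "saju", "korea"]  # controlled vocabulary

    def transform(self, draft: str, context: dict) -> str:
        frontmatter = (
            "---\n"
            f"title: {context['title']}\n"
            f"tags: {', '.join(self.TAGS)}\n"
            "---\n"
        )
        # CTA comes from a template string, never from the draft itself.
        cta = "\n\n[CTA template appended here]"
        return frontmatter + draft + cta

out = DevToFormatter().transform("Body text.", {"title": "2026 Reading"})
```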
YouTube script formatter rules (different module):
- Output plain text with `[PAUSE]` and `[B-ROLL: X]` markers
- Hangul preserved for on-screen text overlays
- Sentences capped at 18 words for teleprompter pacing
7. QA Gate
The QA gate runs after formatting and before any publish call. It is not a model call — it is a deterministic rule engine.
Rules that cause a BLOCK (pipeline halts):
- Any ten-god label in the output that contradicts the validation JSON (e.g., the output says "Direct Wealth(正財)" in the hour stem position but the JSON says `편재`)
- Any of the forbidden certainty phrases present in the output
- Word count outside ±15% of target
- Missing required blocks (checked by scanning for section header strings)
Rules that cause a WARN (logged, human reviews before publish):
- Classical reference citation present but source tag missing
- Infographic placeholder `[INFO_GRAPHIC]` not replaced by the design pipeline
- Subject name appears in a sentence with a romantic or health-definitive predicate
The QA gate rejects roughly 8–12% of drafts on first pass, almost always due to a ten-god label drift introduced during the channel formatting step (the formatter's length-adjustment logic occasionally paraphrases a term incorrectly).
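The "deterministic rule engine" shape can be sketched as a list of rule functions, each returning a `(severity, message)` tuple or `None`. The two rules shown are simplified stand-ins for the real checks:

```python
def check_word_count(text: str, target: int = 1500, tolerance: float = 0.15):
    """BLOCK when the word count falls outside ±15% of target."""
    n = len(text.split())
    if abs(n - target) > target * tolerance:
        return ("BLOCK", f"word count {n} outside ±15% of {target}")
    return None

def check_placeholder(text: str):
    """WARN when the infographic placeholder was never replaced."""
    if "[INFO_GRAPHIC]" in text:
        return ("WARN", "infographic placeholder not replaced")
    return None

def run_qa(text: str, rules) -> dict:
    """Run every rule; BLOCK severities halt the pipeline."""
    results = [r for rule in rules if (r := rule(text)) is not None]
    return {
        "blocked": any(sev == "BLOCK" for sev, _ in results),
        "issues": results,
    }

report = run_qa("short draft [INFO_GRAPHIC]",
                [check_word_count, check_placeholder])
```

Because rules are pure functions over the formatted text, the gate is cheap to run on every draft and its verdicts are fully reproducible, unlike a model-based reviewer.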
8. Lessons Learned
Extraction fidelity is the entire foundation. Every downstream error I have traced has originated in the Vision extraction stage, not in the draft generation stage. Investing in extraction prompt specificity — including explicit anti-correction instructions — pays back more than any amount of draft prompt tuning.
Deterministic validation beats probabilistic correction. It is tempting to add a "correction" prompt that asks the model to fix inconsistencies in extracted data. Do not do this. The model will produce plausible corrections that are sometimes wrong in ways that are very hard to catch. Halt on conflict and route to human review instead.
Branch repetition is a structural signal, not noise. When the same Earthly Branch appears four times across the pillar set and both cycle slots, that is not a coincidence to mention in passing — it is the structural centre of the reading. The pipeline now explicitly flags multi-resonance patterns and instructs the draft prompt to build the mechanics section around them.
Channel formatting should never touch content semantics. The formatter's job is structure and length, not meaning. Any formatter logic that could alter a ten-god label, a 神殺 name, or a classical citation needs to be moved upstream into the draft prompt or blocked entirely.
9. What This Produces
To illustrate the pipeline end-to-end, here is the kind of structured output it generates. For a subject with Day Master 壬水(Im-su / Deep Water), chart structure 身强(Shin-gang / Strong Chart), current major cycle 庚午(Gyeong-o) carrying Indirect Resource(偏印) over Direct Wealth(正財), and 2026 annual cycle 丙午(Byeong-o) carrying Indirect Wealth(偏財) over Direct Wealth(正財):
The pipeline correctly identifies that the 午 branch appearing in the day branch, hour branch, major cycle branch, and annual cycle branch simultaneously creates a four-layer resonance on the Direct Wealth(正財) position — the structural centre of the 2026 reading. It flags the Heavenly Noble(天乙貴人) marker on the month pillar as the reversal element (unexpected support arriving in high-pressure periods). It applies the Goat Blade(羊刃殺) marker on the hour pillar as a caution flag for the health and overextension blocks. And it correctly preserves the Empty Void(空亡) markers on both the year and month branches without over-interpreting them as absolute negatives.
The QA gate passes the output. The formatter hits 1,487 words. Total machine time from image input to publish-ready Markdown: under 90 seconds.
[INFO_GRAPHIC]
2026 Pillar Snapshot — 壬水 Day Master
| Pillar | Stem | Branch | Stem Ten-God | Branch Ten-God | Key Marker |
|---|---|---|---|---|---|
| Year(年柱) | 甲 | 申 | Eating God(食神) | Indirect Resource(偏印) | Empty Void(空亡) |
| Month(月柱) | 壬 | 申 | Friend(比肩) | Indirect Resource(偏印) | Heavenly Noble(天乙貴人) |
| Day(日柱) | 壬 | 午 | Friend(比肩) | Direct Wealth(正財) | — |
| Hour(時柱) | 丙 | 午 | Indirect Wealth(偏財) | Direct Wealth(正財) | Goat Blade(羊刃殺) |
| Major Cycle | 庚 | 午 | Indirect Resource(偏印) | Direct Wealth(正財) | — |
| 2026 Annual | 丙 | 午 | Indirect Wealth(偏財) | Direct Wealth(正財) | — |
Summary
- Vision extraction with explicit anti-correction instructions is the single highest-leverage investment in a Saju content pipeline — garbage in, garbage out applies with unusual force when the source data is deterministic.
- Deterministic validation before any generative step catches the errors that probabilistic models will confidently paper over.
- Channel formatters should be semantically inert — length and structure only, never meaning.
Want to explore the full reading this pipeline produced, or see how the branch-resonance and 神殺 pattern analysis maps to a specific public figure's 2026 trajectory? The complete interactive manse calendar and long-form analysis are available at runartree.com.
This article discusses Saju (四柱) as a structured content domain for automation purposes. All interpretations derived from the pipeline are probabilistic pattern analyses based on classical Chinese metaphysical frameworks and should not be treated as predictive certainties or professional advice of any kind.
Project link
This article is based on an automated content workflow for a Korean Saju platform.
- Website: https://runartree.com?utm_source=devto&utm_medium=article&utm_campaign=saju_automation
- Stack: Python, Claude Vision, channel-specific formatting, content QA
- Domain: Korean Saju / Bazi content automation
The key lesson is simple: generation alone is not enough. A useful publishing pipeline also needs formatting, QA, tracking links, and channel-specific editorial rules.
Bazi interpretation. Not medical, legal, or investment advice.