DEV Community

HYUN SOO LEE
Building an Automated Saju (Korean Bazi) Content Pipeline with Claude Vision + Python

The Problem Nobody Talks About in Niche Content Automation

Most content automation tutorials use generic domains — recipes, product reviews, travel guides. But the real stress-test for a pipeline is a domain with structured visual inputs, domain-specific ontology, and strict factual constraints.

Korean Saju (四柱, the Four Pillars of Destiny — the East Asian equivalent of Bazi 八字) is exactly that domain. Each subject has a Manse Calendar (萬歲曆) image containing:

  • Four pillars (年柱·月柱·日柱·時柱), each with a Heavenly Stem (天干) and Earthly Branch (地支)
  • Ten-God relationships (十星): Friend(比肩), Robber(劫財), Eating God(食神), Hurting Officer(傷官), Indirect Wealth(偏財), Direct Wealth(正財), Seven Killings(偏官), Direct Officer(正官), Indirect Resource(偏印), Direct Resource(正印)
  • Twelve Growth Phases (十二運星): 長生, 沐浴, 冠帶, 建祿, 帝旺, 衰, 病, 死, 墓, 絶, 胎, 養
  • Spiritual Killings (神殺): Goat Blade(羊刃殺), Heavenly Noble(天乙貴人), Peach Blossom(桃花殺), Travel Horse(驛馬殺), Empty Void(空亡), and others
  • Current Major Cycle (大運) and Annual Cycle (歲運)

If your pipeline hallucinates even one Ten-God label — say, writing "Direct Wealth(正財)" when the image clearly shows "Indirect Wealth(偏財)" — the entire article fails domain QA. The error rate in naive LLM generation for this domain is surprisingly high, because the model's training data contains many Bazi generalizations that override what the image actually says.

This article documents the architecture I built to solve that, using Claude Vision, a Python orchestrator, a structured verification layer, and channel-specific formatters.


The Data Source: Manse Calendar Images

The input for each content unit is a screenshot from a Korean Saju app or web service. A real example (used throughout this article as the reference case) contains:

Subject: Gong Yoo (공유), Male, Solar: 1979-07-10, Lunar: 1979-06-17, birth hour unknown

Four Pillars extracted from image:

| Pillar | Stem (天干) | Branch (地支) | Stem Ten-God | Branch Ten-God | 12-Phase | Key Sinsals |
|---|---|---|---|---|---|---|
| Year(年柱) | | | Robber(劫財) | Robber(劫財) | 衰(쇠) | Heavenly Noble(天乙貴人), Goat Blade(羊刃殺), Flower Killing(花蓋殺), Ghost Gate(鬼門關殺) |
| Month(月柱) | | | Hurting Officer(傷官) | Robber(劫財) | 衰(쇠) | Heavenly Noble(天乙貴人), Flower Killing(花蓋殺), Ghost Gate(鬼門關殺) |
| Day(日柱) | | | Friend(比肩) | Seven Killings(偏官) | 長生(장생) | Ghost Gate(鬼門關殺), Royal Authority(帝旺殺) |
| Hour(時柱) | | | Friend(比肩) | Direct Resource(正印) | 帝旺(제왕) | Heavenly Authority(天意星), Goat Blade(羊刃殺), Three Penalties(삼재) |

Current Major Cycle (大運, age 41): Stem 丙 / Branch 寅, Stem Ten-God: Indirect Resource(偏印), Branch Ten-God: Seven Killings(偏官), 12-Phase: 長生(장생), Sinsal: Ghost Gate(鬼門關殺)

2026 Annual Cycle (歲運): Stem 丙 / Branch 午, Stem Ten-God: Indirect Resource(偏印), Branch Ten-God: Direct Resource(正印), Sinsal: Royal Authority(帝旺殺), Goat Blade(羊刃殺), Three Penalties(삼재)

This is the ground truth. Every downstream content generation step must reference only these values — no interpolation, no "typical Bazi patterns."


Pipeline Architecture

[Image Input]
     │
     ▼
[Claude Vision — Structured Extraction]
     │  (JSON schema with strict field names)
     ▼
[Verification Layer — Python]
     │  (cross-check Ten-God logic, flag anomalies)
     ▼
[Template Router]
     │  (channel selector: dev.to / Instagram / YouTube script / newsletter)
     ▼
[Prompt Chain — Content Generation]
     │  (domain-locked prompts, no hallucination surface)
     ▼
[QA Checker]
     │  (string match against extracted JSON, reject on mismatch)
     ▼
[Publisher API]
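In code, the stages above reduce to a thin orchestrator that chains injected callables. This is a minimal sketch; all function and field names are illustrative, not taken from the actual codebase:

```python
from dataclasses import dataclass

@dataclass
class ContentJob:
    image_path: str
    channel: str  # "devto" | "instagram" | "youtube" | "newsletter"

def run_pipeline(job, extract, verify, route, generate, qa_check, publish):
    """Chain the pipeline stages; each stage is an injected callable."""
    raw = extract(job.image_path)            # Claude Vision structured extraction
    verified, flags = verify(raw)            # deterministic cross-checks
    if flags:                                # anomalies go to human review, not publish
        return {"status": "needs_review", "flags": flags}
    formatter = route(job.channel)           # channel-specific stateless renderer
    draft = generate(verified, job.channel)  # domain-locked prompt chain
    ok, reasons = qa_check(verified, draft)
    if not ok:                               # regenerate with error context
        return {"status": "regenerate", "reasons": reasons}
    return {"status": "published", "result": publish(formatter(verified, draft))}
```

Because every stage is injected, each one can be unit-tested with stubs before wiring in the real API calls.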

Step 1: Claude Vision Extraction with a Strict Schema

The first and most critical step is extracting the Manse Calendar image into a validated JSON object. The prompt is not a general "describe this image" call. It is a schema-first extraction prompt:

SYSTEM: You are a Manse Calendar OCR agent. Extract ONLY what is visually present in the image. 
Do not infer, generalize, or apply Bazi theory. Output strictly valid JSON matching the provided schema.
If a field is not visible, output null. Never substitute a similar-looking character.

USER: Extract the following fields from this Manse Calendar image:
[schema provided as JSON Schema draft-07]
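For reference, here is a sketch of how the request body for this call can be assembled for the Anthropic Messages API (sent via `client.messages.create(**request)` in the Python SDK). The model name and `max_tokens` value are placeholders, not the production settings:

```python
import base64
import json

def build_extraction_request(image_bytes: bytes, schema: dict) -> dict:
    """Assemble a schema-first extraction request for the Claude Messages API."""
    system = (
        "You are a Manse Calendar OCR agent. Extract ONLY what is visually present "
        "in the image. Do not infer, generalize, or apply Bazi theory. Output "
        "strictly valid JSON matching the provided schema. If a field is not "
        "visible, output null. Never substitute a similar-looking character."
    )
    return {
        "model": "claude-3-5-sonnet-latest",  # placeholder model name
        "max_tokens": 2048,
        "system": system,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image",
                 "source": {"type": "base64", "media_type": "image/png",
                            "data": base64.b64encode(image_bytes).decode()}},
                {"type": "text",
                 "text": "Extract the fields from this Manse Calendar image.\n"
                         "Schema:\n" + json.dumps(schema, ensure_ascii=False)},
            ],
        }],
    }
```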

The schema enforces:

  • stem_ten_god must be one of exactly 10 enum values (Korean hangul labels as they appear in the image)
  • branch_ten_god same enum
  • twelve_phase must be one of 12 enum values
  • sinsals is an array of strings, each matching a known sinsal label

The key insight: enum constraints on the schema keep the model from "correcting" what it sees into out-of-vocabulary text. If the image shows 偏印 (Indirect Resource), the model cannot emit a paraphrase, translation, or invented label, because any value outside the enum fails at parse time. Swaps between two valid enum values — such as 偏印 read as 正印 — can still occur, which is exactly what the verification layer in Step 2 exists to catch.
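A condensed sketch of the per-pillar schema fragment and a minimal enum check. In production the `jsonschema` package would run full draft-07 validation; the hangul enum values below follow the standard Korean Ten-God and twelve-phase labels:

```python
TEN_GODS = ["비견", "겁재", "식신", "상관", "편재",
            "정재", "편관", "정관", "편인", "정인"]
TWELVE_PHASES = ["장생", "목욕", "관대", "건록", "제왕", "쇠",
                 "병", "사", "묘", "절", "태", "양"]

# JSON Schema draft-07 fragment for one pillar (null = field not visible in image)
PILLAR_SCHEMA = {
    "type": "object",
    "properties": {
        "stem_ten_god":   {"enum": TEN_GODS + [None]},
        "branch_ten_god": {"enum": TEN_GODS + [None]},
        "twelve_phase":   {"enum": TWELVE_PHASES + [None]},
        "sinsals":        {"type": "array", "items": {"type": "string"}},
    },
    "required": ["stem_ten_god", "branch_ten_god", "twelve_phase", "sinsals"],
}

def check_pillar(pillar: dict) -> list[str]:
    """Minimal enum validation; production code would use the jsonschema package."""
    errors = []
    for key in ("stem_ten_god", "branch_ten_god", "twelve_phase"):
        allowed = PILLAR_SCHEMA["properties"][key]["enum"]
        if pillar.get(key) not in allowed:
            errors.append(f"{key}: {pillar.get(key)!r} not in enum")
    return errors
```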

Extraction accuracy improved from ~74% (free-form prompt) to ~96% (schema-constrained prompt) in internal testing across 200 Manse Calendar images.


Step 2: The Verification Layer

Even with schema constraints, some errors slip through — particularly with visually similar hanja characters (e.g., 己/已/巳, or 戊/戌). The Python verification layer runs three checks:

Check A — Ten-God Consistency: Given the Day Master (日主) stem, the Ten-God relationships for all other stems are mathematically deterministic. The verifier recomputes expected Ten-Gods from the extracted Day Master and flags any mismatch.
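Check A can be sketched as follows. The lookup tables encode the standard stem elements, the generation cycle, and the control cycle; the function returns the canonical hanja label for comparison against the extracted one:

```python
# Heavenly stems mapped to (element, is_yang), in the traditional 甲..癸 order.
STEMS = {
    "甲": ("wood", True),  "乙": ("wood", False),
    "丙": ("fire", True),  "丁": ("fire", False),
    "戊": ("earth", True), "己": ("earth", False),
    "庚": ("metal", True), "辛": ("metal", False),
    "壬": ("water", True), "癸": ("water", False),
}
GENERATES = {"wood": "fire", "fire": "earth", "earth": "metal",
             "metal": "water", "water": "wood"}
CONTROLS = {"wood": "earth", "earth": "water", "water": "fire",
            "fire": "metal", "metal": "wood"}

def ten_god(day_master: str, other: str) -> str:
    """Recompute the Ten-God of `other` relative to the Day Master stem."""
    dm_el, dm_yang = STEMS[day_master]
    ot_el, ot_yang = STEMS[other]
    same_polarity = dm_yang == ot_yang
    if ot_el == dm_el:                          # same element: peer stars
        return "比肩" if same_polarity else "劫財"
    if GENERATES[dm_el] == ot_el:               # Day Master produces: output stars
        return "食神" if same_polarity else "傷官"
    if CONTROLS[dm_el] == ot_el:                # Day Master controls: wealth stars
        return "偏財" if same_polarity else "正財"
    if CONTROLS[ot_el] == dm_el:                # controls Day Master: officer stars
        return "偏官" if same_polarity else "正官"
    return "偏印" if same_polarity else "正印"   # produces Day Master: resource stars
```

For instance, if the extracted Day Master were 戊, the verifier would expect 己 to compute to 劫財 and 辛 to 傷官 — matching the Robber and Hurting Officer labels in the reference table. Any extracted label that disagrees with the recomputation is flagged.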

Check B — Earthly Branch Cross-Reference: Each branch contains hidden stems (地藏干). The verifier checks that the branch Ten-God label is consistent with the dominant hidden stem's relationship to the Day Master.

Check C — Sinsal Presence Logic: Certain sinsals appear only under specific stem/branch combinations. The verifier flags sinsals that are structurally impossible given the extracted pillars.

Flagged records go to a human review queue rather than proceeding to content generation. This keeps the error rate in published content near zero.


Step 3: Prompt Strategy for Content Generation

Once the JSON is verified, content generation uses a domain-locked prompt pattern:

SYSTEM: You are a Bazi content writer. You MUST use only the data provided in the verified JSON block below.
Do not apply general Bazi theory beyond what the data supports.
Do not make deterministic predictions. Use hedged language: "tends to," "may suggest," "the chart shows."
Prohibited phrases: "definitely," "certainly," "absolutely," "guaranteed."

{verified_json}

USER: Write a [channel]-formatted article about [subject] focusing on [topic_trigger].
Topic trigger: mature actor, Direct Officer(正官) energy, Indirect Resource(偏印) pattern, 2026 Annual Cycle.
Target: ~1500 words. Channel: dev.to technical audience.

The verified-JSON injection acts as a factual anchor. The model is instructed to quote field values directly rather than paraphrase them, which makes QA string-matching feasible in the next step.


Step 4: Channel-Specific Formatting

The same verified JSON feeds different formatters for different channels. For this pipeline, four formatters are implemented:

dev.to formatter: Frontmatter YAML, H2/H3 markdown headers, code-block-free body (domain content doesn't need code), INFO_GRAPHIC block rendered as a markdown table, CTA injected at position [9] before the closing section.

Instagram formatter: 2200-character limit, 5 hook lines, emoji density controlled (max 1 per paragraph), hashtag block auto-generated from sinsal and Ten-God labels.

YouTube script formatter: Cold open (30s), three-act structure, B-roll cue annotations, outro CTA with timestamp.

Newsletter formatter: Plain text fallback, subject line A/B variants generated automatically, preheader extracted from the hook block.

Each formatter is a Python class with a render(verified_json, generated_text) -> str method. The channel router selects the formatter based on a config flag per content job.
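A minimal sketch of that interface, with two of the four formatters stubbed out. The frontmatter layout and the `subject` key are illustrative, not the production schema:

```python
from abc import ABC, abstractmethod

class Formatter(ABC):
    """Stateless renderer: same inputs in, channel-specific string out."""
    @abstractmethod
    def render(self, verified_json: dict, generated_text: str) -> str: ...

class DevToFormatter(Formatter):
    def render(self, verified_json, generated_text):
        subject = verified_json.get("subject", "unknown")
        frontmatter = f"---\ntitle: Saju analysis: {subject}\npublished: false\n---\n"
        return frontmatter + generated_text

class InstagramFormatter(Formatter):
    LIMIT = 2200  # platform caption limit

    def render(self, verified_json, generated_text):
        return generated_text[: self.LIMIT]

FORMATTERS = {"devto": DevToFormatter(), "instagram": InstagramFormatter()}

def route(channel: str) -> Formatter:
    """Channel router: pick the formatter from the job's config flag."""
    return FORMATTERS[channel]
```

Because formatters hold no state, adding a channel really is just registering one more class in the routing table.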


Step 5: QA Checks Before Publish

The QA layer runs string-presence checks against the generated content:

  • All four pillar stems (己, 辛, 戊, 戊) must appear in the article at least once
  • Key Ten-God labels referenced in the topic trigger must appear verbatim
  • Prohibited phrases list is checked via regex
  • Word count must fall within ±10% of target
  • No Korean hangul characters in English-channel output (checked via Unicode range U+AC00–U+D7A3)

Articles failing any check are flagged and regenerated with an error-context prompt that includes the specific failure reason.
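A condensed version of these checks. The required-token list and word-count target come from the job config; the names here are illustrative:

```python
import re

PROHIBITED = ["definitely", "certainly", "absolutely", "guaranteed"]
HANGUL = re.compile(r"[\uAC00-\uD7A3]")  # hangul syllables block

def qa_check(article: str, required_tokens: list[str],
             target_words: int, english_channel: bool = True):
    """Return (passed, reasons); each failure reason feeds the regeneration prompt."""
    reasons = []
    for token in required_tokens:  # pillar stems, Ten-God labels from topic trigger
        if token not in article:
            reasons.append(f"missing required token: {token}")
    for phrase in PROHIBITED:
        if re.search(rf"\b{phrase}\b", article, re.IGNORECASE):
            reasons.append(f"prohibited phrase: {phrase}")
    words = len(article.split())
    if not (0.9 * target_words <= words <= 1.1 * target_words):
        reasons.append(f"word count {words} outside 10% of target {target_words}")
    if english_channel and HANGUL.search(article):
        reasons.append("hangul characters in English-channel output")
    return (not reasons, reasons)
```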


[INFO_GRAPHIC] — Reference Chart: Gong Yoo Manse Calendar Summary

| Pillar | Stem | Branch | Stem Ten-God | Branch Ten-God | 12-Phase |
|---|---|---|---|---|---|
| Year(年柱) | | | Robber(劫財) | Robber(劫財) | |
| Month(月柱) | | | Hurting Officer(傷官) | Robber(劫財) | |
| Day(日柱) | | | Friend(比肩) | Seven Killings(偏官) | 長生 |
| Hour(時柱) | | | Friend(比肩) | Direct Resource(正印) | 帝旺 |
| Major Cycle(大運, age 41) | 丙 | 寅 | Indirect Resource(偏印) | Seven Killings(偏官) | 長生 |
| 2026 Annual(歲運) | 丙 | 午 | Indirect Resource(偏印) | Direct Resource(正印) | |

Key Sinsals present: Ghost Gate(鬼門關殺) across Day/Hour/Major Cycle — structurally unusual concentration. Heavenly Noble(天乙貴人) in Year and Month. Goat Blade(羊刃殺) in Year and 2026. Royal Authority(帝旺殺) in 2026.


The Unexpected Finding: Ghost Gate(鬼門關殺) Clustering

During QA review of this reference case, the sinsal verifier flagged an anomaly: Ghost Gate(鬼門關殺) appears in the Day Pillar branch (寅), the Major Cycle branch (寅), and the Month Pillar branch (未) simultaneously. This is a branch-level triple resonance that the naive content generator initially ignored, defaulting to generic "creative sensitivity" language.

The domain-aware prompt revision added a specific instruction: when the verifier detects sinsal clustering above a threshold (3+ instances of the same sinsal across pillars and cycles), the content generator must explicitly address the structural pattern rather than treating each instance independently. This produced significantly more technically accurate content and reduced domain-expert rejection rate from 18% to 4% in review batches.
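The clustering detector itself is a few lines. This sketch assumes the verified JSON groups sinsal lists under pillars and cycles; the field names are illustrative:

```python
from collections import Counter

def sinsal_clusters(verified: dict, threshold: int = 3) -> dict:
    """Count each sinsal across pillars and cycles; return those at/above threshold."""
    counts = Counter()
    for section in ("pillars", "cycles"):
        for entry in verified.get(section, {}).values():
            counts.update(set(entry.get("sinsals", [])))  # dedupe within one entry
    return {name: n for name, n in counts.items() if n >= threshold}
```

Any sinsal returned here triggers the "address the structural pattern explicitly" instruction in the generation prompt instead of the default per-instance treatment.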

The broader lesson: domain anomalies are content opportunities, not edge cases to suppress.


Lessons Learned

1. Schema-first extraction beats prompt-first extraction. Defining the output schema before writing the extraction prompt forces you to enumerate all valid values, which eliminates a large class of hallucination errors.

2. Verification is cheaper than correction. Catching a Ten-God mismatch before generation costs one API call. Catching it after generation costs one generation call plus a regeneration call. Build the verifier first.

3. Channel formatters should be stateless renderers. Passing the same (verified_json, generated_text) tuple to any formatter makes the system easy to extend. Adding a new channel is adding a new class, not modifying the pipeline.

4. Prohibited phrase lists need domain customization. Generic "no hallucination" instructions don't prevent domain-specific overconfidence. Explicit prohibited phrases derived from domain norms (in this case, deterministic fortune-telling language) are necessary.

5. Human review queues are a feature, not a failure. Routing flagged records to human review rather than auto-rejecting them preserves edge cases that often reveal prompt improvements.


Summary

  • Claude Vision with schema-constrained prompts achieves ~96% extraction accuracy on structured domain images like Manse Calendars
  • A three-check verification layer (Ten-God consistency, branch cross-reference, sinsal logic) catches the remaining errors before generation
  • Channel-specific formatters as stateless renderers make multi-channel publishing a config change, not a code change
  • Sinsal clustering detection turned an edge case into a content differentiation signal

Explore the content output side of this pipeline at runartree.com


This article describes a technical content automation pipeline. All Saju/Bazi data referenced is used as a domain example only. No astrological predictions or life guidance are intended or implied. Individual results from any content automation system will vary based on data quality, prompt design, and domain expertise.


Project link

This article is based on an automated content workflow for a Korean Saju platform.

The key lesson is simple: generation alone is not enough. A useful publishing pipeline also needs formatting, QA, tracking links, and channel-specific editorial rules.


The output discussed here is Bazi interpretation content only. It is not medical, legal, or investment advice.
