Retrieval-Augmented Generation (RAG) has become the default architecture for building AI-powered document intelligence systems. Most implementations follow the same pattern:
- Split documents into chunks
- Convert chunks into embeddings
- Store them in a vector database
- Retrieve the most similar chunks
- Send them to an LLM to generate answers
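As a sketch, the five steps can be compressed into a few lines of Python. This is a toy illustration, not production code: the bag-of-words `embed` function stands in for a real embedding model, and a plain list stands in for the vector database.

```python
import math
import re
from collections import Counter

def chunk(text: str, lines_per_chunk: int = 2) -> list[str]:
    # Step 1: split the document into fixed-size chunks.
    lines = text.splitlines()
    return ["\n".join(lines[i:i + lines_per_chunk])
            for i in range(0, len(lines), lines_per_chunk)]

def embed(text: str) -> Counter:
    # Step 2: convert a chunk into a vector. A toy bag-of-words
    # stand-in for a real embedding model.
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, index: list[str], k: int = 2) -> list[str]:
    # Steps 3-4: "store" chunks in a list and return the most similar.
    q = embed(query)
    return sorted(index, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

# Step 5 would send the retrieved chunks to an LLM as context.
doc = ("Diagnosis: Major Depressive Disorder\n"
       "Symptoms: Persistent low mood\n"
       "Treatment Summary: 12 CBT sessions\n"
       "Medications: Sertraline 50mg daily")
top = retrieve("What medications is the patient taking?", chunk(doc), k=1)
```

For this query the chunk containing the medication line ranks first because it shares the word "medications" with the question, which is exactly the kind of surface-level match the rest of this article examines.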
This pipeline works reasonably well for simple text. However, when applied to structured documents like clinical records, chunking can introduce serious problems.
Healthcare documents are rich with context and hierarchy. Breaking them into arbitrary chunks often leads to context loss, retrieval errors, and fragmented reasoning.
In this article, you will see why chunking fails, using a realistic clinical document example, and how structure-aware indexing and summarization can produce far better results.
Note: This post focuses on the healthcare domain, using a patient clinical document as the running example.
The Clinical Document Example
Consider the following clinical summary sample:
Patient Name: Jordan M.
DOB: 06/21/1990
Date of Summary: 08/01/2025
Diagnosis: F33.1 Major Depressive Disorder, recurrent, moderate
Symptoms: Persistent low mood, disrupted sleep, concentration issues
Treatment Summary:
- 12 CBT sessions, weekly
- Focused on core beliefs, behavioral activation
- PHQ-9 improved from 17 to 6
Medications: Sertraline 50mg daily, no side effects reported
Follow-Up Plan:
- Referral to psychiatrist for medication continuation
- Recommended ongoing biweekly therapy
At first glance, this document appears small, but clinical records in real systems often span hundreds of pages across multiple visits.
Even in this simple example, the document contains clear semantic sections:
Patient Info
Diagnosis
Symptoms
Treatment Summary
Medications
Follow-Up Plan
These sections provide the structure necessary for proper interpretation.
What Happens When We Chunk This Document
A traditional RAG system might split the text into chunks like this:
Chunk A
Patient Name: Jordan M.
DOB: 06/21/1990
Diagnosis: Major Depressive Disorder
Symptoms: Persistent low mood
Chunk B
Treatment Summary:
12 CBT sessions
PHQ-9 improved from 17 to 6
Chunk C
Medications: Sertraline 50mg daily
Follow-Up Plan: referral to psychiatrist
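The split above can be reproduced with a naive line-based splitter, and a quick check makes the failure concrete: no single chunk contains both the therapy and the medication. (A toy sketch; real splitters work on tokens or characters, but the boundary problem is the same.)

```python
CLINICAL_NOTE = """Patient Name: Jordan M.
DOB: 06/21/1990
Diagnosis: Major Depressive Disorder
Symptoms: Persistent low mood
Treatment Summary: 12 CBT sessions
PHQ-9 improved from 17 to 6
Medications: Sertraline 50mg daily
Follow-Up Plan: referral to psychiatrist"""

def chunk_by_lines(text: str, lines_per_chunk: int = 3) -> list[str]:
    # Naive splitter: every N lines become one chunk, regardless of
    # where section boundaries fall.
    lines = text.splitlines()
    return ["\n".join(lines[i:i + lines_per_chunk])
            for i in range(0, len(lines), lines_per_chunk)]

chunks = chunk_by_lines(CLINICAL_NOTE)  # roughly Chunks A, B, and C above

# A cross-section question ("what treatment and medication improved the
# score?") needs both CBT and Sertraline, but no single chunk has both:
spans_both = [c for c in chunks if "CBT" in c and "Sertraline" in c]
print(spans_both)  # → []
```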
1. Cross-Section Reasoning Questions
These require information from multiple chunks, which chunk-based retrieval often fails to assemble.
Example Questions
• What treatment improved the patient’s PHQ-9 score?
• What medication is being used to treat the patient's depression?
• What treatment approach was used along with medication?
• What interventions helped reduce the patient’s depression score?
Why Chunking Fails
The system may retrieve:
Chunk B
PHQ-9 improved from 17 to 6
But it does not contain medication information, so the answer becomes incomplete.
2. Contextual Medical Questions
These questions require understanding relationships between sections.
Example Questions
• What condition is the patient being treated for with Sertraline?
• Why was the patient referred to a psychiatrist?
• What symptoms led to the treatment plan?
Why Chunking Fails
Chunk C contains medication, but diagnosis is in Chunk A, so the model may not connect them.
3. Treatment Outcome Questions
These require linking treatment with outcomes.
Example Questions
• Did the therapy sessions improve the patient’s condition?
• What evidence shows the patient improved during treatment?
• How effective was the treatment plan?
Why Chunking Fails
The improvement metric:
PHQ-9 improved from 17 to 6
appears in Chunk B, but the context about depression diagnosis is in Chunk A.
4. Follow-Up Care Questions
These require understanding treatment history and next steps.
Example Questions
• Why does the patient need psychiatric follow-up?
• What follow-up care is recommended after treatment?
• What ongoing care is suggested for this patient?
Why Chunking Fails
Chunk C contains the follow-up plan but not the context of the diagnosis or therapy outcome.
5. Comprehensive Clinical Summary Questions
These require multiple chunks simultaneously.
Example Questions
• Summarize the patient’s diagnosis, treatment, and follow-up plan.
• What treatments has the patient received for depression?
• What is the overall care plan for this patient?
Why Chunking Fails
Chunk-based retrieval may only return one chunk, causing a partial summary.
Example incomplete retrieval:
Chunk B
Treatment Summary
12 CBT sessions
PHQ-9 improved from 17 to 6
But the system misses medication and follow-up care.
6. Ambiguous Retrieval Questions
These expose semantic similarity issues in vector search.
Example Questions
• What therapy is the patient receiving?
• What treatment is the patient undergoing?
• How is the patient being treated?
Vector search may retrieve:
Chunk B
Treatment Summary
But it misses medication in Chunk C, which is also part of the treatment plan.
Vector similarity measures semantic proximity, not clinical context.
The result: incorrect or incomplete answers.
Why Chunking Breaks Clinical Documents
Healthcare documents illustrate several fundamental problems with chunking.
1. Clinical Context Gets Fragmented
Clinical notes often rely on relationships between sections.
Example:
Diagnosis - Explains why treatment was prescribed
Treatment - Explains how symptoms improved
Follow-Up - Explains ongoing care
When chunked, these relationships disappear.
2. Important Meaning Spans Sections
Consider the treatment outcome:
PHQ-9 improved from 17 to 6
This metric only makes sense if the model also understands:
Diagnosis: Major Depressive Disorder
Treatment: CBT sessions
Medication: Sertraline
Chunking separates these connected ideas.
3. Clinical Reasoning Requires Structure
Doctors interpret records by navigating sections:
Diagnosis
Symptoms
Treatment
Medication
Follow-Up
Chunking ignores this hierarchy entirely.
A Better Approach: Structure-Aware Document Retrieval
Instead of splitting documents arbitrarily, the document's structure can be preserved as a tree-based hierarchy.
Example hierarchical representation:
Clinical Summary
├ Patient Information
│ ├ Name
│ └ DOB
│
├ Diagnosis
│
├ Symptoms
│
├ Treatment Summary
│
├ Medications
│
└ Follow-Up Plan
Each section becomes a retrieval node.
This structure preserves the clinical context.
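One way to produce such retrieval nodes, assuming the sample's `Header:` line convention, is a small parser that maps each section name to its body text. This is a hedged sketch for this exact layout; real clinical documents usually need a more robust, layout-aware parser.

```python
def build_section_tree(text: str) -> dict[str, str]:
    # Each "Header:" line starts a new node; following lines
    # (including "- " bullets) belong to that node's body.
    # Naive assumption: body lines never contain a bare colon.
    tree: dict[str, str] = {}
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if ":" in stripped and not stripped.startswith("-"):
            header, _, rest = stripped.partition(":")
            current = header.strip()
            tree[current] = rest.strip()
        elif current:
            tree[current] = (tree[current] + " " + stripped).strip()
    return tree

doc = """Diagnosis: F33.1 Major Depressive Disorder
Treatment Summary:
- 12 CBT sessions, weekly
- PHQ-9 improved from 17 to 6
Medications: Sertraline 50mg daily"""
tree = build_section_tree(doc)
```

Each key in `tree` becomes one retrieval node, so a query can be matched against whole sections rather than arbitrary fragments.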
Adding Summarization for Better Retrieval
To improve retrieval efficiency, each section can be summarized.
Example summaries:
Patient Information
Summary: Patient demographics including name and DOB.
Diagnosis
Summary: Major Depressive Disorder (recurrent, moderate).
Treatment Summary
Summary: 12 CBT sessions with significant improvement in PHQ-9 score.
Medications
Summary: Sertraline 50mg daily with no reported side effects.
Follow-Up Plan
Summary: Referral to psychiatrist and continued biweekly therapy.
These summaries act as compressed semantic representations of the document.
How Retrieval Works with Summaries
User query:
"What medication is the patient currently taking?"
The system compares the query to section summaries:
Diagnosis - Mental health condition
Treatment - Therapy sessions
Medications - Drug prescription
Follow-Up - Future care
The correct section (Medications) is retrieved immediately.
Example Final Context
Retrieved section:
Medications:
Sertraline 50mg daily, no side effects reported
Generated response:
The patient is currently prescribed Sertraline 50mg daily, with no reported side effects.
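The comparison step in this walkthrough can be sketched as a router over section summaries. Here simple word overlap (with crude plural folding) stands in for embedding similarity between the query and each summary; a production system would embed both sides.

```python
import re

SECTIONS = {
    "Diagnosis": "Major Depressive Disorder (recurrent, moderate).",
    "Treatment Summary": "12 CBT sessions with significant improvement in PHQ-9 score.",
    "Medications": "Sertraline 50mg daily with no reported side effects.",
    "Follow-Up Plan": "Referral to psychiatrist and continued biweekly therapy.",
}

def tokens(text: str) -> set[str]:
    # Lowercase, split on word characters, and fold crude plurals
    # ("medications" -> "medication") so near-matches still count.
    return {w.rstrip("s") for w in re.findall(r"\w+", text.lower())}

def route(query: str, summaries: dict[str, str]) -> str:
    # Pick the section whose name + summary overlaps the query most;
    # a stand-in for cosine similarity over summary embeddings.
    q = tokens(query)
    return max(summaries,
               key=lambda name: len(q & tokens(name + " " + summaries[name])))

section = route("What medication is the patient currently taking?", SECTIONS)
# The matched section's full text is then passed to the LLM as context.
```

The same router also sends "Why was the patient referred to a psychiatrist?" to the Follow-Up Plan section, illustrating how summaries act as a coarse navigation layer.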
High-level Architecture for Clinical RAG
A structure-aware system might follow this pipeline:
Parse document → Build section tree → Summarize each section → Index summaries → Match query to a section → Retrieve the full section text → Generate the answer
This preserves meaning while reducing noise.
Why This Matters in Healthcare AI
Clinical AI systems must prioritize:
• Accuracy
• Traceability
• Context awareness
Chunk-based retrieval often struggles to meet these requirements.
Structure-aware approaches provide:
Higher precision
Relevant sections are retrieved instead of unrelated chunks.
Better explainability
The system can show exact sections used in reasoning.
Improved clinical safety
Maintaining document hierarchy reduces the risk of misinterpretation.
The Future of RAG in Healthcare
As AI becomes more integrated into healthcare systems, document understanding will play a critical role.
The next generation of RAG architectures will likely include:
• Hierarchical document indexing
• Section-level summarization
• Reasoning-based retrieval
• Agentic document exploration
These approaches allow AI systems to navigate clinical documents more like human experts.
Conclusion
Chunking assumes documents are bags of paragraphs. But documents are actually structured knowledge systems. Even when documents appear unstructured, the structure can often be inferred. And once structure exists, retrieval becomes far more accurate.
For structured documents like clinical records, chunking often causes more problems than it solves.
If you need AI systems to truly understand documents, preserving the structure and allowing models to reason over meaningful sections is crucial.
Moving beyond chunking is a critical step toward building safer, more reliable document intelligence systems.
In the next blog post, we will walk through a realistic example of handling unstructured data and its retrieval.
Attribution
The clinical document sample was referenced from https://www.supanote.ai/templates/clinical-summary-template
The contents of this blog post were formatted with ChatGPT to make them more professional and polished for the target audience.

Top comments (5)
The clinical document example makes the failure mode concrete in a way that generic RAG critiques rarely do. The PHQ-9 metric only means something in context of the diagnosis and the treatment. Chunking strips that context and the retrieval system has no way to recover it.
There is an interesting tension between this post and Ayan Arshad's chunking experiments published on Dev.to yesterday. His conclusion for code was that smaller chunks win, function-level AST extraction at roughly 120 tokens outperformed larger windows. Your conclusion for clinical documents is that larger semantic units win, the section, not the paragraph or the sentence.
These are not contradictory. They are both right for their domains, and together they point at something more precise than "chunking is a mistake." The optimal granularity is determined by two variables simultaneously: the structure of the data AND the nature of the question being asked.
For code, the question is usually "what does this function do", a bounded, function-scoped query. The semantic unit is the function. For clinical records, the question is often "what explains this treatment decision", a relational query that spans sections. The semantic unit is the relationship between sections, not any individual section.
The section-level summarization approach you describe solves the navigation problem elegantly. The risk worth naming is that it introduces a precision-recall tradeoff at the leaf level. A summary of the Medications section that says "Sertraline 50mg daily" is perfect for the query "what medication is the patient taking." But for the query "was there any adverse reaction noted in the medication review," the summary may not preserve that granularity, and the raw chunk would have been more precise.
The architecture that handles both is two-stage: section summaries for coarse navigation and relevance filtering, then raw chunk retrieval within the matched section for precise extraction. The summary routes the query to the right section. The chunk answers it. This avoids the over-generalisation risk in summary-only retrieval while preserving the cross-section context that chunking alone loses.
The compliance domain has exactly the same cross-section reasoning problem as clinical records. A question like "what legal basis justifies this data processing" requires connecting the stated purpose (one section), the legal basis (another section), and the retention policy (a third section). The same hierarchical structure-aware approach applies directly.
Great explanation Olebeng. It's inspiring to see your in-depth analysis on this topic. There is ongoing research by the company PageIndex; they take a completely different, vectorless approach. Yes, we need to go case by case. I hope you may also like my other blog post - dev.to/ranjancse/a-vectorless-rag-...
I am currently grappling with this issue with my current build and conversations like these help put concepts into perspective. I will definitely give your other blog a read as well.
Nice explanation, hadn't thought about this, thanks for the blog :)
I am glad you liked it :)