Sanjay Ghosh
Build a RAG Pipeline in Java (Text → Vector → LLM, No Paid APIs)

Ever asked an LLM a question about your own data and received an incorrect or generic answer?

That's because Large Language Models (LLMs) don't know your private data.

In this article, weโ€™ll build a complete Retrieval-Augmented Generation (RAG) pipeline using:

  • Java
  • PostgreSQL (with vector support)
  • Ollama (local LLM + embeddings)

👉 No OpenAI / No paid APIs
👉 Fully local
👉 Practical and production-relevant

🧠 What is RAG?

Retrieval-Augmented Generation (RAG) is an architecture that improves LLM responses by:

  1. Retrieving relevant data from a knowledge source
  2. Passing that data to the LLM
  3. Generating an answer grounded in that context

In simple terms:

Instead of guessing, the model first looks up relevant information and then answers.

๐Ÿ” Why Do We Need RAG?

LLMs are powerful, but they have limitations:

  • โŒ They donโ€™t know your private/company data
  • โŒ Their knowledge is static
  • โŒ They can hallucinate

RAG solves this by combining:

  • Your data (database)
  • Smart retrieval (vector search)
  • LLM reasoning (generation)

📊 RAG Flow (This Project)

We will implement this pipeline:
Text → Embedding → Store in DB

Query → Embedding
     ↓
Vector Search (Top K)
     ↓
Pass to LLM
     ↓
Final Answer
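Before diving into the real pipeline, the flow above can be sketched end-to-end as a toy in-memory example. The class name, the bag-of-words `embed` stand-in, and the tiny vocabulary are all mine (the real project uses Ollama embeddings and PostgreSQL), but the mechanics — embed, rank by similarity, keep top K as context — are the same:

```java
import java.util.Comparator;
import java.util.List;

public class ToyRag {
    // Toy "embedding": per-word occurrence counts over a tiny fixed vocabulary.
    // A real embedding model (e.g. nomic-embed-text) replaces this step.
    static final List<String> VOCAB = List.of("oauth", "jwt", "token", "database", "vector");

    static double[] embed(String text) {
        String lower = text.toLowerCase();
        double[] v = new double[VOCAB.size()];
        for (int i = 0; i < VOCAB.size(); i++) {
            int idx = 0, count = 0;
            while ((idx = lower.indexOf(VOCAB.get(i), idx)) != -1) { count++; idx++; }
            v[i] = count;
        }
        return v;
    }

    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) { dot += a[i] * b[i]; na += a[i] * a[i]; nb += b[i] * b[i]; }
        return (na == 0 || nb == 0) ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    // Retrieval: rank stored texts by similarity to the query, keep top K.
    // (Re-embedding each doc per comparison is fine for a toy; the real
    // pipeline stores vectors once and lets PostgreSQL do the ranking.)
    static List<String> retrieve(String query, List<String> docs, int topK) {
        double[] q = embed(query);
        return docs.stream()
                .sorted(Comparator.comparingDouble((String d) -> cosine(q, embed(d))).reversed())
                .limit(topK)
                .toList();
    }

    public static void main(String[] args) {
        List<String> docs = List.of(
                "OAuth 2.0 is an authorization framework that issues tokens.",
                "JWT is used for secure authentication.",
                "PostgreSQL stores each vector in a database table.");
        List<String> context = retrieve("What is OAuth?", docs, 2);
        // In the real pipeline, this context is passed to the LLM prompt.
        System.out.println(context.get(0));
    }
}
```

Running it retrieves the OAuth sentence first, because its toy vector overlaps most with the query's.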

โš™๏ธ Prerequisites

1. Install PostgreSQL

Make sure PostgreSQL is installed and running. You will also need the pgvector extension, which provides the VECTOR column type and the similarity operators used below.

2. Install Ollama (Local LLM)

sudo apt-get install zstd
curl -fsSL https://ollama.com/install.sh | sh

3. Pull Required Models

# LLM (for answer generation)
ollama pull llama3

# Embedding model
ollama pull nomic-embed-text

4. Verify Installation

ollama run llama3

If it responds, you're ready.

🟢 Phase 1: Indexing (Store Data)

In this phase, we:

  1. Convert text → vector (embedding)
  2. Store it in PostgreSQL

Why Embeddings?

Embeddings convert text into numbers so we can measure similarity.
Example:
"OAuth authentication"
→ [0.12, -0.98, 0.45, ...]
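Why numbers? Because distance between vectors approximates similarity in meaning. pgvector's `<->` operator (used in the query phase below) computes Euclidean (L2) distance; here is the same measure in plain Java, with made-up 3-dimensional vectors standing in for real 768-dimensional embeddings:

```java
public class VectorDistance {
    // Euclidean (L2) distance: what pgvector's <-> operator computes.
    static double l2(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        double[] oauth = {0.12, -0.98, 0.45};   // "OAuth authentication" (illustrative values)
        double[] login = {0.10, -0.90, 0.50};   // "user login flow"
        double[] pasta = {0.80,  0.60, -0.70};  // "pasta recipe"
        System.out.printf("oauth vs login: %.3f%n", l2(oauth, login)); // small distance = similar
        System.out.printf("oauth vs pasta: %.3f%n", l2(oauth, pasta)); // large distance = unrelated
    }
}
```

Related texts land close together, so "nearest vectors" becomes "most relevant text".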

Database Table

Enable the pgvector extension first, then create the table (768 matches the output dimension of nomic-embed-text):

CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE text_embeddings (
    id SERIAL PRIMARY KEY,
    content TEXT,
    embedding VECTOR(768)
);

Key Class: EmbeddingService.java

  • Calls Ollama
  • Converts text → vector

Snippet

ClassicHttpResponse response = (ClassicHttpResponse) Request.post("http://localhost:11434/api/embeddings")
    .bodyString(body.toString(), ContentType.APPLICATION_JSON)
    .execute()
    .returnResponse();

This returns a numerical vector representation of the input text, which we store in the database.
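Ollama's JSON response holds the embedding as an array of floats, but the `?::vector` placeholder in the INSERT below expects a pgvector literal such as `[0.12,-0.98,0.45]`. A small helper can do the conversion (the name `toPgVector` is mine, not from the project):

```java
import java.util.StringJoiner;

public class PgVectorFormat {
    // Render a float array as a pgvector literal, e.g. [0.12,-0.98,0.45].
    static String toPgVector(double[] embedding) {
        StringJoiner sj = new StringJoiner(",", "[", "]");
        for (double v : embedding) {
            sj.add(Double.toString(v));
        }
        return sj.toString();
    }

    public static void main(String[] args) {
        double[] embedding = {0.12, -0.98, 0.45};
        String literal = toPgVector(embedding);
        System.out.println(literal); // [0.12,-0.98,0.45]
        // This string is what ps.setString(2, vector) receives in StorageService.
    }
}
```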

Key Class: StorageService.java

Stores text + embedding into PostgreSQL

PreparedStatement ps = conn.prepareStatement(
    "INSERT INTO text_embeddings (content, embedding) VALUES (?, ?::vector)"
);

ps.setString(1, text);
ps.setString(2, vector);

ps.executeUpdate();

Each piece of text is stored along with its vector representation.

🔵 Phase 2: Query (RAG Flow)

Step 1: User Query

"What is OAuth?"

Step 2: Convert Query → Embedding

Same process as storing text.

Step 3: Retrieve Relevant Data

SELECT content
FROM text_embeddings
ORDER BY embedding <-> ?::vector
LIMIT 3;

👉 This finds the 3 stored chunks whose vectors are closest to the query vector (`<->` is pgvector's Euclidean-distance operator; `<=>` would use cosine distance instead)

Key Class: Retriever.java

This is the R (Retrieval) in RAG.

PreparedStatement ps = conn.prepareStatement(
    """
    SELECT content
    FROM text_embeddings
    ORDER BY embedding <-> ?::vector
    LIMIT ?
    """
);

ps.setString(1, vector);
ps.setInt(2, topK);

ResultSet rs = ps.executeQuery();

// Collect the top-K chunks to use as context for the LLM
List<String> context = new ArrayList<>();
while (rs.next()) {
    context.add(rs.getString("content"));
}

Step 4: Generate Answer Using LLM

We pass retrieved data to the LLM:

Context:
OAuth 2.0 is an authorization framework...

Question:
What is OAuth?

👉 The LLM generates a clean answer.

Key Class: LLMService.java

This is the G (Generation) in RAG.

Passing Context to the LLM

        String prompt = """
                        Answer briefly in 2-3 sentences.

                        Context:
                        %s

                        Question:
                        %s
                        """.formatted(context, query);

We inject retrieved data into the prompt so the LLM generates grounded answers.
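Since top-K retrieval returns several chunks, they are typically joined into one context string before being placed in the template. A minimal sketch (the class name and sample chunks are illustrative):

```java
import java.util.List;

public class PromptBuilder {
    // Join retrieved chunks and inject them into the prompt template.
    static String buildPrompt(List<String> chunks, String query) {
        String context = String.join("\n", chunks);
        return """
               Answer briefly in 2-3 sentences.

               Context:
               %s

               Question:
               %s
               """.formatted(context, query);
    }

    public static void main(String[] args) {
        List<String> retrieved = List.of(
                "OAuth 2.0 is an authorization framework.",
                "JWT is used for secure authentication.");
        System.out.println(buildPrompt(retrieved, "What is OAuth?"));
    }
}
```

The instruction line ("Answer briefly...") keeps the model from rambling; the model never sees the database, only this assembled prompt.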

🧪 Sample Output

--- Retrieved Context ---
OAuth 2.0 is an authorization framework.
JWT is used for secure authentication.

--- Final Answer ---
OAuth is an authorization framework used to grant access to resources...

🧠 What's Really Happening?

This is the most important part to understand:

| Component | Role                       |
|-----------|----------------------------|
| Database  | Stores knowledge           |
| Retriever | Finds relevant information |
| LLM       | Generates answer           |

👉 The LLM does NOT retrieve data
👉 The database does NOT generate answers

💻 Full Code

The project includes:

  • EmbeddingService.java
  • StorageService.java
  • Retriever.java
  • LLMService.java
  • RAGApp.java
  • pom.xml

👉 GitHub Repository

https://github.com/knowledgebase21st/Software-Engineering/tree/dev/AI/RAG

🚀 Why This Approach is Powerful

  • Works with your own data
  • Reduces hallucination
  • Fully offline (with Ollama)
  • Production-ready pattern

✅ Conclusion

We built a complete RAG pipeline using Java, PostgreSQL, and Ollama.

This approach combines:

  • Your data
  • Smart retrieval
  • LLM reasoning

Result:
Accurate, context-aware answers using your own knowledge base.

Top comments (2)

SOUMYA KANTI GHOSH:
It would have been better if you could use Spring Data JPA for DB operations.

Sanjay Ghosh:
Good point: Spring Data JPA would make the data layer cleaner. I kept it simple here to focus on the RAG pipeline and pgvector integration, but for production I'd definitely consider JPA or a hybrid approach for better maintainability.