Sanjay Ghosh
Build a RAG Pipeline in Java (Text → Vector → LLM, No Paid APIs)

Ever asked an LLM a question about your own data and received an incorrect or generic answer?

That's because Large Language Models (LLMs) don't know your private data.

In this article, weโ€™ll build a complete Retrieval-Augmented Generation (RAG) pipeline using:

  • Java
  • PostgreSQL (with vector support)
  • Ollama (local LLM + embeddings)

👉 No OpenAI / No paid APIs
👉 Fully local
👉 Practical and production-relevant

🧠 What is RAG?

Retrieval-Augmented Generation (RAG) is an architecture that improves LLM responses by:

  1. Retrieving relevant data from a knowledge source
  2. Passing that data to the LLM
  3. Generating an answer grounded in that context

In simple terms:

Instead of guessing, the model first looks up relevant information and then answers.

๐Ÿ” Why Do We Need RAG?

LLMs are powerful, but they have limitations:

  • โŒ They donโ€™t know your private/company data
  • โŒ Their knowledge is static
  • โŒ They can hallucinate

RAG solves this by combining:

  • Your data (database)
  • Smart retrieval (vector search)
  • LLM reasoning (generation)

📊 RAG Flow (This Project)

We will implement this pipeline:
Text → Embedding → Store in DB

Query → Embedding
     ↓
Vector Search (Top K)
     ↓
Pass to LLM
     ↓
Final Answer
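Before diving into the real pipeline, the flow above can be sketched end-to-end as a toy in-memory example. The class name, the bag-of-words `embed` stand-in, and the tiny vocabulary are all mine (the real project uses Ollama embeddings and PostgreSQL), but the mechanics — embed, rank by similarity, keep top K as context — are the same:

```java
import java.util.Comparator;
import java.util.List;

public class ToyRag {
    // Toy "embedding": per-word occurrence counts over a tiny fixed vocabulary.
    // A real embedding model (e.g. nomic-embed-text) replaces this step.
    static final List<String> VOCAB = List.of("oauth", "jwt", "token", "database", "vector");

    static double[] embed(String text) {
        String lower = text.toLowerCase();
        double[] v = new double[VOCAB.size()];
        for (int i = 0; i < VOCAB.size(); i++) {
            int idx = 0, count = 0;
            while ((idx = lower.indexOf(VOCAB.get(i), idx)) != -1) { count++; idx++; }
            v[i] = count;
        }
        return v;
    }

    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) { dot += a[i] * b[i]; na += a[i] * a[i]; nb += b[i] * b[i]; }
        return (na == 0 || nb == 0) ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    // Retrieval: rank stored texts by similarity to the query, keep top K.
    // (Re-embedding each doc per comparison is fine for a toy; the real
    // pipeline stores vectors once and lets PostgreSQL do the ranking.)
    static List<String> retrieve(String query, List<String> docs, int topK) {
        double[] q = embed(query);
        return docs.stream()
                .sorted(Comparator.comparingDouble((String d) -> cosine(q, embed(d))).reversed())
                .limit(topK)
                .toList();
    }

    public static void main(String[] args) {
        List<String> docs = List.of(
                "OAuth 2.0 is an authorization framework that issues tokens.",
                "JWT is used for secure authentication.",
                "PostgreSQL stores each vector in a database table.");
        List<String> context = retrieve("What is OAuth?", docs, 2);
        // In the real pipeline, this context is passed to the LLM prompt.
        System.out.println(context.get(0));
    }
}
```

Running it retrieves the OAuth sentence first, because its toy vector overlaps most with the query's.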

โš™๏ธ Prerequisites

1. Install PostgreSQL

Make sure PostgreSQL is installed and running. You will also need the pgvector extension, which provides the VECTOR column type and the similarity operators used below.

2. Install Ollama (Local LLM)

sudo apt-get install zstd
curl -fsSL https://ollama.com/install.sh | sh

3. Pull Required Models

# LLM (for answer generation)
ollama pull llama3

# Embedding model
ollama pull nomic-embed-text

4. Verify Installation

ollama run llama3

If it responds, you're ready.

🟢 Phase 1: Indexing (Store Data)

In this phase, we:

  1. Convert text → vector (embedding)
  2. Store it in PostgreSQL

Why Embeddings?

Embeddings convert text into numbers so we can measure similarity.
Example:
"OAuth authentication"
→ [0.12, -0.98, 0.45, ...]
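Why numbers? Because distance between vectors approximates similarity in meaning. pgvector's `<->` operator (used in the query phase below) computes Euclidean (L2) distance; here is the same measure in plain Java, with made-up 3-dimensional vectors standing in for real 768-dimensional embeddings:

```java
public class VectorDistance {
    // Euclidean (L2) distance: what pgvector's <-> operator computes.
    static double l2(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        double[] oauth = {0.12, -0.98, 0.45};   // "OAuth authentication" (illustrative values)
        double[] login = {0.10, -0.90, 0.50};   // "user login flow"
        double[] pasta = {0.80,  0.60, -0.70};  // "pasta recipe"
        System.out.printf("oauth vs login: %.3f%n", l2(oauth, login)); // small distance = similar
        System.out.printf("oauth vs pasta: %.3f%n", l2(oauth, pasta)); // large distance = unrelated
    }
}
```

Related texts land close together, so "nearest vectors" becomes "most relevant text".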

Database Table

Enable the pgvector extension first, then create the table (768 matches the output dimension of nomic-embed-text):

CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE text_embeddings (
    id SERIAL PRIMARY KEY,
    content TEXT,
    embedding VECTOR(768)
);

Key Class: EmbeddingService.java

  • Calls Ollama
  • Converts text → vector

Snippet

ClassicHttpResponse response = (ClassicHttpResponse) Request.post("http://localhost:11434/api/embeddings")
    .bodyString(body.toString(), ContentType.APPLICATION_JSON)
    .execute()
    .returnResponse();

This returns a numerical vector representation of the input text, which we store in the database.
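Ollama's JSON response holds the embedding as an array of floats, but the `?::vector` placeholder in the INSERT below expects a pgvector literal such as `[0.12,-0.98,0.45]`. A small helper can do the conversion (the name `toPgVector` is mine, not from the project):

```java
import java.util.StringJoiner;

public class PgVectorFormat {
    // Render a float array as a pgvector literal, e.g. [0.12,-0.98,0.45].
    static String toPgVector(double[] embedding) {
        StringJoiner sj = new StringJoiner(",", "[", "]");
        for (double v : embedding) {
            sj.add(Double.toString(v));
        }
        return sj.toString();
    }

    public static void main(String[] args) {
        double[] embedding = {0.12, -0.98, 0.45};
        String literal = toPgVector(embedding);
        System.out.println(literal); // [0.12,-0.98,0.45]
        // This string is what ps.setString(2, vector) receives in StorageService.
    }
}
```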

Key Class: StorageService.java

Stores text + embedding into PostgreSQL

PreparedStatement ps = conn.prepareStatement(
    "INSERT INTO text_embeddings (content, embedding) VALUES (?, ?::vector)"
);

ps.setString(1, text);
ps.setString(2, vector);

ps.executeUpdate();

Each piece of text is stored along with its vector representation.

🔵 Phase 2: Query (RAG Flow)

Step 1: User Query

"What is OAuth?"

Step 2: Convert Query → Embedding

Same process as storing text.

Step 3: Retrieve Relevant Data

SELECT content
FROM text_embeddings
ORDER BY embedding <-> ?::vector
LIMIT 3;

👉 This finds the 3 stored chunks whose vectors are closest to the query vector (`<->` is pgvector's Euclidean-distance operator; `<=>` would use cosine distance instead)

Key Class: Retriever.java

This is the R (Retrieval) in RAG.

PreparedStatement ps = conn.prepareStatement(
    """
    SELECT content
    FROM text_embeddings
    ORDER BY embedding <-> ?::vector
    LIMIT ?
    """
);

ps.setString(1, vector);
ps.setInt(2, topK);

ResultSet rs = ps.executeQuery();

// Collect the top-K chunks to use as context for the LLM
List<String> context = new ArrayList<>();
while (rs.next()) {
    context.add(rs.getString("content"));
}

Step 4: Generate Answer Using LLM

We pass retrieved data to the LLM:

Context:
OAuth 2.0 is an authorization framework...

Question:
What is OAuth?

👉 The LLM generates a clean answer.

Key Class: LLMService.java

This is the G (Generation) in RAG.

Passing Context to the LLM

        String prompt = """
                        Answer briefly in 2-3 sentences.

                        Context:
                        %s

                        Question:
                        %s
                        """.formatted(context, query);

We inject retrieved data into the prompt so the LLM generates grounded answers.
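Since top-K retrieval returns several chunks, they are typically joined into one context string before being placed in the template. A minimal sketch (the class name and sample chunks are illustrative):

```java
import java.util.List;

public class PromptBuilder {
    // Join retrieved chunks and inject them into the prompt template.
    static String buildPrompt(List<String> chunks, String query) {
        String context = String.join("\n", chunks);
        return """
               Answer briefly in 2-3 sentences.

               Context:
               %s

               Question:
               %s
               """.formatted(context, query);
    }

    public static void main(String[] args) {
        List<String> retrieved = List.of(
                "OAuth 2.0 is an authorization framework.",
                "JWT is used for secure authentication.");
        System.out.println(buildPrompt(retrieved, "What is OAuth?"));
    }
}
```

The instruction line ("Answer briefly...") keeps the model from rambling; the model never sees the database, only this assembled prompt.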

🧪 Sample Output

--- Retrieved Context ---
OAuth 2.0 is an authorization framework.
JWT is used for secure authentication.

--- Final Answer ---
OAuth is an authorization framework used to grant access to resources...

🧠 What's Really Happening?

This is the most important part to understand:

| Component | Role                       |
|-----------|----------------------------|
| Database  | Stores knowledge           |
| Retriever | Finds relevant information |
| LLM       | Generates answer           |

👉 The LLM does NOT retrieve data
👉 The database does NOT generate answers

💻 Full Code

The project includes:

  • EmbeddingService.java
  • StorageService.java
  • Retriever.java
  • LLMService.java
  • RAGApp.java
  • pom.xml

👉 GitHub Repository

https://github.com/knowledgebase21st/Software-Engineering/tree/dev/AI/RAG

🚀 Why This Approach is Powerful

  • Works with your own data
  • Reduces hallucination
  • Fully offline (with Ollama)
  • Production-ready pattern

✅ Conclusion

We built a complete RAG pipeline using Java, PostgreSQL, and Ollama.

This approach combines:

  • Your data
  • Smart retrieval
  • LLM reasoning

Result:
Accurate, context-aware answers using your own knowledge base.

Top comments (2)

SOUMYA KANTI GHOSH:
It would have been better if you could use Spring Data JPA for DB operations.

Sanjay Ghosh:
Good point: Spring Data JPA would make the data layer cleaner. I kept it simple here to focus on the RAG pipeline and pgvector integration, but for production I'd definitely consider JPA or a hybrid approach for better maintainability.