How I Built a RAG-Powered Conversational Assistant for Odoo ERP

Every enterprise runs on data — sales orders, invoices, inventory counts, customer records — but getting answers from that data inside an ERP system usually means clicking through 10 screens, running reports, or asking someone who knows which menu hides which number.
I wanted to change that. As a Full-Stack Developer at Futurenet Technologies in Chennai, I built a conversational AI assistant that sits inside Odoo ERP and lets users ask questions in plain English (or give voice commands) and get instant answers from live business data.
This is the story of how I built it, the architecture behind it, and the lessons I learned along the way.

The Problem
Our enterprise clients were running Odoo ERP with 15+ modules — Sales, Inventory, MRP, Accounting, CRM, HR, Payroll, POS, E-commerce, Helpdesk, and more. The data was all there, but:

Sales reps had to navigate 4-5 screens just to check a customer's credit status
Warehouse managers needed to run reports to see stock availability
Executives wanted quick KPIs without waiting for BI dashboards to load
Field service agents needed hands-free access while on-site

The question was simple: Can users just ask the ERP what they need and get an answer?

The Architecture
Here's the high-level architecture I designed:
```
User Query (text/voice/image)
         │
         ▼
┌─────────────────────┐
│  Multimodal Input   │ ← Speech-to-text, OCR, text input
│  Processing Layer   │
└────────┬────────────┘
         │
         ▼
┌─────────────────────┐
│    Query Router     │ ← Classifies intent and complexity
└────────┬────────────┘
         │
    ┌────┴────┐
    │         │
    ▼         ▼
┌────────┐  ┌──────────────┐
│ Direct │  │ RAG Pipeline │
│  ORM   │  │ (LangChain + │
│ Query  │  │  pgvector)   │
└───┬────┘  └──────┬───────┘
    │              │
    ▼              ▼
┌─────────────────────┐
│ Response Generator  │ ← Fine-tuned LLM
│   (QLoRA Model)     │
└────────┬────────────┘
         │
         ▼
┌─────────────────────┐
│  Odoo ERP Frontend  │ ← OWL widget in Odoo UI
└─────────────────────┘
```
The system has four main components:

1. Multimodal Input Processing
Users can interact through text, voice, or image:

Text: Direct chat input in the Odoo interface
Voice: Speech-to-text using Whisper, enabling hands-free operation for warehouse and field workers
Image: OCR processing for documents — snap a photo of a purchase order and the system extracts data

```python
import whisper
from paddleocr import PaddleOCR

class MultimodalProcessor:
    def __init__(self):
        self.whisper_model = whisper.load_model("base")
        self.ocr_engine = PaddleOCR(use_angle_cls=True, lang='en')

    def process_input(self, input_data, input_type="text"):
        if input_type == "voice":
            # Transcribe the audio to text with Whisper
            result = self.whisper_model.transcribe(input_data)
            return result["text"]
        elif input_type == "image":
            # Run OCR and join the recognized text lines
            result = self.ocr_engine.ocr(input_data, cls=True)
            extracted = " ".join([line[1][0] for line in result[0]])
            return extracted
        return input_data
```
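
To show the flow end to end, here's how the processor might be called (the file paths are illustrative):

```python
processor = MultimodalProcessor()

text_query = processor.process_input("What is the stock of Product X?")
voice_query = processor.process_input("warehouse_note.wav", input_type="voice")
ocr_text = processor.process_input("purchase_order.jpg", input_type="image")
```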
2. RAG Pipeline with LangChain and pgvector
This is the core of the system. Instead of feeding the entire database to the LLM (impossible and expensive), I built a Retrieval-Augmented Generation pipeline.

Step 1: Document Indexing
I created embeddings for Odoo's business data — product descriptions, customer notes, helpdesk tickets, sales policies, HR policies, and module documentation:

```python
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import PGVector

# Using a lightweight embedding model for speed
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)

# Store embeddings in PostgreSQL using pgvector
vector_store = PGVector(
    connection_string=DATABASE_URL,
    embedding_function=embeddings,
    collection_name="odoo_knowledge_base"
)
```
Why pgvector instead of ChromaDB or Pinecone?
Since Odoo already runs on PostgreSQL, using pgvector meant (see the query sketch after this list):

No additional infrastructure to manage
Embeddings live alongside the business data
Transactions are ACID-compliant
One backup strategy covers everything
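
To make the last point concrete: under the hood, a similarity lookup is plain SQL. A minimal sketch, assuming a simplified table layout (LangChain's PGVector actually manages its own schema):

```python
import psycopg2

def nearest_documents(conn, query_embedding, k=5):
    """Fetch the k nearest documents by cosine distance (pgvector's <=> operator)."""
    with conn.cursor() as cur:
        cur.execute(
            """
            SELECT document
            FROM odoo_knowledge_base           -- illustrative table name
            ORDER BY embedding <=> %s::vector  -- cosine distance
            LIMIT %s
            """,
            (str(query_embedding), k),
        )
        return [row[0] for row in cur.fetchall()]
```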

Step 2: Smart Retrieval
Not every query needs RAG. "What's the stock of Product X?" can be answered directly from the ORM. But "What's our return policy for international orders?" needs the RAG pipeline.
I built a query router that classifies intent:
```python
class QueryRouter:
    """Routes queries to the appropriate handler."""

    DIRECT_ORM_PATTERNS = [
        "stock", "quantity", "price", "total",
        "count", "balance", "status"
    ]

    def route(self, query: str) -> str:
        query_lower = query.lower()

        # Check if this can be answered via a direct ORM query
        if any(pattern in query_lower for pattern in self.DIRECT_ORM_PATTERNS):
            return "orm_direct"

        # Otherwise this needs document retrieval
        return "rag_pipeline"
```

Step 3: Context-Aware Retrieval
The retrieval step doesn't just do a simple similarity search. It considers the user's context — their role, department, and the Odoo module they're currently in:
```python
def retrieve_context(self, query, user_context):
    """Retrieve relevant documents with user context awareness."""

    # Build metadata filter based on the user's access rights
    metadata_filter = {
        "department": user_context.get("department"),
        "module": user_context.get("active_module"),
    }

    # Retrieve top-k relevant documents
    docs = self.vector_store.similarity_search(
        query,
        k=5,
        filter=metadata_filter
    )

    # Also fetch live data from the Odoo ORM if needed
    orm_context = self._fetch_orm_data(query, user_context)

    return docs, orm_context
```
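
The retrieved documents and the live ORM data are then assembled into the prompt for the response generator. A rough sketch (the production template was more elaborate, and `build_prompt` is an illustrative name):

```python
def build_prompt(query, docs, orm_context):
    """Combine retrieved documents and live ORM data into an LLM prompt."""
    doc_context = "\n\n".join(doc.page_content for doc in docs)
    return (
        "You are an ERP assistant. Answer using ONLY the context below.\n\n"
        f"Documents:\n{doc_context}\n\n"
        f"Live ERP data:\n{orm_context}\n\n"
        f"Question: {query}\nAnswer:"
    )
```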
3. Fine-Tuned LLM with QLoRA
The base model didn't understand Odoo-specific terminology or our clients' business logic. So I fine-tuned it using QLoRA (Quantized Low-Rank Adaptation).

Why QLoRA?

Full fine-tuning of a 7B parameter model needs 28+ GB VRAM
QLoRA reduces this to under 8 GB by quantizing to 4-bit and training only low-rank adapters
We could run this on a single GPU server

```python
import torch
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit quantization config
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True
)

# Load the base model with quantization
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.2",
    quantization_config=bnb_config,
    device_map="auto"
)

# LoRA configuration
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)

model = prepare_model_for_kbit_training(model)
model = get_peft_model(model, lora_config)
```
Training Data
I curated a dataset of 5,000+ examples (format sketched after this list) from:

Real Odoo support tickets and their resolutions
ERP operation workflows (how to create a PO, how to check stock, etc.)
Business-specific Q&A pairs from our clients
Odoo documentation rewritten as conversational Q&A
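
For illustration, a single record looked roughly like this (a generic instruction-tuning format; the exact schema differed):

```python
# Hypothetical sample record; field names follow a standard instruction format
example = {
    "instruction": "How do I check available stock for a product?",
    "input": "User role: warehouse manager, active module: Inventory",
    "output": "Ask the assistant 'What's the stock of <product>?' or open "
              "Inventory > Products and check the On Hand quantity.",
}
```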

4. Integration with Odoo
The assistant lives inside Odoo as an OWL (Odoo Web Library) component — a chat widget accessible from any screen:

```javascript
/** @odoo-module */
import { Component, useState } from "@odoo/owl";

export class ERPAssistant extends Component {
    static template = "erp_assistant.ChatWidget";

    setup() {
        this.state = useState({
            messages: [],
            isListening: false,
            isProcessing: false,
        });
    }

    async sendMessage(query) {
        this.state.isProcessing = true;

        const response = await this.env.services.rpc("/api/assistant/query", {
            query: query,
            context: {
                active_model: this.env.config.activeModel,
                active_id: this.env.config.activeId,
                user_id: this.env.services.user.userId,
            }
        });

        this.state.messages.push({
            role: "assistant",
            content: response.answer,
            sources: response.sources,
        });

        this.state.isProcessing = false;
    }
}
```
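
On the server side, the widget posts to /api/assistant/query. Here's a hypothetical sketch of that endpoint's shape with FastAPI (per the tech stack); `answer_from_orm` and `answer_from_rag` are placeholder helpers standing in for the pipelines described above:

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class AssistantRequest(BaseModel):
    query: str
    context: dict

@app.post("/api/assistant/query")
async def assistant_query(payload: AssistantRequest):
    # Route the query using the QueryRouter shown earlier
    route = QueryRouter().route(payload.query)
    if route == "orm_direct":
        # Placeholder helper: direct ORM lookup
        answer, sources = answer_from_orm(payload.query, payload.context)
    else:
        # Placeholder helper: full RAG pipeline
        answer, sources = answer_from_rag(payload.query, payload.context)
    return {"answer": answer, "sources": sources}
```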

Challenges and Lessons Learned

1. Keeping embeddings in sync with live data
Odoo data changes constantly — new products, updated prices, modified policies. I built a scheduled action that re-indexes changed records every hour:

```python
from odoo import api, models

class EmbeddingSync(models.Model):
    _name = 'ai.embedding.sync'

    # Business models whose records feed the vector store (list illustrative)
    SYNCED_MODELS = ['product.template', 'res.partner', 'helpdesk.ticket']

    @api.model
    def _cron_sync_embeddings(self):
        """Sync modified records to the vector store."""
        last_sync = self._get_last_sync_time()

        # Find all records modified since the last sync, per synced model
        for model_name in self.SYNCED_MODELS:
            modified_records = self.env[model_name].search([
                ('write_date', '>', last_sync)
            ])
            for record in modified_records:
                self._update_embedding(record)
```
2. Hallucination control
The LLM sometimes generated plausible-sounding but incorrect numbers. My solution:

Always verify numerical answers against the ORM before returning them (see the sketch after this list)
Include source citations in every response so users can verify
Add a confidence score — if the model isn't confident, it says "I'm not sure, let me show you the relevant screen instead" and navigates the user to the right Odoo view
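
Here's the sketch of the first point, heavily simplified (`fetch_ground_truth` is a stand-in for the real ORM lookup):

```python
import re

def verify_numeric_answer(env, query: str, answer: str):
    """Cross-check numbers in the LLM answer against live ORM data."""
    orm_value = fetch_ground_truth(env, query)  # stand-in for the ORM lookup
    if orm_value is None:
        # No ground truth available: return the answer flagged as unverified
        return answer, "unverified"

    llm_numbers = [float(n) for n in re.findall(r"\d+(?:\.\d+)?", answer)]
    if any(abs(n - float(orm_value)) < 1e-6 for n in llm_numbers):
        return answer, "verified"

    # Mismatch: trust the database, not the model
    return f"The verified value is {orm_value}.", "corrected"
```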

3. Response latency
Enterprise users expect instant answers. My optimizations (the caching one is sketched after this list):

Cached embeddings for frequently accessed data (top products, common policies)
Streaming responses so users see the answer forming in real-time
Query classification to route simple queries directly to the ORM (under 200ms) instead of the full RAG pipeline (2-3 seconds)
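
The embedding cache, for example, can be as simple as memoizing hot queries. A minimal sketch, assuming the `embeddings` object from the indexing step:

```python
from functools import lru_cache

@lru_cache(maxsize=1024)
def cached_embedding(text: str) -> tuple:
    """Memoize embeddings for frequent queries; a tuple keeps the result hashable."""
    return tuple(embeddings.embed_query(text))
```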

Results
After deploying across multiple enterprise clients:

Data processing time significantly reduced — users get answers in seconds instead of navigating multiple screens
Voice command adoption grew rapidly — warehouse workers loved the hands-free operation
Strong user adoption within weeks of launch — users preferred chatting over navigating menus
Helpdesk ticket volume dropped — employees could self-serve common questions

Tech Stack Summary
| Component | Technology |
| --- | --- |
| LLM | Mistral 7B + QLoRA fine-tuning |
| RAG Framework | LangChain |
| Vector Store | pgvector (PostgreSQL) |
| Embeddings | sentence-transformers/all-MiniLM-L6-v2 |
| Speech-to-Text | Whisper |
| OCR | PaddleOCR |
| ERP | Odoo (v17) |
| Backend | Python, FastAPI |
| Frontend | OWL (Odoo Web Library) |
| Infrastructure | GPU server, Docker, Nginx |

What I Would Do Differently

Start with a smaller model — I initially tried a 13B model but 7B with good fine-tuning performed just as well for this domain-specific use case, at half the inference cost.
Invest more in evaluation — I should have built an automated eval pipeline earlier. Manual testing doesn't scale when you have thousands of possible queries.
Hybrid search from day one — Combining vector similarity search with traditional keyword search (BM25) would have improved retrieval accuracy for queries containing specific product codes or order numbers (rough sketch below).
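
A rough sketch of that idea, using the rank_bm25 package as an assumed dependency (`alpha` weights keyword vs. vector signals):

```python
from rank_bm25 import BM25Okapi  # assumed dependency: pip install rank-bm25

def hybrid_scores(query, docs, vector_scores, alpha=0.5):
    """Blend normalized BM25 scores with vector similarities (both aligned to docs)."""
    bm25 = BM25Okapi([d.lower().split() for d in docs])
    keyword_scores = bm25.get_scores(query.lower().split())
    max_kw = max(keyword_scores) or 1.0  # avoid division by zero
    return [
        alpha * (kw / max_kw) + (1 - alpha) * vec
        for kw, vec in zip(keyword_scores, vector_scores)
    ]
```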

Wrapping Up
Building an AI assistant for an ERP system is fundamentally different from building a general-purpose chatbot. The data is structured, the answers need to be precise, and users have zero tolerance for wrong numbers.
The combination of RAG (for knowledge retrieval), QLoRA fine-tuning (for domain understanding), and deep Odoo integration (for real-time data access) made this possible without requiring massive infrastructure.
If you're building something similar or have questions about any part of this architecture, feel free to reach out — I'd love to chat about it.

I'm Harideevagan M, a Full-Stack Developer and AI Engineer at Futurenet Technologies in Chennai, India. I specialize in LLM engineering, Odoo ERP, and enterprise automation. Check out my portfolio at harideevagan.netlify.app or connect with me on LinkedIn and GitHub.
