⚡ From NumPy to FAISS: Making ChatPDF Fast & Scalable
In Part 1, we made it work.
In Part 2, we make it usable 🚀
🔁 Recap from Part 1
In Part 1, we built a ChatPDF app using:
- PDF → Text → Chunks
- Embeddings using Ollama
- Similarity search using NumPy
- LLM to generate answers
It worked well for small PDFs and helped us understand RAG from first principles.
But once I started testing with slightly larger PDFs…
🐢 The Problem Started Showing Up
The issue was not correctness; it was performance.
Let's revisit what we were doing during search.
❌ NumPy Search (What we had before)
similarities = np.dot(vector_db, query_vector)
top_indices = np.argsort(similarities)[-TOP_K:][::-1]
🧠 What's actually happening here?
Every time you ask a question:
- Compute similarity with every chunk
- Store all similarity scores
- Sort the entire list
- Pick top K
🚨 Why this becomes a problem
- Time complexity → O(n) per query (a dot product against every chunk, plus a full sort of all the scores)
- More chunks = slower search
- Entire dataset scanned every time
To make this visible, I added timing:
import time
start_time = time.perf_counter()
# similarity logic (the np.dot + np.argsort from above)
end_time = time.perf_counter()
execution_time = end_time - start_time
print(f"Total time with numpy: {execution_time:.4f} seconds")
And as the number of chunks increased…
⏳ the delay became noticeable.
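You can reproduce the slowdown with a self-contained micro-benchmark. A minimal sketch: random vectors stand in for real embeddings, and the dimension of 768 is just an illustrative value, so adjust it to whatever your embedding model produces.

import time
import numpy as np

TOP_K = 5
DIM = 768  # illustrative; match your embedding model's output size
for n_chunks in (1_000, 10_000, 100_000):
    vector_db = np.random.rand(n_chunks, DIM).astype("float32")
    query_vector = np.random.rand(DIM).astype("float32")
    start_time = time.perf_counter()
    similarities = np.dot(vector_db, query_vector)          # touches every chunk
    top_indices = np.argsort(similarities)[-TOP_K:][::-1]   # sorts every score
    end_time = time.perf_counter()
    print(f"{n_chunks:>7} chunks: {end_time - start_time:.4f}s per query")

The timings grow roughly linearly with the chunk count, which is exactly the O(n) behaviour described above.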
💡 So What's the Solution?
Instead of:
"Search through everything every time"
We need:
"A system that knows where to look"
📊 Let's Visualize the Problem (This is the key moment)
👇 This is where the real difference becomes obvious:
💡 What this diagram shows:
- NumPy → scans every single chunk
- FAISS → jumps straight to the most relevant results
(To be precise: a flat FAISS index still scans everything under the hood, just in heavily optimized C++; approximate indexes like IVF or HNSW can skip most of the data entirely.)
This is the exact shift from:
brute force → intelligent retrieval
🚀 Introducing FAISS
FAISS (Facebook AI Similarity Search) is built for:
- Fast vector similarity search
- Efficient indexing
- Handling large datasets
The key idea:
🔁 Build an index once → search efficiently many times
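Before wiring it into ChatPDF, it helps to see that idea in isolation. A minimal sketch with random vectors (FAISS installs via pip install faiss-cpu; the sizes here are arbitrary):

import faiss
import numpy as np

DIM = 768  # arbitrary for this demo
vectors = np.random.rand(10_000, DIM).astype("float32")
# Build once: load every vector into the index
index = faiss.IndexFlatIP(DIM)
index.add(vectors)
# Search many times: each query is a single optimized call
query = np.random.rand(1, DIM).astype("float32")
distances, indices = index.search(query, 5)  # top 5 matches
print(indices[0], distances[0])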
📦 Step 1: Moving from Raw Vectors → FAISS Index
❌ Before (NumPy mindset)
We stored vectors like this:
vector_db = np.array(vectors, dtype=np.float32)
That's it.
No structure. No optimization. Just raw data.
✅ After (FAISS approach)
import faiss
import numpy as np

vector_np = np.array(vectors).astype('float32')
dimension = vector_np.shape[1]
index = faiss.IndexFlatIP(dimension)
index.add(vector_np)
🧠 Let's understand this properly
1️⃣ Converting to float32
vector_np = np.array(vectors).astype('float32')
FAISS requires vectors in float32.
Even if your embeddings are already floats, doing this ensures:
- Compatibility
- No runtime surprises
2️⃣ Getting the dimension
dimension = vector_np.shape[1]
Each embedding looks like:
[0.12, -0.45, 0.88, ...]
The number of elements = dimension
FAISS needs this to build the index correctly.
3️⃣ Creating the index
index = faiss.IndexFlatIP(dimension)
- IndexFlatIP → Inner Product search
- Since embeddings are normalized → Inner Product ≈ Cosine Similarity
So we are essentially saying:
"Store these vectors and allow fast similarity-based search."
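One caveat worth spelling out: inner product only equals cosine similarity when the vectors are unit-length. If your embedding model does not normalize its output, you can normalize before adding; FAISS ships a normalize_L2 helper for exactly this. A minimal sketch, reusing vectors from above:

import faiss
import numpy as np

vector_np = np.array(vectors).astype('float32')
# Scale each row to unit length so inner product == cosine similarity
faiss.normalize_L2(vector_np)
index = faiss.IndexFlatIP(vector_np.shape[1])
index.add(vector_np)
# Normalize query vectors the same way before calling index.search()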
4️⃣ Adding vectors to FAISS
index.add(vector_np)
This step:
- Loads all embeddings into FAISS
- Builds the internal structure
👉 From here, we stop thinking in terms of arrays and start thinking in terms of an index
🎯 Big Concept Shift
| NumPy | FAISS |
|---|---|
| Raw vectors | Indexed vectors |
| Manual search | Optimized search |
| Full scan | Smart retrieval |
🔍 Step 2: Searching with FAISS
❌ Before (NumPy)
similarities = np.dot(vector_db, query_vector)
top_indices = np.argsort(similarities)[-TOP_K:][::-1]
✅ After (FAISS)
distances, indices = index.search(query_vector.reshape(1, -1), k=TOP_K)
🧠 Let's break this down
1️⃣ Why reshape?
query_vector.reshape(1, -1)
FAISS expects:
[number_of_queries, dimension]
Even a single query must be shaped like:
[[embedding]]
2️⃣ What does search() do?
distances, indices = index.search(...)
FAISS:
- Finds nearest vectors
- Sorts internally
- Returns top K
3️⃣ Mapping results back
[text_metadata[i] for i in indices[0]]
We use indices to fetch:
- Actual text chunks
- Page numbers
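Put together, the whole search path fits in one small helper. This is a sketch, not the exact code from the repo: it assumes the Ollama Python client for the query embedding, and EMBED_MODEL is an illustrative name.

import numpy as np
import ollama

EMBED_MODEL = "nomic-embed-text"  # illustrative; use the model that built the index
TOP_K = 5

def retrieve(query, index, text_metadata):
    # Embed the query with the same model used for the chunks
    response = ollama.embeddings(model=EMBED_MODEL, prompt=query)
    query_vector = np.array(response["embedding"], dtype="float32")
    # (normalize here too if you normalized the stored vectors)
    # FAISS expects [number_of_queries, dimension], even for one query
    distances, indices = index.search(query_vector.reshape(1, -1), k=TOP_K)
    # Map index positions back to the original chunks + page numbers
    return [text_metadata[i] for i in indices[0]]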
💡 Why this is powerful
Instead of:
- Writing similarity logic ❌
- Writing sorting logic ❌
You now:
👉 Call one optimized function
💾 Step 3: Avoid Recomputing Everything
🚨 Problem in Part 1
Every run:
- Read PDF
- Chunk text
- Generate embeddings
- Build vectors
✅ Solution: Save the Index
faiss.write_index(index, "db/index.faiss")
with open("db/metadata.pkl", "wb") as f:
pickle.dump(data, f)
🧠 What are we saving?
- FAISS index → vector structure
- Metadata → chunk + page info
- PDF hash → detect changes
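Here's what the full save step can look like. The shape of data is an assumption on my part (the exact layout in the repo may differ), but bundling metadata and hash into one pickle keeps it simple:

import os
import pickle
import faiss

os.makedirs("db", exist_ok=True)
# 1. The FAISS index: the vector structure itself
faiss.write_index(index, "db/index.faiss")
# 2. Everything FAISS can't hold: chunks, page numbers, PDF hash
data = {
    "metadata": text_metadata,  # e.g. [{"text": ..., "page": ...}, ...]
    "pdf_hash": pdf_hash,       # from calculate_pdf_hash()
}
with open("db/metadata.pkl", "wb") as f:
    pickle.dump(data, f)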
🔁 Loading instead of recomputing
index = faiss.read_index("db/index.faiss")
Now:
- ⚡ Faster startup
- ✅ No repeated embedding calls
🔄 Step 4: Detecting PDF Changes
import hashlib

def calculate_pdf_hash(pdf_path):
    sha256_hash = hashlib.sha256()
    with open(pdf_path, "rb") as f:
        sha256_hash.update(f.read())  # hash the PDF's raw bytes
    return sha256_hash.hexdigest()
🧠 Why this matters
If the PDF changes:
- Old embeddings become invalid
So we:
- Generate hash
- Compare with stored hash
- Rebuild only if needed
📌 Small addition, big impact.
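Wired into startup, the whole check becomes a load-or-rebuild gate. A sketch under the same assumptions as above; build_index is a hypothetical stand-in for the Part 1 chunk-and-embed pipeline:

import os
import pickle
import faiss

def load_or_build(pdf_path):
    current_hash = calculate_pdf_hash(pdf_path)
    if os.path.exists("db/index.faiss") and os.path.exists("db/metadata.pkl"):
        with open("db/metadata.pkl", "rb") as f:
            data = pickle.load(f)
        if data["pdf_hash"] == current_hash:
            # PDF unchanged: reuse everything, zero embedding calls
            return faiss.read_index("db/index.faiss"), data["metadata"]
    # New or changed PDF: chunk, embed, index, and persist again
    return build_index(pdf_path, current_hash)  # hypothetical helper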
🔥 Step 5: Improving Retrieval with Re-ranking
Even FAISS isn't perfect.
So we add another layer:
results = ranker.rerank(rerank_request)
🧠 What's happening here?
- FAISS retrieves top 10 chunks
- Re-ranker evaluates relevance
- Returns best TOP_K
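The ranker.rerank(rerank_request) call matches the FlashRank library's API, so assuming that's the re-ranker in play (pip install flashrank; the model name below is one of its supported options, and candidates is my name for the list FAISS returned), the setup looks roughly like this:

from flashrank import Ranker, RerankRequest

ranker = Ranker(model_name="ms-marco-MiniLM-L-12-v2")
# FAISS gave us ~10 candidates; let a cross-encoder re-score them against the query
passages = [{"id": i, "text": chunk["text"]} for i, chunk in enumerate(candidates)]
rerank_request = RerankRequest(query=user_query, passages=passages)
results = ranker.rerank(rerank_request)  # sorted by relevance score
top_chunks = results[:TOP_K]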
🔍 Debug visibility
print("--- Re-ranker Scores ---")
Helps you:
- Understand ranking
- Debug results
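For instance, a small loop over the results makes the ordering visible (this assumes FlashRank-style results, where each entry keeps its passage fields plus a score):

print("--- Re-ranker Scores ---")
for result in results:
    # Higher score = the cross-encoder believes this chunk answers the query
    print(f"{result['score']:.4f}  {result['text'][:80]}...")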
💬 Step 6: Streaming Responses (UX Upgrade)
for chunk in generate_answer(user_query, context_llm):
print(chunk['response'], end='', flush=True)
🧠 Why this matters
- Feels real-time
- Improves perceived speed
- Better experience
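The loop above matches the Ollama client's streaming mode, where each yielded chunk is a dict with a response field. A sketch of what generate_answer might look like (model name and prompt template are placeholders, not the repo's exact code):

import ollama

LLM_MODEL = "llama3"  # placeholder: any local model you've pulled

def generate_answer(user_query, context_llm):
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context_llm}\n\n"
        f"Question: {user_query}"
    )
    # stream=True returns a generator of partial responses
    return ollama.generate(model=LLM_MODEL, prompt=prompt, stream=True)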
🏁 Final System (Let's Visualize It)
👇 This is what your ChatPDF system looks like now:
🧠 What this diagram represents
- Query → converted into embedding
- FAISS → retrieves relevant chunks
- Re-ranker → improves quality
- LLM → generates final answer
🎉 This is a complete RAG pipeline
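Expressed as code, the same flow reads top to bottom. A sketch reusing the pieces from earlier sections (retrieve and generate_answer from the snippets above, plus the FlashRank snippet wrapped as a hypothetical rerank() helper):

def answer(user_query, index, text_metadata):
    candidates = retrieve(user_query, index, text_metadata)  # FAISS: top candidates
    top_chunks = rerank(user_query, candidates)              # cross-encoder: best TOP_K
    context_llm = "\n\n".join(chunk["text"] for chunk in top_chunks)
    for chunk in generate_answer(user_query, context_llm):   # streamed LLM answer
        print(chunk["response"], end="", flush=True)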
📊 What We Achieved
| Feature | Part 1 (NumPy) | Part 2 (FAISS) |
|---|---|---|
| Search | Brute force | Indexed ⚡ |
| Speed | Slow | Fast |
| Persistence | ❌ | ✅ |
| Accuracy | Basic | Improved |
| UX | Basic | Streaming |
🧠 Final Thoughts
This is where things became real.
From "I understand RAG"
to
"I can build something scalable"
If you're learning RAG:
- Start with NumPy ✅
- Move to FAISS ✅
That transition is where the real understanding happens.
🔗 Project Repo
👉 https://github.com/SharathKurup/chatPDF/tree/faiss_indexing
🔜 What's Next?
In Part 3:
👉 We'll build a Streamlit UI
👉 Turn this into a proper app
💬 Let's Connect
If you're building something similar or exploring local LLMs, I'd love to hear your thoughts 🙂