Vivek Kumar

Posted on May 11

Beyond Basic Indexes: Mastering Partial, Composite, and Covering Indexes in SQL

#database #performance #postgres #sql

Most developers know that adding an index speeds up queries. But slapping a basic index on every column you filter by is a recipe for bloated storage, slower writes, and no real performance gain. The real power comes from choosing the right type of index for the job.

In this article, we'll go beyond the basics and explore three advanced indexing strategies: partial indexes, composite indexes, and covering indexes. Each solves a different class of performance problem, and knowing when to reach for each one separates good database work from great database work.

We'll use PostgreSQL throughout, but these concepts apply (with some syntax variation) to MySQL, SQL Server, and other major databases.

The Setup: An E-Commerce Orders Table

Let's ground everything in a realistic scenario. Imagine an orders table with tens of millions of rows:

CREATE TABLE orders (
  id          BIGSERIAL PRIMARY KEY,
  customer_id BIGINT        NOT NULL,
  status      VARCHAR(20)   NOT NULL,  -- 'pending', 'shipped', 'delivered', 'cancelled'
  total       NUMERIC(10,2) NOT NULL,
  created_at  TIMESTAMPTZ   NOT NULL DEFAULT now(),
  region      VARCHAR(50)   NOT NULL
);

Over time, the table grows to 50 million rows. Most orders are 'delivered' or 'cancelled' — a tiny fraction (maybe 2%) are 'pending' or 'shipped'. This distribution is the key to understanding why generic indexes often fall short.

Partial Indexes: Index Only What You Query

A partial index is built over a subset of rows, defined by a WHERE condition. It's one of the most underused features in PostgreSQL.

The Problem

Your support dashboard queries pending and shipped orders constantly:

SELECT id, customer_id, total
FROM orders
WHERE status IN ('pending', 'shipped')
ORDER BY created_at DESC;

A plain index on status would index all 50 million rows, even though 98% of them will never match this query. That's 50 million index entries serving 1 million actual lookups.

The Solution: A Partial Index

CREATE INDEX idx_orders_active
  ON orders (created_at DESC)
  WHERE status IN ('pending', 'shipped');

Now PostgreSQL only indexes the ~1 million "active" rows. The index is 98% smaller, fits more easily in memory, and stays faster even as the full table grows — because delivered/cancelled orders never bloat it.

Query Plan Comparison

Before (sequential scan on 50M rows):

Seq Scan on orders  (cost=0.00..1245000.00 rows=1100000 width=32)
  Filter: (status = ANY ('{pending,shipped}'::text[]))

After (index scan on 1M-entry partial index):

Index Scan using idx_orders_active on orders  (cost=0.43..9834.12 rows=1100000 width=32)

When to Use Partial Indexes

Partial indexes shine when:

A column has highly skewed distribution (most queries target a small minority of rows)
You filter on "active" or "open" records and the closed ones pile up over time
You want a unique constraint only for non-null values: CREATE UNIQUE INDEX ON users (email) WHERE email IS NOT NULL;

Composite Indexes: Column Order Is Everything

A composite index (also called a multi-column index) covers more than one column. Done right, it eliminates multiple sequential scans. Done wrong, it wastes space and never gets used.

The Left-to-Right Rule

PostgreSQL can use a composite index only when the query filters on a prefix of the indexed columns. Given:

CREATE INDEX idx_orders_customer_status_date
  ON orders (customer_id, status, created_at DESC);

Query predicate	Uses index?	Why
`WHERE customer_id = 42`	✅ Yes	Matches leading column
`WHERE customer_id = 42 AND status = 'pending'`	✅ Yes	Matches first two columns
`WHERE customer_id = 42 AND status = 'pending' AND created_at > '2026-01-01'`	✅ Yes (full use)	Matches all three columns
`WHERE status = 'pending'`	❌ No	Skips leading column
`WHERE created_at > '2026-01-01'`	❌ No	Skips first two columns

Column Ordering Strategy

Follow this rule of thumb for column order in a composite index:

Equality columns first — columns used with = or IN
Range columns next — columns used with >, <, BETWEEN
Sort columns last — columns used in ORDER BY

So for this query:

SELECT id, total
FROM orders
WHERE customer_id = 42
  AND status = 'pending'
  AND created_at > now() - interval '30 days'
ORDER BY created_at DESC;

The ideal index is exactly what we created: (customer_id, status, created_at DESC). PostgreSQL can narrow by customer_id (equality), then narrow further by status (equality), then scan only the recent range of created_at — all within the index structure.

Avoid Index Proliferation

One well-designed composite index often replaces two or three single-column indexes. If you have idx_customer_id and idx_status separately, queries needing both columns force a "bitmap AND" merge — two index lookups joined in memory. A single composite index eliminates that overhead.

Covering Indexes: Make the Index the Answer

A covering index is one that contains every column a query needs — so PostgreSQL never has to touch the actual table rows. This is called an index-only scan, and it's the fastest possible read path.

The Problem: Heap Fetches

Normally, after an index lookup, PostgreSQL must fetch the actual row from the table heap to retrieve non-indexed columns. On large tables with many concurrent writes, those heap fetches cause random I/O that kills performance.

The INCLUDE Clause

PostgreSQL 11+ introduced the INCLUDE clause to add "payload" columns to an index without making them part of the sort key:

CREATE INDEX idx_orders_customer_covering
  ON orders (customer_id, status)
  INCLUDE (total, created_at);

Now this query runs entirely off the index — no heap access at all:

SELECT total, created_at
FROM orders
WHERE customer_id = 42
  AND status = 'shipped';

INCLUDE vs Adding to the Key

Why not just add total and created_at to the index key itself? Because:

Non-key columns aren't sorted, so they can't help filter or sort — they just live as payload in the leaf pages
Smaller B-tree height — fewer levels to traverse since non-key columns only appear in leaf nodes, not interior nodes
Avoids over-indexing — the key remains selective; INCLUDE columns just come along for the ride

Seeing It in Action

EXPLAIN (ANALYZE, BUFFERS)
SELECT total, created_at
FROM orders
WHERE customer_id = 42
  AND status = 'shipped';

Without INCLUDE:

Index Scan using idx_orders_customer_status on orders
  (cost=0.56..892.34 rows=847 width=24)
  (actual time=0.031..4.821 rows=847 loops=1)
  Buffers: shared hit=423 read=634   ← 634 heap blocks read

With INCLUDE (total, created_at):

Index Only Scan using idx_orders_customer_covering on orders
  (cost=0.56..112.18 rows=847 width=24)
  (actual time=0.025..0.893 rows=847 loops=1)
  Buffers: shared hit=52 read=0      ← 0 heap blocks read

That's a 5× reduction in execution time and zero heap I/O.

Combining All Three: The Power Move

You can combine all three strategies. Here's a partial covering composite index:

CREATE INDEX idx_orders_active_by_region
  ON orders (region, created_at DESC)
  INCLUDE (customer_id, total)
  WHERE status IN ('pending', 'shipped');

This index:

Only indexes active orders (partial — keeps it small)
Supports filtering by region and sorting by date (composite — ordered key)
Returns customer_id and total without a heap fetch (covering — zero table I/O)

A query like this will hit this index perfectly:

SELECT customer_id, total, created_at
FROM orders
WHERE region = 'EU-West'
  AND status IN ('pending', 'shipped')
ORDER BY created_at DESC
LIMIT 50;

Common Gotchas

Over-indexing writes more than you read. Every index is updated on every INSERT, UPDATE, and DELETE. An extra 5 indexes on a high-write table can make inserts 2–3× slower. Audit with pg_stat_user_indexes to find unused indexes:

SELECT indexrelname, idx_scan, idx_tup_read
FROM pg_stat_user_indexes
WHERE schemaname = 'public'
ORDER BY idx_scan ASC;

Partial index predicates must match exactly. PostgreSQL uses a partial index only when the query's WHERE clause implies (or matches) the index predicate. WHERE status = 'pending' won't use an index defined as WHERE status IN ('pending', 'shipped') — you need WHERE status IN ('pending', 'shipped') or the planner must be able to prove the subset relationship.

INCLUDE columns don't help WHERE or ORDER BY. They are non-key payload only — PostgreSQL can't use them to narrow or sort. If you filter or sort by a column, it must be in the key part of the index, not in INCLUDE.

Build indexes concurrently on live tables. Creating an index on a large table locks out writes. Use CREATE INDEX CONCURRENTLY to build the index without blocking:

CREATE INDEX CONCURRENTLY idx_orders_active_by_region
  ON orders (region, created_at DESC)
  INCLUDE (customer_id, total)
  WHERE status IN ('pending', 'shipped');

Key Takeaways

Partial indexes target a narrow, frequently queried subset of rows. They stay small and fast even as the full table balloons in size.
Composite indexes cover multi-column queries efficiently — but column order matters. Equality predicates first, range predicates second, sort columns last.
Covering indexes (with INCLUDE) allow index-only scans, eliminating costly random I/O to the table heap.
Combining all three gives you a precision instrument: small, ordered, and self-contained.

The best index strategy always starts with understanding your query patterns. Profile with EXPLAIN (ANALYZE, BUFFERS), identify where heap reads or sequential scans dominate, then choose the index type that surgically removes that bottleneck.

Have you tuned a slow query with one of these techniques? Drop a comment below — especially if you've hit a surprising edge case. And if you're new to indexing fundamentals, check out the earlier post in this series on basic SQL indexes before diving deeper.

DEV Community