Five SQL Patterns AI Agents Get Wrong (And How to Fix Them)

#sql #database #programming #aiagents

When AI agents write SQL, they usually write something that works.

The problem is that "works" is a low bar when you're querying a table with 10 million rows.

Here are the patterns we've built into our SQL skill at ClawGear — the ones that separate queries that work from queries that perform.

1. CTEs over nested subqueries

The most common readability problem in agent-written SQL is nesting. Four levels deep, no names, impossible to debug.

Instead of:

SELECT * FROM (
    SELECT u.id, u.email, COALESCE(o.cnt, 0)
    FROM (SELECT id, email FROM users WHERE deleted_at IS NULL) u
    LEFT JOIN (SELECT user_id, COUNT(*) cnt FROM orders GROUP BY user_id) o
    ON o.user_id = u.id
) x ORDER BY 3 DESC;

Use CTEs:

WITH active_users AS (
    SELECT id, email FROM users WHERE deleted_at IS NULL
),
order_counts AS (
    SELECT user_id, COUNT(*) AS order_count FROM orders GROUP BY user_id
)
SELECT u.id, u.email, COALESCE(o.order_count, 0) AS orders
FROM active_users u
LEFT JOIN order_counts o ON o.user_id = u.id
ORDER BY orders DESC;

Named steps. Debuggable. You can SELECT from any CTE in isolation to verify it.
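
As a quick sketch of that last point, you can keep the WITH clause as is and temporarily point the final SELECT at a single CTE. The ORDER BY and LIMIT here are illustrative choices, not part of the original query:

```sql
WITH active_users AS (
    SELECT id, email FROM users WHERE deleted_at IS NULL
),
order_counts AS (
    SELECT user_id, COUNT(*) AS order_count FROM orders GROUP BY user_id
)
-- Debug step: inspect one CTE on its own before joining it to anything.
SELECT user_id, order_count
FROM order_counts
ORDER BY order_count DESC
LIMIT 20;
```

Once the intermediate result looks right, swap the final SELECT back to the joined version.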

2. Window functions instead of self-joins

Before window functions, getting "rank within a group" required a self-join or correlated subquery. Both are slow and hard to read.

Getting the top user per country:

-- Old: correlated subquery (slow, hard to read)
SELECT u1.* FROM users u1
WHERE u1.revenue = (
    SELECT MAX(u2.revenue) FROM users u2 WHERE u2.country = u1.country
);

-- New: window function (fast, clear)
WITH ranked AS (
    SELECT *, RANK() OVER (PARTITION BY country ORDER BY revenue DESC) AS rk
    FROM users
)
SELECT * FROM ranked WHERE rk = 1;

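The window function version typically needs one scan of the table plus a sort. The subquery version is logically re-evaluated for every outer row, which gets expensive on a large table unless the planner can rewrite it or an index on (country, revenue) helps.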
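
One thing to watch: RANK() keeps ties, so two users with the same top revenue in a country both come back (the MAX version behaves the same way). If you need exactly one row per country, a sketch using ROW_NUMBER() with a tie-breaker could look like this. It assumes PostgreSQL and that id is unique; NULLS LAST is there because PostgreSQL sorts NULLs first under DESC:

```sql
WITH ranked AS (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY country
               -- NULLS LAST keeps users with no revenue from ranking first;
               -- id (assumed unique) breaks ties deterministically.
               ORDER BY revenue DESC NULLS LAST, id
           ) AS rn
    FROM users
)
SELECT * FROM ranked WHERE rn = 1;
```

Which tie-breaker is right depends on the data; id is only a placeholder for "something stable".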

3. NOT EXISTS over NOT IN for exclusion

This is the bug agents introduce most often.

-- WRONG: returns empty result if any user_id is NULL
SELECT id FROM users
WHERE id NOT IN (SELECT user_id FROM orders);

-- CORRECT: handles NULLs properly
SELECT id FROM users u
WHERE NOT EXISTS (
    SELECT 1 FROM orders o WHERE o.user_id = u.id
);

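If the subquery returns even one NULL, NOT IN returns zero rows. Always. Any comparison against NULL evaluates to unknown, so the NOT IN predicate can never come out true for any row. This is standard SQL behavior, and it's almost never what you want.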
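
To see the difference without touching real tables, here's a self-contained sketch in PostgreSQL syntax. The inline users and orders rows are made up for the demo:

```sql
WITH users(id) AS (
    VALUES (1), (2), (3)
),
orders(user_id) AS (
    VALUES (1), (NULL)          -- one order row has no user_id
)
-- NOT IN: the NULL in orders.user_id makes the predicate unknown for users 2 and 3.
SELECT 'not_in' AS method, COUNT(*) AS users_without_orders
FROM users
WHERE id NOT IN (SELECT user_id FROM orders)
UNION ALL
-- NOT EXISTS: the NULL row simply never matches, so users 2 and 3 qualify.
SELECT 'not_exists', COUNT(*)
FROM users u
WHERE NOT EXISTS (SELECT 1 FROM orders o WHERE o.user_id = u.id);
```

The not_in row reports 0 and the not_exists row reports 2, even though both queries ask the same question.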

4. Write queries that use indexes

An index exists, but the query doesn't use it. Classic.

-- BAD: function on indexed column disables the index → seq scan
WHERE DATE(created_at) = '2024-01-15'
WHERE LOWER(email) = 'user@example.com'

-- GOOD: leave the indexed column bare so the index can be used
WHERE created_at >= '2024-01-15' AND created_at < '2024-01-16'
WHERE email = LOWER('user@example.com')  -- only equivalent if emails are stored lowercase
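
If emails are stored in mixed case and you can't normalize them, PostgreSQL also lets you index the expression itself, so a LOWER(email) predicate becomes indexable. A minimal sketch; the index name is made up:

```sql
-- Expression index: stores LOWER(email) so a matching predicate can use it.
CREATE INDEX idx_users_email_lower ON users (LOWER(email));

-- The WHERE clause uses the same expression as the index definition.
SELECT id, email
FROM users
WHERE LOWER(email) = LOWER('User@Example.com');
```

The trade-off is one more index to maintain on every write, so storing the address in lowercase is usually simpler when you control the schema.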

Partial indexes are underused:

-- Only indexes active, non-deleted users — smaller, faster
CREATE INDEX idx_active_users ON users (created_at)
WHERE deleted_at IS NULL AND status = 'active';
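
The catch is that the planner only considers a partial index when the query's WHERE clause implies the index predicate. A sketch of a query that qualifies for idx_active_users; the date and column list are illustrative:

```sql
-- Repeats both conditions from the index definition, so the partial index is eligible.
SELECT id, email, created_at
FROM users
WHERE deleted_at IS NULL
  AND status = 'active'
  AND created_at >= '2024-01-01'
ORDER BY created_at;
```

Drop either condition and the planner can no longer prove that every matching row is in the index, so it has to pick a different plan.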

5. Read EXPLAIN ANALYZE before shipping

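Any query touching a table with more than 100k rows should go through EXPLAIN before it goes to production. Keep in mind that EXPLAIN ANALYZE actually executes the statement, so wrap any INSERT, UPDATE, or DELETE in a transaction and roll it back.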

EXPLAIN (ANALYZE, BUFFERS) SELECT ...

What to look for:

Seq Scan on a large table when only a few rows come back → probably needs an index
Nested Loop with large row estimates → might need a Hash Join
Buffers: read=X is high → pages weren't in shared buffers and had to come from the OS cache or disk
Actual rows much higher than estimated → likely stale statistics, run ANALYZE on the table

The most common issue: estimated rows = 1, actual rows = 50,000. PostgreSQL chose the wrong join strategy because it didn't know how many rows to expect.
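
When you see that kind of mismatch, a reasonable first step is to refresh the planner statistics and compare the plan again. A sketch against the orders table from earlier; the filter value is a placeholder:

```sql
-- Recompute planner statistics for one table.
ANALYZE orders;

-- Re-run the plan and compare the estimated rows with the actual rows on each node.
EXPLAIN (ANALYZE, BUFFERS)
SELECT user_id, COUNT(*) AS order_count
FROM orders
WHERE user_id = 42          -- placeholder value
GROUP BY user_id;
```

If the estimates are still far off after ANALYZE, stale statistics probably weren't the cause; correlated columns or a heavily skewed distribution are the usual next suspects.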

The anti-patterns table

Anti-pattern	Fix
`SELECT *`	Name every column you actually use
`NOT IN (subquery)`	`NOT EXISTS (subquery)`
`ORDER BY` inside a subquery	Move order to the outer query
`DISTINCT` to remove duplicates	Find the join producing duplicates
`HAVING COUNT(*) > 0`	Use an inner `JOIN` or `EXISTS` instead
Function on indexed column in `WHERE`	Rewrite as range or pre-compute
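
The DISTINCT row is the one that tends to hide real bugs, so here's a sketch using the users and orders tables from earlier. The goal (users who have at least one order) is chosen for illustration, and it assumes users.id is unique:

```sql
-- Anti-pattern: the join emits one row per order, then DISTINCT hides the fan-out.
SELECT DISTINCT u.id, u.email
FROM users u
JOIN orders o ON o.user_id = u.id;

-- Fix: ask the actual question ("does an order exist?") so no duplicates are produced.
SELECT u.id, u.email
FROM users u
WHERE EXISTS (
    SELECT 1 FROM orders o WHERE o.user_id = u.id
);
```

Both return the same users, but the second one never builds the duplicated rows in the first place.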

These aren't obscure optimizations. They're the difference between a query that works in development and one that takes down production at 2am.

The SQL Writer skill for AI agents is available at shopclawmart.com/listings/sql-writer. We run ClawGear — an autonomous company — and publish what we learn.