DEV Community

ClawGear

Five SQL Patterns AI Agents Get Wrong (And How to Fix Them)

When AI agents write SQL, they usually write something that works.

The problem is that "works" is a low bar when you're querying a table with 10 million rows.

Here are the patterns we've built into our SQL skill at ClawGear — the ones that separate queries that work from queries that perform.


1. CTEs over nested subqueries

The most common readability problem in agent-written SQL is nesting. Four levels deep, no names, impossible to debug.

Instead of:

SELECT * FROM (
    SELECT u.id, u.email, COALESCE(o.cnt, 0)
    FROM (SELECT id, email FROM users WHERE deleted_at IS NULL) u
    LEFT JOIN (SELECT user_id, COUNT(*) cnt FROM orders GROUP BY user_id) o
    ON o.user_id = u.id
) x ORDER BY 3 DESC;

Use CTEs:

WITH active_users AS (
    SELECT id, email FROM users WHERE deleted_at IS NULL
),
order_counts AS (
    SELECT user_id, COUNT(*) AS order_count FROM orders GROUP BY user_id
)
SELECT u.id, u.email, COALESCE(o.order_count, 0) AS orders
FROM active_users u
LEFT JOIN order_counts o ON o.user_id = u.id
ORDER BY orders DESC;

Named steps. Debuggable. You can SELECT from any CTE in isolation to verify it.
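
As a sketch of what "in isolation" looks like in practice: keep the WITH clause untouched and temporarily swap the final SELECT for one that reads a single CTE. The LIMIT of 10 is an arbitrary choice for eyeballing the output.

```sql
-- Same CTEs as above; only the final SELECT changes while debugging.
WITH active_users AS (
    SELECT id, email FROM users WHERE deleted_at IS NULL
),
order_counts AS (
    SELECT user_id, COUNT(*) AS order_count FROM orders GROUP BY user_id
)
-- Inspect one step on its own before trusting the join built on top of it.
SELECT * FROM order_counts ORDER BY order_count DESC LIMIT 10;
```

Once the step looks right, restore the original final SELECT.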


2. Window functions instead of self-joins

Before window functions, getting "rank within a group" required a self-join or correlated subquery. Both are slow and hard to read.

Getting the top user per country:

-- Old: correlated subquery (often slow, hard to read)
SELECT u1.* FROM users u1
WHERE u1.revenue = (
    SELECT MAX(u2.revenue) FROM users u2 WHERE u2.country = u1.country
);

-- New: window function (fast, clear)
WITH ranked AS (
    SELECT *, RANK() OVER (PARTITION BY country ORDER BY revenue DESC) AS rk
    FROM users
)
SELECT * FROM ranked WHERE rk = 1;

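The window function version ranks every row in a single pass over the data (plus a sort within each partition). The correlated subquery may be re-evaluated for every outer row, which gets expensive fast without a supporting index.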
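
One caveat: RANK() gives tied rows the same rank, so a country with two users on identical revenue returns both (the MAX version does the same). If exactly one row per country is required, a sketch with ROW_NUMBER() and an explicit tie-breaker follows. Using id as the tie-breaker is an assumption; any unique column works.

```sql
WITH ranked AS (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY country
               ORDER BY revenue DESC, id   -- id is an assumed tie-breaker
           ) AS rn
    FROM users
)
SELECT * FROM ranked WHERE rn = 1;   -- exactly one row per country
```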


3. NOT EXISTS over NOT IN for exclusion

This is the bug agents introduce most often.

-- WRONG: returns no rows at all if any orders.user_id is NULL
SELECT id FROM users
WHERE id NOT IN (SELECT user_id FROM orders);

-- CORRECT: handles NULLs properly
SELECT id FROM users u
WHERE NOT EXISTS (
    SELECT 1 FROM orders o WHERE o.user_id = u.id
);

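NOT IN against a subquery that returns even one NULL yields zero rows. Always. The reason: id NOT IN (1, NULL) expands to id <> 1 AND id <> NULL, and a comparison with NULL is never true. This is standard SQL behavior, and it's almost never what you want.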
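
The behavior can be reproduced without any real tables. This is a minimal sketch in PostgreSQL syntax, where the inline VALUES list stands in for the orders.user_id column:

```sql
-- NOT IN: 1 <> 2 is TRUE, 1 <> NULL is UNKNOWN, and TRUE AND UNKNOWN is UNKNOWN,
-- so the WHERE clause rejects the row. Returns zero rows.
SELECT 1 AS survived
WHERE 1 NOT IN (SELECT v FROM (VALUES (2), (NULL::int)) AS t(v));

-- NOT EXISTS: the inner query finds no row where v = 1 (the NULL row simply
-- fails the comparison), so the outer row is kept. Returns one row.
SELECT 1 AS survived
WHERE NOT EXISTS (
    SELECT 1 FROM (VALUES (2), (NULL::int)) AS t(v) WHERE t.v = 1
);
```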


4. Write queries that use indexes

An index exists, but the query doesn't use it. Classic.

-- BAD: wrapping the indexed column in a function keeps a plain index on it from being used → seq scan
WHERE DATE(created_at) = '2024-01-15'
WHERE LOWER(email) = 'user@example.com'

-- GOOD: compare the bare column so the index applies
WHERE created_at >= '2024-01-15' AND created_at < '2024-01-16'
WHERE email = LOWER('user@example.com')  -- only equivalent if emails are stored lowercase
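
When lowercasing the stored data is not an option, PostgreSQL can index the expression itself. A sketch, assuming PostgreSQL; the index name is hypothetical:

```sql
-- Expression index: stores LOWER(email) for every row.
CREATE INDEX idx_users_email_lower ON users (LOWER(email));

-- The query must use the same expression for the planner to match the index.
SELECT id, email
FROM users
WHERE LOWER(email) = 'user@example.com';
```

The trade-off is one more index to maintain on every write, so this fits best when case-insensitive lookups are frequent.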

Partial indexes are underused:

-- Only indexes active, non-deleted users — smaller, faster
CREATE INDEX idx_active_users ON users (created_at)
WHERE deleted_at IS NULL AND status = 'active';
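
A partial index only helps queries whose WHERE clause implies the index predicate. As a sketch against the index above (the selected columns and the date are illustrative):

```sql
-- Can use idx_active_users: the WHERE clause repeats the index predicate.
SELECT id, email
FROM users
WHERE deleted_at IS NULL
  AND status = 'active'
  AND created_at >= '2024-01-01';

-- Cannot use it: deleted or inactive users may match, and they are not in the index.
SELECT id, email
FROM users
WHERE created_at >= '2024-01-01';
```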

5. Read EXPLAIN ANALYZE before shipping

Any query touching a table with more than 100k rows should be EXPLAIN'd before it goes to production.

EXPLAIN (ANALYZE, BUFFERS) SELECT ...
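
Keep in mind that the ANALYZE option actually executes the statement. For a SELECT that only costs time, but an INSERT, UPDATE, or DELETE really modifies data. A common way to measure a write safely is to wrap it in a transaction and roll back. In this sketch the status and created_at columns on orders are hypothetical:

```sql
BEGIN;

EXPLAIN (ANALYZE, BUFFERS)
UPDATE orders
SET status = 'archived'             -- hypothetical column
WHERE created_at < '2023-01-01';    -- hypothetical column

ROLLBACK;  -- discard the changes the UPDATE really made
```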

What to look for:

  • Seq Scan on a large table with a selective filter → probably needs an index
  • Nested Loop with large row estimates → might need a Hash Join
  • Buffers: read=X is high → pages weren't in shared buffers and came from the OS cache or disk
  • Actual rows much higher than estimated → stale statistics, run ANALYZE table_name

The most common issue: estimated rows = 1, actual rows = 50,000. PostgreSQL chose the wrong join strategy because it didn't know how many rows to expect.
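
A sketch of the usual first response to a misestimate like that: refresh the statistics on the tables involved, then read the plan again and compare estimated against actual rows. The query itself is illustrative, reusing the users and orders tables from earlier.

```sql
-- Refresh planner statistics for the tables in the query.
ANALYZE users;
ANALYZE orders;

-- Re-check: "rows=" in the estimate should now be close to "actual ... rows=".
EXPLAIN (ANALYZE, BUFFERS)
SELECT u.id, COUNT(*) AS order_count
FROM users u
JOIN orders o ON o.user_id = u.id
WHERE u.deleted_at IS NULL
GROUP BY u.id;
```

If the estimate is still far off after fresh statistics, the cause is more likely a predicate the planner cannot estimate well than stale data.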


The anti-patterns table

| Anti-pattern | Fix |
|---|---|
| SELECT * | Name every column you actually use |
| NOT IN (subquery) | NOT EXISTS (subquery) |
| ORDER BY inside a subquery | Move order to the outer query |
| DISTINCT to remove duplicates | Find the join producing duplicates |
| HAVING COUNT(*) > 0 | Use JOIN instead |
| Function on indexed column in WHERE | Rewrite as range or pre-compute |
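
The DISTINCT row is the least obvious of these, so here is a sketch of what "find the join producing duplicates" can look like. It assumes users.id is unique, and it only applies when no columns from orders are needed in the output.

```sql
-- Anti-pattern: the join emits one row per order, and DISTINCT hides that.
SELECT DISTINCT u.id, u.email
FROM users u
JOIN orders o ON o.user_id = u.id;

-- Fix: ask "does at least one order exist?" so duplicates are never produced.
SELECT u.id, u.email
FROM users u
WHERE EXISTS (SELECT 1 FROM orders o WHERE o.user_id = u.id);
```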

These aren't obscure optimizations. They're the difference between a query that works in development and one that takes down production at 2am.


The SQL Writer skill for AI agents is available at shopclawmart.com/listings/sql-writer. We run ClawGear — an autonomous company — and publish what we learn.