It was 2 AM. I was jolted awake by a cascade of alerts — our downstream order database was thrashing at 90% CPU, connection pools exhausted, the whole service collapsing. Scrambling through the monitoring dashboards, I found the smoking gun: a rolling deployment of the gateway had just finished. On each new pod, the local token bucket counters started from scratch. For less than 10 seconds, the rate limiter suffered a collective “amnesia.” That tiny window of uncontrolled traffic pierced through every layer of protection and brought the system to its knees.
Right then, I knew: local, in-process rate limiting was done. We needed distributed rate limiting — and fast.
Why In-Memory Rate Limiting Is a Lie in Distributed Systems
Inside a multi-instance API gateway, rate limiting is supposed to protect downstream services. If you use Go’s rate.Limiter or Guava’s RateLimiter, each instance maintains its own token bucket. Under perfectly spread traffic, limits seem to hold. But two scenarios instantly strip away that protection:
- Rolling deployments or restarts: A fresh instance starts with a full bucket; old counters are never inherited. You lose limiting for that entire bootstrap window.
- Traffic skew: If a user is consistently hashed to the same instance (think sticky sessions), the limiter only knows about that instance’s local view. When that one instance is overwhelmed, the rest of the fleet remains oblivious — and downstream still melts.
The root cause is simple: “global traffic demands global counting.” The industry go-to for distributed counters is Redis, but too many implementations just use INCR with a TTL, the classic fixed-window approach. Fixed windows have a notorious flaw: request bursts at the boundary. Two consecutive windows can each allow their full quota within a 200ms span, effectively doubling the allowed rate.
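To make that boundary flaw concrete, here is a minimal sketch of the fixed-window pattern, assuming a go-redis v9 client and a one-second window; the function name and key layout are illustrative, not lifted from any real gateway:

package fixedwindow // throwaway illustration, not part of the limiter built later

import (
	"context"
	"fmt"
	"time"

	"github.com/redis/go-redis/v9"
)

// fixedWindowAllow counts requests in per-second buckets using INCR + TTL.
func fixedWindowAllow(ctx context.Context, rdb *redis.Client, user string, limit int64) (bool, error) {
	// One counter key per user per second, e.g. "rate:fixed:user_123:1718000000".
	key := fmt.Sprintf("rate:fixed:%s:%d", user, time.Now().Unix())
	n, err := rdb.Incr(ctx, key).Result()
	if err != nil {
		return false, err
	}
	if n == 1 {
		// First hit in this bucket: attach the TTL.
		rdb.Expire(ctx, key, time.Second)
	}
	// Boundary flaw: `limit` requests just before the second rolls over and
	// `limit` more just after it land in different buckets, so roughly twice
	// the intended rate slips through within a few milliseconds.
	return n <= limit, nil
}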
I wanted something smoother: a sliding window algorithm, backed by Redis sorted sets (ZSET).
The Design: Redis + Lua + ZSET
I evaluated three options:
- Nginx/OpenResty rate-limiting modules: blazing fast, but configuration is static. Dynamically adjusting rules from business logic would have been a nightmare.
- Sentinel/Hystrix: focus more on circuit breaking and degradation. The rate limiting is again local; going distributed requires deploying an external console — too heavy.
- Build our own Redis-based sliding window limiter: Use ZSET scores to store request timestamps, with each key representing a rate-limiting dimension (user ID, API path, etc.), down to millisecond precision. A Lua script bundles the “check + add + evict” logic into an atomic operation — one network round trip does it all.
Redis was the clear winner: nearly every backend already has a Redis cluster, so zero extra deployment cost. Lua scripting guarantees atomicity under concurrency. And ZSETs are naturally suited for range queries and removals — sliding windows feel almost native.
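To see why the Lua part matters, here is what the same evict–count–add sequence looks like as three separate go-redis calls. This is a sketch for illustration only (the helper name is mine), and it is racy: two gateway instances can interleave between the count and the insert and both admit a request right at the limit.

package ratelimit // illustrative only; not part of the final limiter

import (
	"context"
	"fmt"
	"time"

	"github.com/redis/go-redis/v9"
)

// allowRacy is the non-atomic version: three round trips, with a race between
// the ZCard read and the ZAdd write. Two concurrent requests on different
// instances can both observe count == limit-1 and both be admitted.
func allowRacy(ctx context.Context, rdb *redis.Client, key string, windowMs, limit int64) (bool, error) {
	now := time.Now().UnixMilli()
	// 1. Evict entries that have slid out of the window.
	if err := rdb.ZRemRangeByScore(ctx, key, "0", fmt.Sprintf("%d", now-windowMs)).Err(); err != nil {
		return false, err
	}
	// 2. Count what remains inside the window.
	count, err := rdb.ZCard(ctx, key).Result()
	if err != nil {
		return false, err
	}
	if count >= limit {
		return false, nil
	}
	// 3. Record this request; another instance may have done the same in between.
	member := fmt.Sprintf("%d", time.Now().UnixNano())
	return true, rdb.ZAdd(ctx, key, redis.Z{Score: float64(now), Member: member}).Err()
}

Folding these three calls into a single Lua script removes both the extra round trips and the interleaving.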
On the architecture side, it’s a thin Go middleware. Every request hits the Redis Lua script to get an accept/reject decision. To relieve Redis pressure, we later added an in-memory pre-check (more on that another time).
The Core: From Atomic Lua to Go
What this solves: atomic “check–count–evict” for a sliding window inside Redis
-- sliding_window.lua
-- KEYS[1] rate-limit key, e.g. "rate:api:/order:user_123"
-- ARGV[1] window length in milliseconds, e.g. 1000
-- ARGV[2] maximum number of requests in the window
-- ARGV[3] current timestamp in milliseconds, taken from the Redis server's TIME
-- ARGV[4] unique member ID, typically a nanosecond timestamp plus a random suffix, so requests sharing the same millisecond score do not collapse into one entry
local key = KEYS[1]
local window_ms = tonumber(ARGV[1])
local limit = tonumber(ARGV[2])
local now = tonumber(ARGV[3])
local member = ARGV[4]
-- Evict entries that have slid out of the window
redis.call("ZREMRANGEBYSCORE", key, 0, now - window_ms)
-- Count the requests remaining inside the window
local count = redis.call("ZCARD", key)
if count < limit then
    -- Allowed: record this request's timestamp
    redis.call("ZADD", key, now, member)
    -- Expire the key so it does not linger forever once traffic stops
    redis.call("PEXPIRE", key, window_ms + 1000)
    return 1
else
    return 0
end
Critical detail: member must be globally unique. The score (the millisecond timestamp) is allowed to repeat, but if two requests produce the same member, the second ZADD merely updates the first entry and the count is understated. I generate the member on the Go side as a nanosecond timestamp plus a random number, so even concurrent requests arriving in the same millisecond never collide. I also set PEXPIRE to window_ms + 1000, slightly longer than the window, to keep garbage keys from sticking around forever while preventing premature expiration that could drop valid data at the boundary.
What this solves: Go wrapper that connects to Redis, loads the script, and exposes an Allow interface
package ratelimit

import (
	"context"
	"crypto/rand"
	"fmt"
	"math/big"
	"time"

	"github.com/redis/go-redis/v9"
)

type SlidingWindowLimiter struct {
	client *redis.Client
	script *redis.Script // cached Lua script; Run uses EVALSHA and falls back to EVAL
	window time.Duration
	limit  int
}

func NewLimiter(client *redis.Client, window time.Duration, limit int) *SlidingWindowLimiter {
	src := `
local key = KEYS[1]
local window_ms = tonumber(ARGV[1])
local limit = tonumber(ARGV[2])
local now = tonumber(ARGV[3])
local member = ARGV[4]
redis.call("ZREMRANGEBYSCORE", key, 0, now - window_ms)
local count = redis.call("ZCARD", key)
if count < limit then
    redis.call("ZADD", key, now, member)
    redis.call("PEXPIRE", key, window_ms + 1000)
    return 1
else
    return 0
end
`
	return &SlidingWindowLimiter{
		client: client,
		script: redis.NewScript(src),
		window: window,
		limit:  limit,
	}
}

// Allow reports whether one more request for the given key fits in the window.
func (l *SlidingWindowLimiter) Allow(ctx context.Context, key string) (bool, error) {
	// Use the Redis server clock as the single time source, so gateway
	// instances with skewed local clocks still agree on the same window.
	now, err := l.client.Time(ctx).Result()
	if err != nil {
		return false, err
	}
	// Globally unique member: nanosecond timestamp plus a random suffix,
	// so concurrent requests in the same millisecond never overwrite each other.
	r, err := rand.Int(rand.Reader, big.NewInt(1<<31))
	if err != nil {
		return false, err
	}
	member := fmt.Sprintf("%d-%d", time.Now().UnixNano(), r.Int64())
	res, err := l.script.Run(ctx, l.client, []string{key},
		l.window.Milliseconds(), l.limit, now.UnixMilli(), member).Int()
	if err != nil {
		return false, err
	}
	return res == 1, nil
}
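To close the loop on the “thin Go middleware” mentioned earlier, here is a sketch of how Allow slots into a standard net/http chain. The key scheme mirrors the rate:api:<path>:<user> layout from the Lua comments; the header name, import path, and fail-open choice are my assumptions, not part of the original design:

package middleware // hypothetical package name

import (
	"fmt"
	"net/http"

	"example.com/gateway/ratelimit" // placeholder module path; adjust to yours
)

// RateLimit wraps a handler and rejects requests once a caller exhausts the window.
func RateLimit(limiter *ratelimit.SlidingWindowLimiter, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		user := r.Header.Get("X-User-ID") // placeholder for real authentication
		key := fmt.Sprintf("rate:api:%s:%s", r.URL.Path, user)

		ok, err := limiter.Allow(r.Context(), key)
		if err != nil {
			// Redis unreachable: fail open here; failing closed is equally
			// defensible, choose per your SLO.
			next.ServeHTTP(w, r)
			return
		}
		if !ok {
			http.Error(w, "rate limit exceeded", http.StatusTooManyRequests)
			return
		}
		next.ServeHTTP(w, r)
	})
}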