Muhammad Usman Awan

Throttle Smart, Scale Safe — Complete Guide to Rate Limiting — Architecture Series: Part 6

🚦 Rate Limiting / Throttling — Complete Explanation

In modern backend systems, thousands or even millions of requests hit your APIs every day. Not every request is friendly — some might be spam, brute-force attempts, or even DDoS attacks. To ensure your server stays stable, secure, and fair for all users, backend engineers implement Rate Limiting, also known as API Throttling.
This mechanism determines how many requests a client is allowed to make within a specific time window, preventing abuse and ensuring system reliability.


✅ 1. What is Rate Limiting?

Rate limiting (a.k.a. throttling) is a mechanism to control how many requests a client can make to a server within a specific time period.

Example:
“Max 100 requests per minute per user.”

If a client exceeds the limit → you block or delay the request.


❓ Why do we need it?

Rate limiting is mainly used to:

🔒 1. Protect APIs from abuse

  • Bots sending too many requests
  • Brute-force login attempts
  • Spammers trying to overload your API

🛡️ 2. Prevent DDoS attacks

Even if attackers flood your server with requests, rate limiting caps the damage they can do.

⚙️ 3. Fair usage

Multiple users get a fair share of server resources.

💸 4. Reduce server cost

Less unnecessary load → cheaper infra.


🌍 Real-World Examples

✔ GitHub API

  • Allows 5000 requests/hour per token
  • Ensures fair usage for all developers.

✔ Cloudflare

  • Blocks excessive calls from bad IPs automatically.

✔ Instagram / Facebook APIs

  • Strict rate limits to prevent bots/scrapers.

🧱 2. Types of Rate Limiting

There are several common algorithms for implementing rate limits:

1) Fixed Window Counter (Simple & popular)

Rule: Allow X requests per fixed time window.

Example:
100 requests per minute

Working:

  • Counter resets every minute.

❌ Weakness:
A user can send 100 requests at the end of one window and another 100 at the start of the next — a burst of 200 requests in a few seconds, twice the intended limit.
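A minimal in-memory sketch of the fixed-window idea (illustrative names, single process — real implementations track one counter per client key):

```javascript
// Fixed-window counter: allow `limit` requests per `windowMs`.
// The counter resets whenever a new window starts.
function createFixedWindowLimiter(limit, windowMs) {
  let windowStart = Date.now();
  let count = 0;
  return function allow() {
    const now = Date.now();
    if (now - windowStart >= windowMs) {
      windowStart = now; // new window: reset the counter
      count = 0;
    }
    count += 1;
    return count <= limit;
  };
}
```

Note how nothing stops the burst-at-the-boundary problem: the reset is abrupt, which is exactly the weakness described above.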

2) Sliding Window (Smarter)

Keeps track of the rolling last N seconds.

Example:
Last 60 seconds → only 100 requests allowed.

✔ Prevents burst problem
✔ More accurate and fair
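A sketch of the sliding-window-log variant (illustrative, in-memory; it stores a timestamp per accepted request, so memory grows with the limit):

```javascript
// Sliding window log: allow `limit` requests in any rolling `windowMs` span.
function createSlidingWindowLimiter(limit, windowMs) {
  const timestamps = []; // timestamps of accepted requests, oldest first
  return function allow(now = Date.now()) {
    // Drop entries that have fallen out of the rolling window
    while (timestamps.length && now - timestamps[0] >= windowMs) {
      timestamps.shift();
    }
    if (timestamps.length >= limit) return false;
    timestamps.push(now);
    return true;
  };
}
```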

3) Token Bucket (Most used in production)

A bucket is filled at a fixed rate with tokens.
Each request consumes 1 token.

If bucket empty → request denied or delayed.

✔ Allows small bursts
✔ Smooth flow
✔ Used by AWS, Google Cloud, Nginx
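The refill logic can be sketched in a few lines (illustrative, in-memory, single key — production versions keep one bucket per client):

```javascript
// Token bucket: up to `capacity` tokens, refilled at `refillPerSec`.
// Each request consumes one token; an empty bucket means rejection.
function createTokenBucket(capacity, refillPerSec) {
  let tokens = capacity;
  let last = null; // timestamp of the previous call
  return function tryConsume(now = Date.now()) {
    if (last !== null) {
      // Refill in proportion to elapsed time, capped at capacity
      tokens = Math.min(capacity, tokens + ((now - last) / 1000) * refillPerSec);
    }
    last = now;
    if (tokens >= 1) {
      tokens -= 1;
      return true;
    }
    return false;
  };
}
```

Because the bucket starts full, a client can burst up to `capacity` requests at once, then is held to the steady refill rate — which is why this algorithm is popular in production.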

4) Leaky Bucket

Requests enter a bucket and leak at a constant rate.

✔ Very stable output
✔ Perfect for smoothing traffic
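A leaky bucket is often pictured as a queue drained at a fixed rate; the admission check can also be sketched as a counter that "leaks" over time (illustrative, in-memory):

```javascript
// Leaky bucket: the level rises by 1 per request and drains at
// `leakPerSec`. A request that would overflow `capacity` is rejected.
function createLeakyBucket(capacity, leakPerSec) {
  let level = 0;
  let last = null; // timestamp of the previous call
  return function tryAdd(now = Date.now()) {
    if (last !== null) {
      // Drain in proportion to elapsed time, never below zero
      level = Math.max(0, level - ((now - last) / 1000) * leakPerSec);
    }
    last = now;
    if (level + 1 > capacity) return false;
    level += 1;
    return true;
  };
}
```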

🗂️ 3. What to rate limit ON?

Common ways:

✔ per IP

Useful for public APIs.

✔ per API key

Best for developer APIs.

✔ per user

For authenticated systems.

✔ per route

For example:

  • /login → strict limits
  • /products → relaxed limits

✔ per device

Mobile apps track device ID.
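Whichever dimension you pick, the pattern is the same: keep one independent counter per key. A tiny sketch of that wiring (illustrative — `makeLimiter` is any zero-argument factory returning an `allow()` function, like the algorithm sketches above):

```javascript
// Wrap a single-key limiter factory so each key (IP, user id,
// API key, route) gets its own independent limiter instance.
function createKeyedLimiter(makeLimiter) {
  const limiters = new Map();
  return function allow(key) {
    if (!limiters.has(key)) limiters.set(key, makeLimiter());
    return limiters.get(key)(); // delegate to that key's limiter
  };
}
```

In practice you would also evict idle keys, or the Map grows without bound.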

🧪 4. Implementing Rate Limiting in Node.js (Express)

📌 Install the library

```bash
npm install express-rate-limit
```

📌 Basic Middleware

```javascript
import express from 'express';
import rateLimit from 'express-rate-limit';

const app = express();

const limiter = rateLimit({
  windowMs: 1 * 60 * 1000, // 1 minute
  max: 100, // limit each IP to 100 requests per window
  message: "Too many requests, please try again later."
});

app.use(limiter);
```

Now all routes are limited to 100 req/min per IP.


🎯 Per Route Limit

```javascript
app.post('/login', rateLimit({
  windowMs: 60 * 1000,
  max: 5, // only 5 login attempts per minute
}), loginController);
```

🎯 Custom Handler

```javascript
const limiter = rateLimit({
  windowMs: 60000,
  max: 50,
  handler: (req, res) => {
    res.status(429).json({
      success: false,
      message: "Rate limit exceeded, chill bro!"
    });
  }
});
```

🧵 5. Distributed Rate Limiting (Redis)

An in-memory counter works only on a single server.

If you run multiple servers behind a load balancer, each server keeps its own counts — a client can exceed the limit overall while staying under it on every node. You need shared storage.

Most common solution:

Use Redis as a central counter.

Popular libs:

  • rate-limiter-flexible
  • redis-rate-limiter

Benefits:
✔ Works across multiple nodes
✔ Very fast
✔ Production-grade
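A sketch of the Redis-backed wiring with rate-limiter-flexible (assuming an `ioredis` client and an existing Express `app`; a Redis server must be running, and the option names here follow that library's API):

```javascript
const { RateLimiterRedis } = require('rate-limiter-flexible');
const Redis = require('ioredis');

const redisClient = new Redis(); // connects to localhost:6379 by default

const limiter = new RateLimiterRedis({
  storeClient: redisClient,
  keyPrefix: 'rl',   // namespace for the Redis keys
  points: 100,       // 100 requests...
  duration: 60,      // ...per 60 seconds, per key
});

app.use(async (req, res, next) => {
  try {
    await limiter.consume(req.ip); // spend 1 point for this IP
    next();
  } catch (rejected) {
    res.status(429).json({ message: 'Too many requests' });
  }
});
```

Because the counter lives in Redis, every node behind the load balancer sees the same totals.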


⚙️ 6. How Big Companies Implement It

🌐 API Gateways

  • AWS API Gateway
  • Kong
  • NGINX
  • Cloudflare Workers

These use algorithms like:

  • Token Bucket
  • Sliding Window
  • Leaky Bucket

🛑 When limit hits:

  • Return 429 Too Many Requests

📊 7. Response Codes for Rate Limiting

429 — Too Many Requests

This is the official code for throttling.

Headers returned (optional):

  • Retry-After: 30
  • X-RateLimit-Limit: 100
  • X-RateLimit-Remaining: 0
  • X-RateLimit-Reset: 1700000000 (Unix timestamp when the limit resets)
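Libraries like express-rate-limit can emit these headers for you, but attaching them by hand is a one-liner per header. A small sketch (the `limit`, `remaining`, and `resetEpochSec` values would come from your limiter; the helper name is illustrative):

```javascript
// Attach rate-limit headers to an Express-style response object.
function setRateLimitHeaders(res, { limit, remaining, resetEpochSec }) {
  res.setHeader('X-RateLimit-Limit', limit);
  res.setHeader('X-RateLimit-Remaining', remaining);
  res.setHeader('X-RateLimit-Reset', resetEpochSec);
  if (remaining === 0) {
    // Tell the client how many seconds to wait before retrying
    const waitSec = Math.max(0, resetEpochSec - Math.floor(Date.now() / 1000));
    res.setHeader('Retry-After', waitSec);
  }
}
```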

🧠 8. Best Practices

✔ Use stricter limits on sensitive routes:

  • /login
  • /password-reset

✔ Allow small bursts (Token Bucket)
✔ Add Redis for multiple servers
✔ Return proper headers
✔ Use exponential backoff retry
✔ Block abusive IPs automatically
✔ Log and monitor rate-limit hits
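On the client side, exponential backoff means doubling the wait after each 429 instead of hammering the API. A hedged sketch (`doRequest` stands in for any async call returning an object with a `status` field):

```javascript
// Retry a request with exponential backoff when it is rate limited.
// Delays grow as baseMs, 2*baseMs, 4*baseMs, ... up to maxRetries retries.
async function withBackoff(doRequest, maxRetries = 3, baseMs = 200) {
  for (let attempt = 0; ; attempt++) {
    const res = await doRequest();
    if (res.status !== 429 || attempt === maxRetries) return res;
    const delay = baseMs * 2 ** attempt;
    await new Promise(resolve => setTimeout(resolve, delay));
  }
}
```

A production version would also honor the server's Retry-After header when present, and add jitter so many clients don't retry in lockstep.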


🎉 Final Summary

| Topic | Explanation |
| --- | --- |
| What | Controls the number of requests within a time window |
| Why | Protect from abuse and DDoS, ensure fair usage |
| Types | Fixed window, sliding window, token bucket, leaky bucket |
| Where | IP, user, API key, route |
| Node.js | express-rate-limit or Redis-based solutions |
| Real world | GitHub, Cloudflare, AWS |

Rate limiting is one of the most essential backend security techniques and should always be included in production-grade APIs. It protects your infrastructure, ensures fair usage, keeps costs down, and significantly reduces the risk of targeted attacks.

Below is an index of the previous detailed JS / backend topics in this series (for quick revision), so your learning stays connected and structured.

📘 Architecture Series – Index
