Muhammad Usman Awan

Throttle Smart, Scale Safe — Complete Guide to Rate Limiting — Architecture Series: Part 6

🚦 Rate Limiting / Throttling — Complete Explanation

In modern backend systems, thousands or even millions of requests hit your APIs every day. Not every request is friendly — some might be spam, brute-force attempts, or even DDoS attacks. To ensure your server stays stable, secure, and fair for all users, backend engineers implement Rate Limiting, also known as API Throttling.
This mechanism determines how many requests a client is allowed to make within a specific time window, preventing abuse and ensuring system reliability.


✅ 1. What is Rate Limiting?

Rate limiting (a.k.a. throttling) is a mechanism to control how many requests a client can make to a server within a specific time period.

Example:
“Max 100 requests per minute per user.”

If a client exceeds the limit → you block or delay the request.


❓ Why do we need it?

Rate limiting is mainly used to:

🔒 1. Protect APIs from abuse

  • Bots sending too many requests
  • Brute-force login attempts
  • Spammers trying to overload your API

🛡️ 2. Prevent DDoS attacks

Even if attackers flood your server with requests, rate limiting caps the damage they can do.

⚙️ 3. Fair usage

Multiple users get a fair share of server resources.

💸 4. Reduce server cost

Less unnecessary load → cheaper infra.


🌍 Real-World Examples

✔ GitHub API

  • Allows 5000 requests/hour per token
  • Ensures fair usage for all developers.

✔ Cloudflare

  • Blocks excessive calls from bad IPs automatically.

✔ Instagram / Facebook APIs

  • Strict rate limits to prevent bots/scrapers.

🧱 2. Types of Rate Limiting

There are several common algorithms for implementing rate limits:

1) Fixed Window Counter (Simple & popular)

Rule: Allow X requests per fixed time window.

Example:
100 requests per minute

Working:

  • Counter resets every minute.

❌ Weakness:
A user can send 100 requests at the end of one window and another 100 at the start of the next — a burst of 200 requests in a few seconds, twice the intended limit.
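A minimal in-memory sketch of the fixed-window idea (illustrative names, single process — real implementations track one counter per client key):

```javascript
// Fixed-window counter: allow `limit` requests per `windowMs`.
// The counter resets whenever a new window starts.
function createFixedWindowLimiter(limit, windowMs) {
  let windowStart = Date.now();
  let count = 0;
  return function allow() {
    const now = Date.now();
    if (now - windowStart >= windowMs) {
      windowStart = now; // new window: reset the counter
      count = 0;
    }
    count += 1;
    return count <= limit;
  };
}
```

Note how nothing stops the burst-at-the-boundary problem: the reset is abrupt, which is exactly the weakness described above.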

2) Sliding Window (Smarter)

Keeps track of the rolling last N seconds.

Example:
Last 60 seconds → only 100 requests allowed.

✔ Prevents burst problem
✔ More accurate and fair
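A sketch of the sliding-window-log variant (illustrative, in-memory; it stores a timestamp per accepted request, so memory grows with the limit):

```javascript
// Sliding window log: allow `limit` requests in any rolling `windowMs` span.
function createSlidingWindowLimiter(limit, windowMs) {
  const timestamps = []; // timestamps of accepted requests, oldest first
  return function allow(now = Date.now()) {
    // Drop entries that have fallen out of the rolling window
    while (timestamps.length && now - timestamps[0] >= windowMs) {
      timestamps.shift();
    }
    if (timestamps.length >= limit) return false;
    timestamps.push(now);
    return true;
  };
}
```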

3) Token Bucket (Most used in production)

A bucket is filled at a fixed rate with tokens.
Each request consumes 1 token.

If bucket empty → request denied or delayed.

✔ Allows small bursts
✔ Smooth flow
✔ Used by AWS, Google Cloud, Nginx
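The refill logic can be sketched in a few lines (illustrative, in-memory, single key — production versions keep one bucket per client):

```javascript
// Token bucket: up to `capacity` tokens, refilled at `refillPerSec`.
// Each request consumes one token; an empty bucket means rejection.
function createTokenBucket(capacity, refillPerSec) {
  let tokens = capacity;
  let last = null; // timestamp of the previous call
  return function tryConsume(now = Date.now()) {
    if (last !== null) {
      // Refill in proportion to elapsed time, capped at capacity
      tokens = Math.min(capacity, tokens + ((now - last) / 1000) * refillPerSec);
    }
    last = now;
    if (tokens >= 1) {
      tokens -= 1;
      return true;
    }
    return false;
  };
}
```

Because the bucket starts full, a client can burst up to `capacity` requests at once, then is held to the steady refill rate — which is why this algorithm is popular in production.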

4) Leaky Bucket

Requests enter a bucket and leak at a constant rate.

✔ Very stable output
✔ Perfect for smoothing traffic
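A leaky bucket is often pictured as a queue drained at a fixed rate; the admission check can also be sketched as a counter that "leaks" over time (illustrative, in-memory):

```javascript
// Leaky bucket: the level rises by 1 per request and drains at
// `leakPerSec`. A request that would overflow `capacity` is rejected.
function createLeakyBucket(capacity, leakPerSec) {
  let level = 0;
  let last = null; // timestamp of the previous call
  return function tryAdd(now = Date.now()) {
    if (last !== null) {
      // Drain in proportion to elapsed time, never below zero
      level = Math.max(0, level - ((now - last) / 1000) * leakPerSec);
    }
    last = now;
    if (level + 1 > capacity) return false;
    level += 1;
    return true;
  };
}
```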

🗂️ 3. What to rate limit ON?

Common ways:

✔ per IP

Useful for public APIs.

✔ per API key

Best for developer APIs.

✔ per user

For authenticated systems.

✔ per route

For example:

  • /login → strict limits
  • /products → relaxed limits

✔ per device

Mobile apps track device ID.
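Whichever dimension you pick, the pattern is the same: keep one independent counter per key. A tiny sketch of that wiring (illustrative — `makeLimiter` is any zero-argument factory returning an `allow()` function, like the algorithm sketches above):

```javascript
// Wrap a single-key limiter factory so each key (IP, user id,
// API key, route) gets its own independent limiter instance.
function createKeyedLimiter(makeLimiter) {
  const limiters = new Map();
  return function allow(key) {
    if (!limiters.has(key)) limiters.set(key, makeLimiter());
    return limiters.get(key)(); // delegate to that key's limiter
  };
}
```

In practice you would also evict idle keys, or the Map grows without bound.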

🧪 4. Implementing Rate Limiting in Node.js (Express)

📌 Install the library

```bash
npm install express-rate-limit
```

📌 Basic Middleware

```javascript
import express from 'express';
import rateLimit from 'express-rate-limit';

const app = express();

const limiter = rateLimit({
  windowMs: 1 * 60 * 1000, // 1 minute
  max: 100, // limit each IP to 100 requests per window
  message: "Too many requests, please try again later."
});

app.use(limiter);
```

Now all routes are limited to 100 req/min per IP.


🎯 Per Route Limit

```javascript
app.post('/login', rateLimit({
  windowMs: 60 * 1000,
  max: 5, // only 5 login attempts per minute
}), loginController);
```

🎯 Custom Handler

```javascript
const limiter = rateLimit({
  windowMs: 60000,
  max: 50,
  handler: (req, res) => {
    res.status(429).json({
      success: false,
      message: "Rate limit exceeded, chill bro!"
    });
  }
});
```

🧵 5. Distributed Rate Limiting (Redis)

An in-memory counter works only on a single server.

If you run multiple servers behind a load balancer, each server keeps its own counts — a client can exceed the limit overall while staying under it on every node. You need shared storage.

Most common solution:

Use Redis as a central counter.

Popular libs:

  • rate-limiter-flexible
  • redis-rate-limiter

Benefits:
✔ Works across multiple nodes
✔ Very fast
✔ Production-grade
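A sketch of the Redis-backed wiring with rate-limiter-flexible (assuming an `ioredis` client and an existing Express `app`; a Redis server must be running, and the option names here follow that library's API):

```javascript
const { RateLimiterRedis } = require('rate-limiter-flexible');
const Redis = require('ioredis');

const redisClient = new Redis(); // connects to localhost:6379 by default

const limiter = new RateLimiterRedis({
  storeClient: redisClient,
  keyPrefix: 'rl',   // namespace for the Redis keys
  points: 100,       // 100 requests...
  duration: 60,      // ...per 60 seconds, per key
});

app.use(async (req, res, next) => {
  try {
    await limiter.consume(req.ip); // spend 1 point for this IP
    next();
  } catch (rejected) {
    res.status(429).json({ message: 'Too many requests' });
  }
});
```

Because the counter lives in Redis, every node behind the load balancer sees the same totals.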


⚙️ 6. How Big Companies Implement It

🌐 API Gateways

  • AWS API Gateway
  • Kong
  • NGINX
  • Cloudflare Workers

These use algorithms like:

  • Token Bucket
  • Sliding Window
  • Leaky Bucket

🛑 When limit hits:

  • Return 429 Too Many Requests

📊 7. Response Codes for Rate Limiting

429 — Too Many Requests

This is the official code for throttling.

Headers returned (optional):

  • Retry-After: 30
  • X-RateLimit-Limit: 100
  • X-RateLimit-Remaining: 0
  • X-RateLimit-Reset: 1700000000 (Unix timestamp when the limit resets)
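Libraries like express-rate-limit can emit these headers for you, but attaching them by hand is a one-liner per header. A small sketch (the `limit`, `remaining`, and `resetEpochSec` values would come from your limiter; the helper name is illustrative):

```javascript
// Attach rate-limit headers to an Express-style response object.
function setRateLimitHeaders(res, { limit, remaining, resetEpochSec }) {
  res.setHeader('X-RateLimit-Limit', limit);
  res.setHeader('X-RateLimit-Remaining', remaining);
  res.setHeader('X-RateLimit-Reset', resetEpochSec);
  if (remaining === 0) {
    // Tell the client how many seconds to wait before retrying
    const waitSec = Math.max(0, resetEpochSec - Math.floor(Date.now() / 1000));
    res.setHeader('Retry-After', waitSec);
  }
}
```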

🧠 8. Best Practices

✔ Use stricter limits on sensitive routes:

  • /login
  • /password-reset

✔ Allow small bursts (Token Bucket)
✔ Add Redis for multiple servers
✔ Return proper headers
✔ Use exponential backoff retry
✔ Block abusive IPs automatically
✔ Log and monitor rate-limit hits
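On the client side, exponential backoff means doubling the wait after each 429 instead of hammering the API. A hedged sketch (`doRequest` stands in for any async call returning an object with a `status` field):

```javascript
// Retry a request with exponential backoff when it is rate limited.
// Delays grow as baseMs, 2*baseMs, 4*baseMs, ... up to maxRetries retries.
async function withBackoff(doRequest, maxRetries = 3, baseMs = 200) {
  for (let attempt = 0; ; attempt++) {
    const res = await doRequest();
    if (res.status !== 429 || attempt === maxRetries) return res;
    const delay = baseMs * 2 ** attempt;
    await new Promise(resolve => setTimeout(resolve, delay));
  }
}
```

A production version would also honor the server's Retry-After header when present, and add jitter so many clients don't retry in lockstep.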


🎉 Final Summary

| Topic | Explanation |
| --- | --- |
| What | Controls the number of requests within a time window |
| Why | Protect from abuse and DDoS, ensure fair usage |
| Types | Fixed window, sliding window, token bucket, leaky bucket |
| Where | IP, user, API key, route |
| Node.js | express-rate-limit or Redis-based solutions |
| Real world | GitHub, Cloudflare, AWS |

Rate limiting is one of the most essential backend security techniques and should always be included in production-grade APIs. It protects your infrastructure, ensures fair usage, keeps costs down, and significantly reduces the risk of targeted attacks.

Below is an index of the previous detailed JS / backend topics in this series (for quick revision), so your learning stays connected and structured.

📘 Architecture Series – Index
