Sreeraj Sreenivasan
The Resilience & Observability Stack

Building Production-Ready FastAPI in 2026

Your API works. But is it production-ready?


Why This Matters

In 2026, "Contract-First" development means more than an OpenAPI spec. It means three implicit promises to every consumer:

  • Errors are predictable — every failure returns a structured, documented payload
  • Health is visible — logs, metrics, and traces tell a coherent story
  • The system self-heals — transient failures retry; abuse gets throttled

This article covers the two pillars that deliver on those promises:

  • Observability: Structlog + Prometheus + Sentry + Rich
  • Resilience: Tenacity + SlowAPI

Pillar 1: Enterprise Observability

Structlog — Centralized JSON Logging

Cloud environments need machine-readable logs. Structlog gives you JSON in production and human-friendly output locally — toggled by a single env var.

# app/core/logging.py
import structlog

from app.core.config import settings  # assumes a standard settings module exposing ENVIRONMENT

def configure_logging() -> None:
    # JSON in production (machine-readable); coloured console output locally
    renderer = (
        structlog.processors.JSONRenderer()
        if settings.ENVIRONMENT == "production"
        else structlog.dev.ConsoleRenderer(colors=True)
    )
    processors = [
        structlog.contextvars.merge_contextvars,
        structlog.stdlib.add_log_level,
        structlog.processors.TimeStamper(fmt="iso"),
        renderer,
    ]
    structlog.configure(processors=processors, cache_logger_on_first_use=True)

In production, every log entry is a clean, indexable JSON object. In development, it's colourised and human-readable — no config changes required.
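To make "clean, indexable JSON object" concrete, here is a dependency-free sketch of the shape such an entry takes. The helper function is hypothetical; the field names (`event`, `level`, `timestamp`) follow structlog's default conventions:

```python
import json
from datetime import datetime, timezone

def render_json_log(event: str, level: str = "info", **fields) -> str:
    """Sketch of the shape a JSON-rendered structlog entry takes."""
    entry = {
        "event": event,
        "level": level,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        **fields,
    }
    return json.dumps(entry)

# Every key becomes a queryable field in your log aggregator
print(render_json_log("user_created", user_id=42, request_id="abc-123"))
```

Because each key is a top-level JSON field, an aggregator can filter on `user_id` or `request_id` directly instead of regex-matching a message string.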


Rich — Beautiful Local Console Output

Install Rich tracebacks globally and your terminal shows full variable state at every frame of an exception — invaluable for debugging async SQLAlchemy sessions.

from rich.traceback import install as install_rich_traceback
install_rich_traceback(show_locals=True, width=120)

Instead of a wall of text, you get colour-coded output with file references and local variable values at the exact line that failed.


Prometheus — Metrics in Two Lines

from prometheus_fastapi_instrumentator import Instrumentator

Instrumentator().instrument(app).expose(app, endpoint="/metrics")

You get request counts, latency histograms, and in-flight connections out of the box — ready for Grafana dashboards and SLO alerting.
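To ground "latency histograms": Prometheus histograms store cumulative counts per `le` (less-or-equal) bucket bound, which is what makes SLO queries cheap. A stdlib-only sketch, with bucket bounds and observations invented for illustration:

```python
# Prometheus histograms keep a cumulative count per "le" (less-or-equal) bound
buckets = [0.1, 0.25, 0.5, 1.0, float("inf")]
observations = [0.07, 0.2, 0.3, 0.8, 2.1]  # request latencies in seconds

cumulative = {le: sum(1 for obs in observations if obs <= le) for le in buckets}

# An SLO check: what fraction of requests finished within 0.5s?
slo_ratio = cumulative[0.5] / len(observations)
print(cumulative[0.5], slo_ratio)  # 3 requests under 0.5s -> ratio 0.6
```

This is exactly the arithmetic a Grafana SLO panel performs against the `/metrics` endpoint the instrumentator exposes.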


Sentry — Error Tracking That Talks to Your Frontend

The critical constraint most templates miss: Sentry by default drops HTTPException.detail — the exact string your React frontend reads to show users a meaningful message like "User already exists".

Fix it with a before_send hook:

import sentry_sdk
from fastapi import HTTPException

def before_send(event: dict, hint: dict) -> dict | None:
    exc_info = hint.get("exc_info")
    if exc_info:
        _, exc_value, _ = exc_info
        if isinstance(exc_value, HTTPException):
            event.setdefault("extra", {})
            event["extra"]["http_exception_detail"] = exc_value.detail
            event["extra"]["status_code"] = exc_value.status_code
            event.setdefault("tags", {})["http_status"] = str(exc_value.status_code)
    return event

sentry_sdk.init(dsn=settings.SENTRY_DSN, before_send=before_send, ...)

Now every 409 Conflict in your Sentry dashboard shows exactly what the user saw. Filter by http_status:409 across your entire project instantly.


Pillar 2: Application Resilience

Tenacity — Retries for Transient Failures

Kubernetes rolling deploys, Aurora cold starts, and flaky network hops all introduce brief connectivity gaps. Without retries, those gaps become 500 errors. With Tenacity, they're invisible.

from tenacity import retry, stop_after_attempt, wait_exponential, retry_if_exception_type
from sqlalchemy.exc import OperationalError, DisconnectionError

db_retry = retry(
    retry=retry_if_exception_type((OperationalError, DisconnectionError)),
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, min=0.5, max=4),
    reraise=True,  # Still raises after exhaustion — Sentry catches it with full context
)

Apply it as a decorator on any database or external HTTP call:

@db_retry
async def get_by_email(self, db: AsyncSession, email: str) -> User | None:
    result = await db.execute(select(User).where(User.email == email))
    return result.scalar_one_or_none()

reraise=True ensures exhausted retries still propagate normally through your exception handlers, keeping Structlog and Sentry integration intact.
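The retry-then-reraise semantics can be sketched without Tenacity itself. The helper name and the `ConnectionError` trigger below are illustrative, not Tenacity's API:

```python
import time

def retry_transient(fn, attempts: int = 3, base: float = 0.5, cap: float = 4.0):
    """Sketch of Tenacity-style behaviour: exponential backoff, reraise on exhaustion."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except ConnectionError:
            if attempt == attempts:
                raise  # the reraise=True equivalent: the final error propagates
            time.sleep(min(cap, base * 2 ** (attempt - 1)))

calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient network hop")
    return "ok"

print(retry_transient(flaky))  # first two calls fail, third succeeds: prints ok
```

The caller never sees the first two failures; only a third consecutive failure would surface, carrying the original traceback for Sentry.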


SlowAPI — Rate Limiting That Respects Your OpenAPI Contract

The subtle problem with naive rate limiting: the 429 response becomes an undocumented payload that breaks your auto-generated frontend client.

The fix is a custom handler that returns {"detail": "..."} — identical to every other FastAPI error:

from fastapi import Request
from fastapi.responses import JSONResponse
from slowapi.errors import RateLimitExceeded

async def rate_limit_handler(request: Request, exc: RateLimitExceeded) -> JSONResponse:
    return JSONResponse(
        status_code=429,
        content={"detail": f"Rate limit exceeded: {exc.detail}. Please slow down."},
        headers={"Retry-After": "60"},
    )

app.add_exception_handler(RateLimitExceeded, rate_limit_handler)

Apply per-route limits based on risk:

@router.post("/auth/login")
@limiter.limit("10/minute")   # Strict — brute-force bait
async def login(request: Request, payload: LoginRequest): ...

@router.post("/users/")
@limiter.limit("5/minute")    # Tight — prevent account creation spam
async def create_user(request: Request, payload: UserCreate): ...

The Frontend Connection

Every tool above converges on one payoff: your React client always reads the same key.

Scenario              Status   Response
User already exists   409      {"detail": "User already exists"}
Rate limit hit        429      {"detail": "Rate limit exceeded..."}
Validation failure    422      {"detail": [...Pydantic errors]}
Server error          500      {"detail": "Internal server error"}

One Axios interceptor handles all of them:

client.interceptors.response.use(
  (res) => res,
  (error) => {
    const message = error.response?.data?.detail ?? "An unexpected error occurred.";
    toast.error(typeof message === "string" ? message : JSON.stringify(message));
    return Promise.reject(error);
  }
);

No special-casing. No silent failures. One contract, end to end.


Conclusion

The gap between a demo API and a production API isn't features — it's operational maturity.

Tool                       What it solves
Structlog                  Structured logs for cloud aggregators
Rich                       Developer-friendly local debugging
Prometheus                 Latency metrics and SLO visibility
Sentry + before_send       Error tracking with frontend-aware payloads
Tenacity                   Silent recovery from transient failures
SlowAPI + custom handler   Rate limiting that honours your OpenAPI contract

Together, these six tools ensure your FastAPI app behaves with consistency and transparency — whether on a single VPS, a Kubernetes cluster, or a globally distributed edge network.

Ship with confidence.


Tags: fastapi python observability sentry prometheus structlog tenacity slowapi backend
