DEV Community

Lich Priest

Deploy a Real‑Time Object Detection API with YOLOv8 & FastAPI

Why combine YOLOv8 and FastAPI?

Object detection is at the heart of many modern applications—think smart cameras, inventory robots, or AR experiences. YOLOv8 (You Only Look Once) gives you state‑of‑the‑art accuracy while still running fast enough for real‑time use. FastAPI, on the other hand, is a lightweight, async‑first web framework that makes it trivial to expose a model as a REST endpoint.

In this tutorial you’ll walk through:

  1. Preparing a small custom dataset and training a YOLOv8 model.
  2. Wrapping the model in a FastAPI service that accepts images and returns detections.
  3. Docker‑izing the whole stack so it can run anywhere with a single docker compose up.

By the end you’ll have a reproducible, container‑based API that can serve predictions with latency low enough for real‑time use.

Prerequisites

Tool                       Version                        Why
Python                     3.9–3.11                       Compatibility with Ultralytics YOLO
Ultralytics YOLO           pip install ultralytics        Training and inference
FastAPI                    pip install "fastapi[all]"     HTTP server
Docker & Docker Compose    Latest                         Container orchestration
Git                        Any                            Version control (optional)

You’ll also need a modest GPU for training (even a laptop GPU works for a small dataset). If you only want to test inference, CPU‑only mode is fine.

1. Prepare a custom dataset

YOLOv8 expects the classic folder layout:

my_dataset/
├── images/
│   ├── train/
│   └── val/
└── labels/
    ├── train/
    └── val/

Each image in train/ or val/ has a corresponding .txt file in the same sub‑folder under labels/. The label file contains one line per object:

<class_id> <x_center> <y_center> <width> <height>

All coordinates are normalized (0‑1). If you already have COCO‑style annotations, the ultralytics package can convert them:

# convert_coco_to_yolo.py
from ultralytics.data.converter import convert_coco

# labels_dir is the folder containing your COCO *.json annotation files
convert_coco(
    labels_dir="coco_annotations/",
    save_dir="my_dataset"
)
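If you're generating label files yourself, the pixel‑to‑normalized conversion is easy to get backwards. A minimal helper (the function name is my own, not part of Ultralytics) might look like:

```python
def to_yolo_line(class_id: int, x1: float, y1: float, x2: float, y2: float,
                 img_w: int, img_h: int) -> str:
    """Convert a pixel-space [x1, y1, x2, y2] box into one YOLO label line."""
    x_center = (x1 + x2) / 2 / img_w
    y_center = (y1 + y2) / 2 / img_h
    width = (x2 - x1) / img_w
    height = (y2 - y1) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"
```

For example, a box spanning (100, 50) to (300, 150) in a 400×200 image becomes `0 0.500000 0.500000 0.500000 0.500000`.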

Once you have the folder ready, create a data.yaml that points to it:

# data.yaml
train: ./my_dataset/images/train
val: ./my_dataset/images/val

nc: 3                     # number of classes
names: ['person', 'bicycle', 'dog']
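A silent mismatch between images and labels is a common source of bad training runs, so it's worth checking the pairing before you train. A small sanity‑check sketch (the helper name is hypothetical):

```python
from pathlib import Path

def find_unlabeled(images_dir: str, labels_dir: str) -> list:
    """Return stems of images that have no matching .txt label file."""
    label_stems = {p.stem for p in Path(labels_dir).glob("*.txt")}
    return sorted(
        p.stem
        for p in Path(images_dir).iterdir()
        if p.suffix.lower() in {".jpg", ".jpeg", ".png"} and p.stem not in label_stems
    )
```

Run it once per split (train and val); an empty list means every image has a label file.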

2. Train the model

Training with YOLOv8 is a single line:

yolo task=detect mode=train data=data.yaml epochs=50 imgsz=640 batch=16 model=yolov8n.pt
  • yolov8n.pt is the nano version (fastest, smallest). Swap for yolov8s.pt or larger if you need higher accuracy.
  • Adjust epochs, batch, and imgsz to fit your hardware.

After training finishes you’ll find the best checkpoint in runs/detect/train/weights/best.pt. Keep that file; it’s what the API will load.
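Note that Ultralytics increments the run folder on repeated runs (train, train2, …), so hard‑coding runs/detect/train/ can silently pick up a stale checkpoint. A small convenience helper of my own (not part of the library) to grab the newest one:

```python
from pathlib import Path

def latest_best(runs_dir: str = "runs/detect"):
    """Return the best.pt from the most recently modified training run, or None."""
    candidates = sorted(
        Path(runs_dir).glob("*/weights/best.pt"),
        key=lambda p: p.stat().st_mtime,
        reverse=True,
    )
    return candidates[0] if candidates else None
```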

3. Build the FastAPI inference service

Create a new folder api/ and add the following files.

app/main.py

# app/main.py
import io
import numpy as np
from fastapi import FastAPI, File, UploadFile, HTTPException
from ultralytics import YOLO
from PIL import Image

app = FastAPI(title="YOLOv8 Object Detection API")

# Load the model once at startup
model = YOLO("weights/best.pt")

def pil_to_numpy(img: Image.Image) -> np.ndarray:
    """Convert a Pillow image to a NumPy array that YOLO expects."""
    return np.array(img.convert("RGB"))

@app.post("/detect")
async def detect(file: UploadFile = File(...)):
    """Accept an image file and return bounding boxes."""
    if file.content_type not in {"image/jpeg", "image/png"}:
        raise HTTPException(status_code=400, detail="Invalid image type")

    # Read bytes and open with Pillow
    contents = await file.read()
    try:
        img = Image.open(io.BytesIO(contents))
    except Exception:
        raise HTTPException(status_code=400, detail="Corrupt image")

    # Run inference. This call is synchronous and blocks until PyTorch finishes;
    # under heavy load, consider offloading it via fastapi.concurrency.run_in_threadpool.
    results = model(pil_to_numpy(img))[0]

    # Build a simple JSON response
    detections = []
    for box in results.boxes:
        detections.append({
            "class_id": int(box.cls),
            "class_name": model.names[int(box.cls)],
            "confidence": float(box.conf),
            "bbox": box.xyxy.tolist()[0]  # [x1, y1, x2, y2] in pixel coords
        })

    return {"detections": detections}

app/requirements.txt

fastapi[all]==0.110.*
uvicorn==0.27.*
ultralytics==8.2.*
pillow==10.2.*
numpy==1.26.*

Dockerfile

# Use an official lightweight Python image
FROM python:3.11-slim

# Install system libraries required by OpenCV (pulled in by ultralytics)
RUN apt-get update && apt-get install -y --no-install-recommends \
    libgl1 libglib2.0-0 \
    && rm -rf /var/lib/apt/lists/*

# Create a non‑root user
RUN useradd -m appuser
WORKDIR /app
COPY --chown=appuser:appuser app/ ./app/
COPY --chown=appuser:appuser weights/best.pt ./weights/best.pt

# Install Python deps
RUN pip install --no-cache-dir -r app/requirements.txt

# Switch to non‑root user
USER appuser

EXPOSE 8000
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]

Tip – Keep the weights/ directory next to the Dockerfile so the model file is added to the image at build time. For larger models you may want to mount the weights as a volume instead.

4. Docker‑Compose for one‑click launch

Create a docker-compose.yml at the repository root:

version: "3.9"

services:
  yolo-api:
    build: .
    ports:
      - "8000:8000"
    restart: unless-stopped
    # If you have a GPU and the host has NVIDIA drivers, uncomment:
    # deploy:
    #   resources:
    #     reservations:
    #       devices:
    #         - capabilities: [gpu]
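If you prefer the volume‑mount approach from the earlier tip, the service definition could carry the weights as a read‑only bind mount instead of baking them into the image (paths here are illustrative):

```yaml
services:
  yolo-api:
    build: .
    ports:
      - "8000:8000"
    volumes:
      # Override the baked-in checkpoint without rebuilding the image
      - ./weights/best.pt:/app/weights/best.pt:ro
```

Swapping the host-side file then only requires a container restart, not a rebuild.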

Now run:

docker compose up --build -d

The API will be reachable at http://localhost:8000/detect. FastAPI automatically generates interactive docs at http://localhost:8000/docs, where you can upload an image and see the JSON response instantly.

5. Test the endpoint

A quick curl test:

curl -X POST "http://localhost:8000/detect" \
  -F "file=@sample.jpg" \
  -H "Accept: application/json"

You should receive something like:

{
  "detections": [
    {
      "class_id": 0,
      "class_name": "person",
      "confidence": 0.92,
      "bbox": [112, 45, 398, 720]
    },
    {
      "class_id": 2,
      "class_name": "dog",
      "confidence": 0.78,
      "bbox": [410, 300, 620, 540]
    }
  ]
}
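On the client side you'll usually want to threshold and sort these results before using them. A minimal post‑processing sketch (the function name is my own):

```python
def filter_detections(payload: dict, min_conf: float = 0.5) -> list:
    """Keep detections at or above min_conf, highest confidence first."""
    kept = [d for d in payload.get("detections", []) if d["confidence"] >= min_conf]
    return sorted(kept, key=lambda d: d["confidence"], reverse=True)
```

With the sample response above and `min_conf=0.8`, only the person detection survives.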

If you prefer a visual output, you can extend the API to return the image with drawn boxes using OpenCV or Pillow. The core detection logic stays the same; draw each box onto the image (e.g. with cv2.rectangle or PIL's ImageDraw) and return the encoded result as a StreamingResponse.

6. Scaling considerations

  • GPU acceleration: The Dockerfile above runs on CPU. To enable GPU, switch to an nvidia/cuda base image and uncomment the GPU reservation shown in the compose file (or pass --gpus all to docker run). The Ultralytics package automatically detects CUDA.
  • Batch inference: For higher throughput, modify the endpoint to accept a list of images and call model(images) once. Batching amortizes per‑request overhead (preprocessing, Python dispatch) across many images.
  • Model versioning: Store each trained checkpoint in a separate folder and mount the desired version at runtime (-v ./weights/v2.pt:/app/weights/best.pt). This makes A/B testing painless.
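For the batch‑inference idea above, the server still needs to cap how many images go through the model at once so a single request can't exhaust memory. A simple chunking helper (names are mine) that you'd wrap around the model(...) call:

```python
def chunks(items: list, batch_size: int):
    """Yield successive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for i in range(0, len(items), batch_size):
        yield items[i : i + batch_size]
```

In the endpoint you would loop `for batch in chunks(images, 16): results.extend(model(batch))`, trading one huge forward pass for several bounded ones.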

7. Clean up

When you’re done experimenting, stop and remove containers:

docker compose down
docker rmi $(docker images -q your_image_name)  # optional

You can also push the image to a registry (Docker Hub, GitHub Packages, etc.) and deploy it to any cloud provider that supports containers—AWS ECS, GCP Cloud Run, or Azure Container Apps—all from the same container image.

Conclusion

You’ve just built a full‑stack, containerized object detection service:

  1. Data → YOLOv8 training → best.pt.
  2. FastAPI wraps the model in a clean HTTP endpoint.
  3. Docker guarantees reproducibility and portability.
  4. Docker‑Compose makes local development and testing a single command.

From here you can experiment with larger YOLO variants, add authentication to the API, or integrate the service into a larger micro‑service architecture. The sky’s the limit!

Key takeaways

  • YOLOv8’s CLI makes custom training fast; a single best.pt file is all you need for inference.
  • FastAPI provides async‑ready, auto‑documented endpoints that pair nicely with YOLO’s Python API.
  • Docker isolates dependencies (Python, OpenCV, CUDA) and ensures the same environment runs everywhere.
  • Using Docker‑Compose you can spin up the API locally, test it with curl or the Swagger UI, and later push the image to any container platform.
