Why combine YOLOv8 and FastAPI?
Object detection is at the heart of many modern applications—think smart cameras, inventory robots, or AR experiences. YOLOv8 (You Only Look Once) gives you state‑of‑the‑art accuracy while still running fast enough for real‑time use. FastAPI, on the other hand, is a lightweight, async‑first web framework that makes it trivial to expose a model as a REST endpoint.
In this tutorial you’ll walk through:
- Preparing a small custom dataset and training a YOLOv8 model.
- Wrapping the model in a FastAPI service that accepts images and returns detections.
- Docker‑izing the whole stack so it can run anywhere with a single `docker compose up`.
By the end you’ll have a reproducible, container‑based API that can serve predictions in a few milliseconds.
Prerequisites
| Tool | Version | Why |
|---|---|---|
| Python | 3.9–3.11 | Compatibility with Ultralytics YOLO |
| Ultralytics YOLO | `pip install ultralytics` | Training and inference |
| FastAPI | `pip install "fastapi[all]"` | HTTP server |
| Docker & Docker Compose | Latest | Container orchestration |
| Git | Any | Version control (optional) |
You’ll also need a modest GPU for training (even a laptop GPU works for a small dataset). If you only want to test inference, CPU‑only mode is fine.
1. Prepare a custom dataset
YOLOv8 expects the classic folder layout:
```
my_dataset/
├── images/
│   ├── train/
│   └── val/
└── labels/
    ├── train/
    └── val/
```
Each image in train/ or val/ has a corresponding .txt file in the same sub‑folder under labels/. The label file contains one line per object:
```
<class_id> <x_center> <y_center> <width> <height>
```
All coordinates are normalized (0‑1). If you already have COCO‑style annotations, the ultralytics package can convert them:
```python
# convert_coco_to_yolo.py
from ultralytics.data.converter import convert_coco

# labels_dir is the folder that contains your COCO JSON annotation files
convert_coco(labels_dir="annotations/", save_dir="my_dataset")
```
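If you’re labeling images by hand instead, turning a pixel‑space box into a YOLO label line is a few lines of arithmetic. A minimal sketch (the example box and image size are made up):

```python
def to_yolo_line(class_id: int, x1: float, y1: float, x2: float, y2: float,
                 img_w: int, img_h: int) -> str:
    """Convert a pixel-space [x1, y1, x2, y2] box to a YOLO label line."""
    x_c = (x1 + x2) / 2 / img_w   # normalized box center x
    y_c = (y1 + y2) / 2 / img_h   # normalized box center y
    w = (x2 - x1) / img_w         # normalized width
    h = (y2 - y1) / img_h         # normalized height
    return f"{class_id} {x_c:.6f} {y_c:.6f} {w:.6f} {h:.6f}"

# A 100x200 px box centered at (320, 240) in a 640x480 image:
print(to_yolo_line(0, 270, 140, 370, 340, 640, 480))
# → 0 0.500000 0.500000 0.156250 0.416667
```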
Once you have the folder ready, create a data.yaml that points to it:
```yaml
# data.yaml
train: ./my_dataset/images/train
val: ./my_dataset/images/val

nc: 3  # number of classes
names: ['person', 'bicycle', 'dog']
```
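Before training, it’s worth a quick sanity check that every image has a matching label file. A stdlib‑only sketch assuming the folder layout above (`find_unlabeled` is a helper name invented here):

```python
from pathlib import Path

def find_unlabeled(dataset_root: str) -> list:
    """Return image files under images/ that lack a matching .txt under labels/."""
    root = Path(dataset_root)
    missing = []
    for img in root.glob("images/*/*"):  # images/train/*, images/val/*
        label = root / "labels" / img.parent.name / (img.stem + ".txt")
        if not label.exists():
            missing.append(img)
    return sorted(missing)

if __name__ == "__main__":
    for img in find_unlabeled("my_dataset"):
        print(f"missing label for {img}")
```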
2. Train the model
Training with YOLOv8 is a single line:
```bash
yolo task=detect mode=train data=data.yaml epochs=50 imgsz=640 batch=16 model=yolov8n.pt
```
- `yolov8n.pt` is the nano version (fastest, smallest). Swap for `yolov8s.pt` or larger if you need higher accuracy.
- Adjust `epochs`, `batch`, and `imgsz` to fit your hardware.
After training finishes you’ll find the best checkpoint in `runs/detect/train/weights/best.pt`. Keep that file; it’s what the API will load.
3. Build the FastAPI inference service
Create a new folder `app/` and add the following files.
**`app/main.py`**

```python
# app/main.py
import io

import numpy as np
from fastapi import FastAPI, File, HTTPException, UploadFile
from PIL import Image
from ultralytics import YOLO

app = FastAPI(title="YOLOv8 Object Detection API")

# Load the model once at startup
model = YOLO("weights/best.pt")


def pil_to_numpy(img: Image.Image) -> np.ndarray:
    """Convert a Pillow image to the RGB NumPy array YOLO expects."""
    return np.array(img.convert("RGB"))


@app.post("/detect")
async def detect(file: UploadFile = File(...)):
    """Accept an image file and return bounding boxes."""
    if file.content_type not in {"image/jpeg", "image/png"}:
        raise HTTPException(status_code=400, detail="Invalid image type")

    # Read bytes and open with Pillow
    contents = await file.read()
    try:
        img = Image.open(io.BytesIO(contents))
    except Exception:
        raise HTTPException(status_code=400, detail="Corrupt image")

    # Run inference. Note: this call is blocking; that's fine for a demo,
    # but under heavy load consider offloading it to a thread pool.
    results = model(pil_to_numpy(img))[0]

    # Build a simple JSON response
    detections = []
    for box in results.boxes:
        detections.append({
            "class_id": int(box.cls),
            "class_name": model.names[int(box.cls)],
            "confidence": float(box.conf),
            "bbox": box.xyxy.tolist()[0],  # [x1, y1, x2, y2] in pixel coords
        })
    return {"detections": detections}
```
**`app/requirements.txt`**

```
fastapi[all]==0.110.*
uvicorn==0.27.*
ultralytics==8.2.*
pillow==10.2.*
numpy==1.26.*
```
**`Dockerfile`**

```dockerfile
# Use an official lightweight Python image
FROM python:3.11-slim

# Install system dependencies (libGL and glib are needed by OpenCV,
# which ultralytics pulls in)
RUN apt-get update && apt-get install -y --no-install-recommends \
        libgl1 \
        libglib2.0-0 \
    && rm -rf /var/lib/apt/lists/*

# Create a non-root user
RUN useradd -m appuser

WORKDIR /app

# Install Python deps first so this layer is cached across code changes
COPY app/requirements.txt ./app/requirements.txt
RUN pip install --no-cache-dir -r app/requirements.txt

COPY --chown=appuser:appuser app/ ./app/
COPY --chown=appuser:appuser weights/best.pt ./weights/best.pt

# Switch to non-root user
USER appuser

EXPOSE 8000
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
```
**Tip** – Keep the `weights/` directory next to the Dockerfile so the model file is added to the image at build time. For larger models you may want to mount the weights as a volume instead.
4. Docker‑Compose for one‑click launch
Create a docker-compose.yml at the repository root:
```yaml
version: "3.9"
services:
  yolo-api:
    build: .
    ports:
      - "8000:8000"
    restart: unless-stopped
    # If you have a GPU and the host has NVIDIA drivers, uncomment:
    # deploy:
    #   resources:
    #     reservations:
    #       devices:
    #         - capabilities: [gpu]
```
Now run:
```bash
docker compose up --build -d
```
The API will be reachable at http://localhost:8000/detect. FastAPI automatically generates interactive docs at http://localhost:8000/docs, where you can upload an image and see the JSON response instantly.
5. Test the endpoint
A quick curl test:
```bash
curl -X POST "http://localhost:8000/detect" \
  -F "file=@sample.jpg" \
  -H "Accept: application/json"
```
You should receive something like:
```json
{
  "detections": [
    {
      "class_id": 0,
      "class_name": "person",
      "confidence": 0.92,
      "bbox": [112, 45, 398, 720]
    },
    {
      "class_id": 2,
      "class_name": "dog",
      "confidence": 0.78,
      "bbox": [410, 300, 620, 540]
    }
  ]
}
```
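On the client side it’s common to drop low‑confidence boxes before using the result. A small sketch against the JSON shape above (the 0.8 threshold is arbitrary):

```python
def filter_detections(response: dict, min_conf: float = 0.8) -> list:
    """Keep only detections whose confidence is at least min_conf."""
    return [d for d in response["detections"] if d["confidence"] >= min_conf]

response = {
    "detections": [
        {"class_id": 0, "class_name": "person", "confidence": 0.92,
         "bbox": [112, 45, 398, 720]},
        {"class_id": 2, "class_name": "dog", "confidence": 0.78,
         "bbox": [410, 300, 620, 540]},
    ]
}
print([d["class_name"] for d in filter_detections(response)])  # → ['person']
```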
If you prefer a visual output, you can extend the API to return an image with drawn boxes using OpenCV or Pillow. The core logic stays the same; just add a `cv2.rectangle` loop and return a `StreamingResponse`.
6. Scaling considerations
- **GPU acceleration**: The Dockerfile above runs on CPU. To enable GPU, use an `nvidia/cuda` base image and run the container with `--gpus all` (with Docker Compose, use the `deploy.resources` block shown earlier instead). The Ultralytics package automatically detects CUDA.
- **Batch inference**: For higher throughput, modify the endpoint to accept a list of images and call `model(images)` once. Batching amortizes the per-call inference overhead.
- **Model versioning**: Store each trained checkpoint in a separate folder and mount the desired version at runtime (`-v ./weights/v2.pt:/app/weights/best.pt`). This makes A/B testing painless.
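Server‑side batching usually starts with a helper that chunks the incoming images into fixed‑size groups before each `model(...)` call. A framework‑agnostic sketch (the batch size of 8 in the comment is arbitrary):

```python
from typing import Iterator, List, Sequence

def batches(items: Sequence, batch_size: int) -> Iterator[List]:
    """Yield successive chunks of at most batch_size items."""
    for i in range(0, len(items), batch_size):
        yield list(items[i:i + batch_size])

# e.g. run inference as: for chunk in batches(images, 8): results = model(chunk)
print(list(batches([1, 2, 3, 4, 5], 2)))  # → [[1, 2], [3, 4], [5]]
```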
7. Clean up
When you’re done experimenting, stop and remove containers:
```bash
docker compose down
docker rmi $(docker images -q your_image_name)  # optional
```
You can also push the image to a registry (Docker Hub, GitHub Packages, etc.) and deploy it to any cloud provider that supports containers—AWS ECS, GCP Cloud Run, or Azure Container Apps—all with the same docker run command.
Conclusion
You’ve just built a full‑stack, containerized object detection service:
- Data → YOLOv8 training → `best.pt`.
- FastAPI wraps the model in a clean HTTP endpoint.
- Docker guarantees reproducibility and portability.
- Docker Compose makes local development and testing a single command.
From here you can experiment with larger YOLO variants, add authentication to the API, or integrate the service into a larger micro‑service architecture. The sky’s the limit!
Key takeaways
- YOLOv8’s CLI makes custom training fast; a single `best.pt` file is all you need for inference.
- FastAPI provides async‑ready, auto‑documented endpoints that pair nicely with YOLO’s Python API.
- Docker isolates dependencies (Python, OpenCV, CUDA) and ensures the same environment runs everywhere.
- With Docker Compose you can spin up the API locally, test it with `curl` or the Swagger UI, and later push the image to any container platform.