Introduction
Object detection is one of the most exciting use‑cases of computer vision, and the YOLO (You Only Look Once) family has become the go‑to solution for real‑time inference. In this tutorial you’ll learn how to:
- Train a custom YOLOv8 model on your own dataset.
- Wrap the model in a FastAPI service that accepts image uploads and returns detections instantly.
- Containerize the whole stack with Docker so it runs the same everywhere.
- Automate testing and deployment using a GitHub Actions CI/CD pipeline.
By the end you’ll have a production‑ready API that can be deployed to any container host (AWS ECS, GCP Cloud Run, Azure Container Apps, or even your laptop).
Tip: If you’re new to YOLOv8, the official Ultralytics repo ships with a very friendly CLI that handles most of the heavy lifting. We’ll use it as the foundation and then add a thin FastAPI wrapper around the exported model.
1. Preparing the data and training YOLOv8
1.1 Organize your dataset
YOLO expects the following directory layout:
dataset/
├── images/
│   ├── train/
│   └── val/
└── labels/
    ├── train/
    └── val/
- Images can be JPEG or PNG.
- Labels are plain‑text files with the same base name as the image; each line contains class_id x_center y_center width height (all normalized to [0, 1]). For example, 0 0.25 0.5 0.5 1.0 describes a class‑0 box covering the left half of the image.
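To make the normalization concrete, here is a small sketch (the helper name to_yolo_line is our own, not part of the Ultralytics toolkit) that converts a pixel‑space box into a YOLO label line:

```python
def to_yolo_line(cls_id: int, xmin: float, ymin: float, xmax: float, ymax: float,
                 img_w: int, img_h: int) -> str:
    """Convert a pixel-space (xmin, ymin, xmax, ymax) box to a normalized YOLO label line."""
    x_c = (xmin + xmax) / 2 / img_w   # box center x, normalized to [0, 1]
    y_c = (ymin + ymax) / 2 / img_h   # box center y, normalized to [0, 1]
    w = (xmax - xmin) / img_w         # box width, normalized
    h = (ymax - ymin) / img_h         # box height, normalized
    return f"{cls_id} {x_c:.6f} {y_c:.6f} {w:.6f} {h:.6f}"

# Top-left quadrant of a 640x480 image:
print(to_yolo_line(0, 0, 0, 320, 240, 640, 480))
# → 0 0.250000 0.250000 0.500000 0.500000
```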
If you have data in COCO format, the ultralytics package ships a converter (Pascal VOC needs a small custom script or a third‑party tool):
pip install ultralytics
python -c "from ultralytics.data.converter import convert_coco; convert_coco(labels_dir='path/to/coco/annotations/')"
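A quick consistency check before training saves a lot of debugging later. This sketch (our own helper, not part of Ultralytics) lists images under the layout above that have no matching label file:

```python
from pathlib import Path


def unlabeled_images(dataset_root: str, split: str = "train") -> list[str]:
    """Return stems of images under images/<split> with no matching labels/<split>/<stem>.txt."""
    root = Path(dataset_root)
    images_dir = root / "images" / split
    labels_dir = root / "labels" / split
    missing = []
    for img in sorted(images_dir.glob("*")):
        if img.suffix.lower() in {".jpg", ".jpeg", ".png"}:
            if not (labels_dir / f"{img.stem}.txt").exists():
                missing.append(img.stem)
    return missing
```

Run it on both splits before kicking off a long training job; an empty list means every image has a label file (YOLO treats a missing label as "no objects", which is rarely what you intended).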
1.2 Create a data.yaml file
train: ./dataset/images/train
val: ./dataset/images/val
nc: 3 # number of classes
names: ['person', 'bicycle', 'dog']
1.3 Train the model
The simplest way is to use the CLI:
yolo task=detect mode=train data=./data.yaml epochs=50 imgsz=640 batch=16 model=yolov8n.pt
- yolov8n.pt is the nano variant, perfect for low‑latency inference.
- Adjust epochs, batch, and imgsz to fit your compute budget.
The training script will create a runs/detect/train/weights/best.pt file – this is the model we’ll serve.
1.4 Quick sanity check
yolo task=detect mode=val model=./runs/detect/train/weights/best.pt data=./data.yaml
You should see a summary of mAP, precision, and recall, plus a few sample images with bounding boxes saved under runs/detect/val.
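The mAP numbers in that summary are built on intersection‑over‑union (IoU) between predicted and ground‑truth boxes. A minimal sketch of the underlying computation:

```python
def iou(a: tuple, b: tuple) -> float:
    """IoU of two boxes in (xmin, ymin, xmax, ymax) format."""
    # Width and height of the intersection rectangle (clamped at 0 when disjoint)
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

A prediction typically counts as a true positive when its IoU with a ground‑truth box of the same class exceeds a threshold (0.5 for the classic mAP@0.5 metric).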
2. Exporting the model for inference
YOLOv8 can export to several formats (torchscript, ONNX, TensorRT). For a FastAPI service running on CPU or GPU, the native PyTorch format works fine, but we’ll also export to ONNX for future flexibility.
yolo export model=./runs/detect/train/weights/best.pt format=onnx opset=12
You’ll get best.onnx in the same folder. Keep both best.pt and best.onnx – the former is useful for quick local testing, the latter for edge deployments.
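Whichever format you serve, YOLOv8 resizes inputs to a square (640 by default) with aspect‑preserving "letterbox" padding. The scale‑and‑pad arithmetic looks roughly like this (a sketch of the idea, not the exact Ultralytics implementation):

```python
def letterbox_params(h: int, w: int, size: int = 640) -> tuple:
    """Return (new_h, new_w, pad_top, pad_left) for an aspect-preserving resize into size x size."""
    scale = min(size / h, size / w)          # shrink/grow so the longer side fits exactly
    new_h, new_w = round(h * scale), round(w * scale)
    pad_top = (size - new_h) // 2            # gray padding is split evenly on both sides
    pad_left = (size - new_w) // 2
    return new_h, new_w, pad_top, pad_left
```

Knowing these parameters matters if you run the ONNX model yourself: raw network outputs are in the padded 640x640 coordinate system and must be shifted and rescaled back to the original image (the PyTorch path via ultralytics does this for you).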
3. Building the FastAPI wrapper
Create a new folder called api/ and add the following files.
3.1 requirements.txt
fastapi==0.110.0
uvicorn[standard]==0.27.0
python-multipart==0.0.9
torch==2.2.0
opencv-python-headless==4.9.0.80
ultralytics==8.2.0
pydantic==2.6.1
3.2 app.py
from pathlib import Path

import cv2
import numpy as np
from fastapi import FastAPI, File, HTTPException, UploadFile
from fastapi.responses import JSONResponse
from ultralytics import YOLO

app = FastAPI(title="YOLOv8 Object Detection API")

# Load the model once at startup
MODEL_PATH = Path(__file__).parent / "best.pt"
if not MODEL_PATH.exists():
    raise FileNotFoundError(f"Model not found at {MODEL_PATH}")
model = YOLO(str(MODEL_PATH))

def read_image(file: UploadFile) -> np.ndarray:
    """Convert an uploaded file to an OpenCV BGR image."""
    contents = file.file.read()
    np_arr = np.frombuffer(contents, np.uint8)
    img = cv2.imdecode(np_arr, cv2.IMREAD_COLOR)
    if img is None:
        raise HTTPException(status_code=400, detail="Invalid image")
    return img

@app.post("/detect")
async def detect(file: UploadFile = File(...)):
    """Accept an image and return YOLO detections."""
    img = read_image(file)
    results = model(img)[0]  # inference; one Results object per input image
    output = []
    for box in results.boxes:
        xmin, ymin, xmax, ymax = box.xyxy[0].tolist()
        cls_id = int(box.cls[0])
        output.append({
            "xmin": xmin,
            "ymin": ymin,
            "xmax": xmax,
            "ymax": ymax,
            "confidence": float(box.conf[0]),
            "class": cls_id,
            "name": results.names[cls_id],
        })
    return JSONResponse(content=output)
Explanation of key parts
- Model loading: We use the Ultralytics YOLO class to load our best.pt (the torch.hub YOLOv5 loader cannot load YOLOv8 checkpoints). Inference runs on CPU by default; pass device='cuda' in the call, e.g. model(img, device='cuda'), if you have a GPU.
- Image handling: UploadFile gives us a file‑like object. We decode it with OpenCV so the model receives a NumPy array; YOLOv8 accepts BGR arrays directly.
- Result formatting: Calling the model returns a list of Results objects, one per image. Each detection lives in results.boxes, with xyxy coordinates plus conf and cls tensors; we flatten them into a list of dicts with the keys xmin, ymin, xmax, ymax, confidence, class, name so the API response is clean JSON.
3.3 Run locally
pip install -r api/requirements.txt
uvicorn api.app:app --host 0.0.0.0 --port 8000
Visit http://localhost:8000/docs – FastAPI automatically generates an interactive Swagger UI. Try uploading a picture and you should receive a JSON array of detections.
4. Dockerizing the service
Create a Dockerfile at the project root:
# Use the official lightweight Python image
FROM python:3.11-slim
# Install system dependencies (headless OpenCV needs libglib; libgl1 covers non-headless builds)
RUN apt-get update && apt-get install -y --no-install-recommends \
    libgl1 libglib2.0-0 && \
rm -rf /var/lib/apt/lists/*
# Set working directory
WORKDIR /app
# Copy only requirements first for layer caching
COPY api/requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy the rest of the code and the trained model
COPY api/ ./api/
COPY runs/detect/train/weights/best.pt ./api/
# Expose FastAPI port
EXPOSE 8000
# Command to run the service
CMD ["uvicorn", "api.app:app", "--host", "0.0.0.0", "--port", "8000"]
4.1 Build and test the image
docker build -t yolov8-api:latest .
docker run -p 8000:8000 yolov8-api:latest
Again, navigate to http://localhost:8000/docs to verify the container works.
5. CI/CD with GitHub Actions
Having the Docker image build automatically on every push guarantees reproducibility. Add the following workflow file at .github/workflows/docker-ci.yml:
name: CI / CD for YOLOv8 FastAPI

on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]

jobs:
  build-and-push:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4
      - name: Set up QEMU (for multi‑arch builds)
        uses: docker/setup-qemu-action@v3
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3
      - name: Log in to Docker Hub
        uses: docker/login-action@v3
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}
      - name: Build and push Docker image
        uses: docker/build-push-action@v5
        with:
          context: .
          push: ${{ github.event_name != 'pull_request' }}
          tags: ${{ secrets.DOCKERHUB_USERNAME }}/yolov8-api:latest
          cache-from: type=registry,ref=${{ secrets.DOCKERHUB_USERNAME }}/yolov8-api:cache
          cache-to: type=registry,ref=${{ secrets.DOCKERHUB_USERNAME }}/yolov8-api:cache,mode=max
What this does
- Checks out the code on the GitHub runner.
- Enables multi‑architecture builds (useful if you later target ARM devices).
- Authenticates with Docker Hub using encrypted repository secrets.
- Builds the image on every run and pushes it to Docker Hub under your namespace (pushes are skipped for pull requests, which cannot access the secrets).
- Caches layers to speed up subsequent builds.
You can now trigger a deployment on any platform that can pull from Docker Hub (e.g., a simple docker run on an EC2 instance or a Kubernetes pod).
6. Optional: Deploying to a cloud provider
6.1 AWS Elastic Container Service (ECS) – Fargate
# 1. Create a cluster
aws ecs create-cluster --cluster-name yolov8-cluster
# 2. Register task definition (task-def.json)
aws ecs register-task-definition --cli-input-json file://task-def.json
# 3. Run service
aws ecs create-service \
--cluster yolov8-cluster \
--service-name yolov8-service \
--task-definition yolov8-task \
--desired-count 1 \
--launch-type FARGATE \
--network-configuration "awsvpcConfiguration={subnets=[subnet-xxxx],securityGroups=[sg-xxxx],assignPublicIp=ENABLED}"
The task-def.json would reference the image you pushed (youruser/yolov8-api:latest) and expose port 8000. After a few minutes the service is reachable via its public IP, or via a load balancer URL if you put one in front.
6.2 Google Cloud Run
gcloud run deploy yolov8-api \
--image=gcr.io/<PROJECT_ID>/yolov8-api:latest \
--platform=managed \
--region=us-central1 \
--allow-unauthenticated \
--port=8000
Both platforms automatically handle scaling, health checks, and HTTPS termination, leaving you with a low‑maintenance API.
7. Testing the live endpoint
You can use curl or a small Python script:
import requests

url = "http://<host>:8000/detect"
with open("test.jpg", "rb") as f:
    resp = requests.post(url, files={"file": f})
print(resp.json())
The response will be a list of detections, each containing:
{
"xmin": 124,
"ymin": 87,
"xmax": 342,
"ymax": 276,
"confidence": 0.93,
"class": 0,
"name": "person"
}
You can now feed this output into downstream services—tracking, alerting, or even a front‑end UI that draws boxes in real time.
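Downstream consumers usually want only the confident hits. A small client‑side filter over the JSON response (the helper name filter_detections is ours, not part of any library):

```python
def filter_detections(detections: list, min_conf: float = 0.5, classes=None) -> list:
    """Keep detections at or above a confidence threshold, optionally restricted to class names."""
    return [
        d for d in detections
        if d["confidence"] >= min_conf and (classes is None or d["name"] in classes)
    ]

dets = [
    {"confidence": 0.93, "name": "person"},
    {"confidence": 0.30, "name": "dog"},
]
print(filter_detections(dets, min_conf=0.5))          # keeps only the person
print(filter_detections(dets, min_conf=0.2, classes={"dog"}))  # keeps only the dog
```

Thresholding on the server side is also possible (the predict call accepts a conf argument), but filtering client‑side lets different consumers pick different trade‑offs from the same API.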
Key takeaways
- Train a custom YOLOv8 model with the Ultralytics CLI and keep the exported best.pt weights.
- Wrap the model in a small FastAPI service that turns image uploads into JSON detections.
- Docker makes the stack reproducible, and GitHub Actions builds and pushes the image on every push.
- Any container host (ECS, Cloud Run, Azure Container Apps, or your laptop) can then serve the API.