DEV Community

Om Prakash

Originally published at pixelapi.dev

Detect Faces: Boxes, Landmarks, and Counts in One Call


If you've ever tried to ship a "crop to face" feature, a privacy blur before user uploads go public, or a simple head-count on event photos, you already know the pain. Most face-detection options out there are either overkill — bundled into a full recognition product you don't need — or so bare that you end up making a second call just to figure out where the eyes are. We built detect-faces to sit exactly in that gap.

What it does

POST /v1/image/detect-faces takes a public image URL and gives you back, for every face in the image:

  • A bounding box — the rectangle around the face, so you can crop, blur, or mask it.
  • Key landmarks — coordinates for the eyes, nose, and mouth, so you can centre crops, align portraits, or build downstream alignment logic without a second round trip.
  • A per-face confidence score, so you can tune precision vs recall for your use case.
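Putting those three deliverables together, a response plausibly looks something like the following. The field names `faces`, `box`, and `confidence` match the Python quickstart later in this post; the landmark key names and coordinate conventions here are illustrative guesses, not a confirmed schema:

```json
{
  "faces": [
    {
      "box": { "x": 212, "y": 84, "width": 160, "height": 190 },
      "confidence": 0.97,
      "landmarks": {
        "left_eye": [258, 148],
        "right_eye": [322, 150],
        "nose": [290, 188],
        "mouth": [291, 230]
      }
    }
  ]
}
```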

The request itself is small. You send three fields:

  • image_url — a public URL of the image. Required.
  • min_confidence — a float between 0.0 and 1.0. Detections below this score are dropped. Defaults to 0.5, which is a sensible starting point for general photos.
  • include_landmarks — boolean. When true (the default), the response includes eye, nose, and mouth coordinates per face. Set it to false if you only need boxes and want a slightly tighter payload.

That's the whole API surface. No model selection, no resolution tier, no "advanced mode" toggle. Send a URL, get faces back. The endpoint is built for the boring, high-volume jobs developers actually do at scale — the kind of jobs where you don't want to think about anything except the result.

It's worth being clear about what this endpoint is not: it isn't a recognition endpoint. It doesn't try to identify who a face belongs to, match across photos, or estimate age or emotion. It's a detection primitive. The whole point is that it's a clean input into whatever pipeline you're building — cropping, blurring, counting, or feeding into our other endpoints for portrait or face-restore work.

Why we built it

We talked to a lot of teams building photo features, and the same shape of problem kept coming up. Someone needs to do something with a face — crop it, hide it, count it — and the only options are heavy SDKs that ship recognition by default, or smaller libraries that return a box and leave you to figure out the rest.

If all you want is a bounding box plus the landmarks needed to align a crop, you're paying for a lot of features you'll never use. And if you choose the cheaper, bare-bones detector, you end up writing your own landmark step or making a second API call — which kills the cost advantage you were chasing in the first place.

Our angle here is narrow on purpose. One endpoint, one job, both deliverables in one response. Bounding boxes for the people who just want to know where the faces are, and landmarks in the same payload for the people who need to align or centre a crop. No flag to enable an extra "premium" output. No second SKU. Same call, same price.

We also wanted this to be the cheapest detection endpoint we ship. Detection is a primitive — you should be able to run it on every image in your pipeline without doing pricing maths in your head. At 4 credits a call, you can.

Quickstart

The endpoint is a standard JSON POST. Here's the curl version — drop in your API key and an image URL and you're done:

curl -X POST https://api.pixelapi.dev/v1/image/detect-faces \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"image_url": "https://example.com/source.jpg", "include_landmarks": true}'

And the Python equivalent using requests. This is what you'd drop into a worker or a Flask/FastAPI handler:

import os
import requests

API_KEY = os.environ["PIXELAPI_KEY"]

def detect_faces(image_url, min_confidence=0.5, include_landmarks=True):
    response = requests.post(
        "https://api.pixelapi.dev/v1/image/detect-faces",
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        json={
            "image_url": image_url,
            "min_confidence": min_confidence,
            "include_landmarks": include_landmarks,
        },
        timeout=30,
    )
    response.raise_for_status()
    return response.json()


if __name__ == "__main__":
    result = detect_faces("https://example.com/source.jpg")
    detected = result.get("faces", [])
    print(f"Detected {len(detected)} face(s)")
    for i, face in enumerate(detected):
        print(f"  Face {i}: confidence={face.get('confidence')}, box={face.get('box')}")

A couple of practical notes if you're integrating this into a real backend:

  • Pull the API key from an environment variable, not from code. Boring advice, but it's the single most common mistake we see in early integrations.
  • Treat image_url as a fetch-from-public-internet operation on our side. Make sure the URL is actually reachable from outside your VPC — pre-signed S3 URLs work fine; private CDN paths won't.
  • Tune min_confidence per use case. For a "count people in this event photo" job, you might want to drop it to 0.3 so distant faces in a crowd aren't missed. For an "auto-crop a portrait" workflow, push it up to 0.7 so you don't centre on a random face-shaped object in the background.
  • Skip landmarks if you don't need them. Setting include_landmarks to false gives you a lighter response and is a small optimisation if you're calling this in a tight loop.

There's no async or webhook variant for this endpoint. Detection is fast enough that we keep it synchronous — your call blocks until you get the JSON back.
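Because the call is synchronous, transient network hiccups surface directly in your request path, so a small retry wrapper is worth having. This is a sketch only: the retryable status codes (429 and common 5xx) are an assumption about conventional HTTP semantics, not documented behaviour of this API.

```python
import time

API_URL = "https://api.pixelapi.dev/v1/image/detect-faces"

# Assumed-retryable statuses: rate limiting and transient server errors.
RETRYABLE = {429, 500, 502, 503, 504}


def backoff_seconds(attempt, base=1.0, cap=30.0):
    """Exponential backoff: 1s, 2s, 4s, ... capped at `cap` seconds."""
    return min(base * (2 ** attempt), cap)


def detect_faces_with_retry(api_key, payload, retries=3):
    """POST to the endpoint, retrying transient failures with backoff."""
    # Imported lazily so backoff_seconds stays usable without requests installed.
    import requests

    for attempt in range(retries + 1):
        try:
            resp = requests.post(
                API_URL,
                headers={"Authorization": f"Bearer {api_key}"},
                json=payload,
                timeout=30,
            )
        except requests.ConnectionError:
            if attempt == retries:
                raise
            time.sleep(backoff_seconds(attempt))
            continue
        if resp.status_code in RETRYABLE and attempt < retries:
            time.sleep(backoff_seconds(attempt))
            continue
        resp.raise_for_status()
        return resp.json()
```

Three retries with a 30-second cap keeps worst-case latency bounded; tune both numbers to your pipeline's patience.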

Use cases

We see three patterns come up over and over. They're not the only things you can build with this — but if you're new to the endpoint, these are good starting points.

Auto-crop group photos to centre on the largest face

Most photo apps eventually need a "smart thumbnail" feature. The trouble with naive centre-cropping is that the most important subject is almost never dead-centre in the frame — group shots especially put the main subject off to one side, with friends or background filling the rest. So you run detect-faces, pick the face with the largest bounding box (or the highest confidence, depending on your heuristic), and crop your thumbnail around that box plus some padding. Because the landmarks come back in the same response, you can go further — anchor the crop on the midpoint between the eyes instead of the box centre, which gives a much more natural-looking portrait crop. No second API call, no separate alignment step, just one POST and a bit of arithmetic on the response.
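The heuristic above can be sketched in a few lines. The `box` and `confidence` field names follow the Python quickstart in this post; the `landmarks`, `left_eye`, and `right_eye` keys are assumptions about the payload shape, so adjust them to the real response:

```python
def crop_rect_for_face(faces, img_w, img_h, size=512):
    """Pick the largest detected face and return a square (left, top,
    right, bottom) crop rectangle, anchored on the eye midpoint when
    landmarks are present, else on the box centre.
    """
    if not faces:
        return None
    # "Largest face" heuristic: biggest bounding-box area wins.
    face = max(faces, key=lambda f: f["box"]["width"] * f["box"]["height"])
    lm = face.get("landmarks")
    if lm and "left_eye" in lm and "right_eye" in lm:
        cx = (lm["left_eye"][0] + lm["right_eye"][0]) / 2
        cy = (lm["left_eye"][1] + lm["right_eye"][1]) / 2
    else:
        box = face["box"]
        cx = box["x"] + box["width"] / 2
        cy = box["y"] + box["height"] / 2
    half = size / 2
    # Clamp so the square stays inside the image (images smaller than
    # `size` would need a resize step this sketch doesn't handle).
    left = int(min(max(cx - half, 0), max(img_w - size, 0)))
    top = int(min(max(cy - half, 0), max(img_h - size, 0)))
    return (left, top, left + size, top + size)
```

Feed the returned rectangle straight into whatever crop call your image library uses.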

Privacy-blur faces in user uploads before public display

Anyone running a community feature with user-submitted photos eventually runs into the privacy question. Maybe it's a marketplace where buyers don't want their faces showing up in listings, or a forum where someone uploads a photo and there's a bystander in the background. The workflow is the same: run the upload through detect-faces, walk the array of boxes, and gaussian-blur each region before you save the public version. You can keep the original on your side for moderation, but only the blurred version ever hits your CDN. With landmarks turned on, you can do tighter privacy crops — for example, blurring only the eye region for a milder anonymisation — without separately locating where the eyes are. And because the call is cheap, you can afford to run it on every upload by default, not just on the ones a user flags.
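A sketch of that blur step, assuming the `box` field layout from the quickstart example. The region maths is pure Python; the actual blur uses Pillow (`pip install Pillow`), imported lazily so the helper works on its own. The 15% padding factor is a tunable choice, not anything the API prescribes:

```python
def padded_regions(faces, img_w, img_h, pad=0.15):
    """Turn detected face boxes into slightly padded pixel regions,
    clamped to the image bounds, as (left, top, right, bottom) tuples."""
    regions = []
    for face in faces:
        box = face["box"]
        px = int(box["width"] * pad)
        py = int(box["height"] * pad)
        regions.append((
            max(box["x"] - px, 0),
            max(box["y"] - py, 0),
            min(box["x"] + box["width"] + px, img_w),
            min(box["y"] + box["height"] + py, img_h),
        ))
    return regions


def blur_faces(image_path, faces, out_path, radius=18):
    """Gaussian-blur each padded face region, then save the public copy."""
    from PIL import Image, ImageFilter  # Pillow; lazy so padded_regions needs no deps

    img = Image.open(image_path)
    for region in padded_regions(faces, img.width, img.height):
        patch = img.crop(region).filter(ImageFilter.GaussianBlur(radius))
        img.paste(patch, region)
    img.save(out_path)
```

Keep the original file in private storage for moderation and only ever push the output of `blur_faces` to your CDN.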

Count people in event photos for analytics

Event organisers, conference platforms, and venue analytics teams all want the same number: how many people are in this photo. It's a surprisingly load-bearing metric — it feeds into engagement reports, sponsor decks, "footfall vs. last year" comparisons. The straightforward implementation is to send every event photo through detect-faces, count the items in the response, and store that count against the photo's metadata. You'll want to drop min_confidence for crowd shots so far-away faces still register, and you'll want to be honest about the fact that face count is a lower bound — people turned away from the camera won't be counted. But for relative comparisons across photos, it's a perfectly good signal, and you can run it across an entire event's photo set in a few minutes without it costing you much at all.
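The counting step reduces to a one-liner over the response. Re-filtering by confidence client-side (on top of the server-side min_confidence) is a convenience for experimenting with cut-offs without re-calling; the `faces` and `confidence` field names follow the quickstart example:

```python
def count_faces(response, min_confidence=0.3):
    """Count detected faces at or above a confidence threshold.

    Treat the result as a lower bound: people turned away from the
    camera won't be detected at all.
    """
    return sum(
        1
        for face in response.get("faces", [])
        if face.get("confidence", 0.0) >= min_confidence
    )
```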

Pricing

detect-faces costs 4 credits per call, which works out to:

  • ₹0.0027 per call (INR)
  • $0.00003 per call (USD)

That's the same price whether you ask for landmarks or not, and it's the cheapest detection endpoint we ship. The reasoning is simple: detection is a primitive, and primitives should be cheap enough that you don't think about them. At this price, putting detect-faces in front of every image in a user-upload pipeline is a rounding error on your infra bill, even at meaningful scale.

What you also get in the same call — and this is the bit that quietly matters — is the landmark output. On a lot of other detection products, "where are the eyes" is either a separate endpoint, a more expensive tier, or a flag that bumps the cost. With us, landmarks are included in the base price. So if your downstream code needs to align a crop or do a tighter privacy blur, you don't pay twice or call twice. One POST, one cost, both outputs.

A quick word on credits: we use a credit system so that the same API key works across all of our endpoints without you having to manage separate billing for each. Buying credits in bulk gets you a better effective rate, and you can monitor usage from the dashboard. If you're prototyping, the free credits on signup are more than enough to wire up an integration end to end and see real responses come back.

Try it

The fastest path is to grab a key from the dashboard, drop the curl command above into your terminal with a real image URL, and watch the JSON come back.

If you build something with it — a smart-cropper, a privacy filter, an event-count dashboard — we'd genuinely like to hear about it. And if you hit something that's missing from the response payload or the request body for your use case, tell us. This endpoint is intentionally narrow, but it's narrow because we listened to what people actually wanted, not because we were trying to stop you doing things. Detection should be cheap, fast, and complete in one call. That's the whole pitch.
