
How to Track OpenAI API spend per feature: a cost-attribution playbook

Your OpenAI invoice says you spent $4,237 last month. It does not tell you that $3,100 came from one runaway summarization endpoint, $700 came from a customer paying $50/month, and $437 came from a feature nobody uses. If you want pricing, capacity, or roadmap decisions to be grounded in data, you need request-level cost attribution.

This guide shows how to implement OpenAI API cost attribution in production: tag every request, log token usage and computed cost, aggregate spend by feature/route/customer, set budget caps, and test the wrapper before shipping.

💡 Apidog gives you the request-level visibility and scenario testing you need to verify your cost-tracking wrapper works before it ships to production. Use Apidog to replay tagged requests, assert log shape, and validate that every call carries the metadata your warehouse expects.

TL;DR

Implement this pipeline:

  1. Wrap every OpenAI API call.
  2. Require metadata: feature, route, customer_id, and environment.
  3. Capture response.usage.
  4. Compute cost_usd at write time.
  5. Emit one structured log event per request.
  6. Aggregate by tag in your warehouse.
  7. Set OpenAI project/key budget caps.
  8. Alert on hourly spend anomalies.
  9. Validate the wrapper with Apidog scenario tests.

Introduction

You ship a new AI feature on Tuesday. By Friday, your CFO asks why the OpenAI line item jumped 40%. The OpenAI dashboard shows total spend and model usage, but not which feature, customer, or endpoint caused the spike.

That is the core problem: OpenAI billing is useful for invoices, not engineering attribution.

The fix is straightforward:

  • Add metadata at the call site.
  • Log every request as structured data.
  • Compute cost from token usage.
  • Store the event in your warehouse.
  • Build dashboards and alerts from that table.

By the end of this guide, you will have:

  • A cost-attribution event schema
  • Python wrapper code
  • SQL aggregation queries
  • A verification workflow with Apidog
  • A build-vs-buy tooling comparison

For pricing context, see the GPT-5.5 pricing breakdown. For a related billing-attribution problem, see GitHub Copilot usage billing for API teams. For API basics, see the official OpenAI API reference.

Why OpenAI’s billing dashboard is not enough

The OpenAI billing dashboard typically gives you:

  • Daily spend
  • Model breakdown
  • Usage limits

That works for a simple setup. It breaks down when you have:

  • Multiple AI features
  • Multiple customers
  • Multiple environments
  • Multiple developers
  • Background jobs
  • Internal tools

What is missing

Total spend without context

The dashboard can tell you that you spent $312 yesterday. It cannot tell you whether that came from a customer hammering your support-chat endpoint or from a background job reprocessing your knowledge base.

No per-feature breakdown

OpenAI usage is grouped around account/project/model dimensions. It does not know your product concepts: feature, route, customer_id, or environment.

Reporting lag

Usage data may lag by tens of minutes or hours. That is too slow for runaway loops or hourly burn alerts.

No feature-level alerts

There is no native primitive for: “Page me if /api/v1/chat/answer exceeds $50/hour.”

No customer attribution

If you run B2B SaaS, you need to know which customer generated which spend. Without that, you cannot compute gross margin per customer.

Project keys help, but only partially

OpenAI project keys can separate workloads at a coarse level. They do not give you per-feature, per-route, or per-customer attribution. The OpenAI usage API returns aggregated data, not request-level product metadata.

The pattern is common enough that the Dev.to thread “OpenAI Tells You What You Spent. Not Where. So I Built a Dashboard” resonated with developers: you cannot manage what you cannot measure.

The cost-attribution data model

Treat every OpenAI request as a cost event. That event is the unit you query, alert on, and reconcile.

Use a schema like this:

| Column | Type | Example | Why it matters |
| --- | --- | --- | --- |
| request_id | uuid | 7a91... | Idempotency, deduplication, retries |
| timestamp | timestamptz | 2026-05-06T14:23:01Z | Time-series queries and anomaly detection |
| feature | text | support-chat | Product surface that triggered the call |
| route | text | /api/v1/chat/answer | HTTP route or background job ID |
| customer_id | text | cust_4291 | Per-customer spend and gross margin |
| environment | text | prod, staging, dev | Separate production from internal usage |
| model | text | gpt-5.5, gpt-5.4-mini | Pricing differs per model |
| prompt_tokens | int | 15234 | Input token count |
| completion_tokens | int | 812 | Output token count |
| reasoning_tokens | int | 4500 | Reasoning tokens billed as output |
| cached_tokens | int | 12000 | Cached input tokens |
| latency_ms | int | 2341 | Cost/performance correlation |
| cost_usd | numeric(10,6) | 0.045672 | Cost computed at write time |
| prompt_cache_key | text | system-v3 | Cache hit tracking |
| error_code | text | null, 429 | Retry and failure analysis |

Compute cost when you write the event, not later in a dashboard query. Pricing changes over time, so historical events should preserve the rate used at the time.

Example pricing function:

PRICING = {  # USD per 1M tokens, as of May 2026
    "gpt-5.5":      {"input": 5.00,  "cached": 2.50,  "output": 30.00},
    "gpt-5.5-pro":  {"input": 30.00, "cached": 15.00, "output": 180.00},
    "gpt-5.4":      {"input": 2.50,  "cached": 1.25, "output": 15.00},
    "gpt-5.4-mini": {"input": 0.25,  "cached": 0.125, "output": 2.00},
}

def compute_cost_usd(model, prompt_tokens, cached_tokens, completion_tokens, reasoning_tokens):
    rates = PRICING[model]

    uncached = max(0, prompt_tokens - cached_tokens)

    input_cost = (uncached * rates["input"]) / 1_000_000
    cache_cost = (cached_tokens * rates["cached"]) / 1_000_000
    output_cost = ((completion_tokens + reasoning_tokens) * rates["output"]) / 1_000_000

    return round(input_cost + cache_cost + output_cost, 6)
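
As a quick sanity check on the pricing function, here is the arithmetic for one hypothetical gpt-5.5 call with a heavily cached prompt:

# Worked example (hypothetical token counts) at the May 2026 rates above.
cost = compute_cost_usd(
    "gpt-5.5",
    prompt_tokens=15_234,
    cached_tokens=12_000,
    completion_tokens=812,
    reasoning_tokens=4_500,
)
# uncached input: (15,234 - 12,000) = 3,234 tokens * $5.00 / 1M = $0.016170
# cached input:   12,000 tokens * $2.50 / 1M                    = $0.030000
# output:         (812 + 4,500) = 5,312 tokens * $30.00 / 1M    = $0.159360
# total                                                         = $0.205530
print(cost)  # 0.20553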

Reasoning tokens are returned under:

usage.completion_tokens_details.reasoning_tokens

They are billed at the output rate. If you omit them, you undercount cost for reasoning-heavy calls.

For more pricing details, see the GPT-5.5 pricing breakdown.

Wrap the OpenAI client

Every OpenAI call should go through one wrapper. The wrapper should:

  1. Require product metadata.
  2. Generate or receive a request_id.
  3. Call OpenAI.
  4. Capture token usage.
  5. Compute cost.
  6. Emit a structured event.

import time
import uuid
import json
import logging
from openai import OpenAI

client = OpenAI()
logger = logging.getLogger("llm.cost")

def call_with_attribution(
    *,
    feature,
    route,
    customer_id,
    environment,
    model,
    messages,
    request_id=None,
    **openai_kwargs
):
    if not feature or not route or not customer_id or not environment:
        raise ValueError("feature, route, customer_id, and environment are required")

    request_id = request_id or str(uuid.uuid4())
    started = time.time()
    error_code = None
    response = None

    try:
        response = client.chat.completions.create(
            model=model,
            messages=messages,
            **openai_kwargs
        )
        return response

    except Exception as e:
        error_code = getattr(e, "code", "unknown_error")
        raise

    finally:
        latency_ms = int((time.time() - started) * 1000)

        u = response.usage if response else None

        prompt_tokens = getattr(u, "prompt_tokens", 0) if u else 0
        completion_tokens = getattr(u, "completion_tokens", 0) if u else 0

        cached_tokens = (
            getattr(getattr(u, "prompt_tokens_details", None), "cached_tokens", 0)
            if u else 0
        ) or 0

        reasoning_tokens = (
            getattr(getattr(u, "completion_tokens_details", None), "reasoning_tokens", 0)
            if u else 0
        ) or 0

        cost_usd = compute_cost_usd(
            model,
            prompt_tokens,
            cached_tokens,
            completion_tokens,
            reasoning_tokens
        )

        logger.info(json.dumps({
            "event": "openai.request",
            "request_id": request_id,
            "feature": feature,
            "route": route,
            "customer_id": customer_id,
            "environment": environment,
            "model": model,
            "prompt_tokens": prompt_tokens,
            "completion_tokens": completion_tokens,
            "reasoning_tokens": reasoning_tokens,
            "cached_tokens": cached_tokens,
            "latency_ms": latency_ms,
            "cost_usd": cost_usd,
            "error_code": error_code,
        }))

Usage example:

response = call_with_attribution(
    feature="support-chat",
    route="/api/v1/chat/answer",
    customer_id="cust_4291",
    environment="prod",
    model="gpt-5.5",
    messages=[
        {"role": "system", "content": "You are a support assistant."},
        {"role": "user", "content": "How do I reset my password?"}
    ],
)

Ship these logs to your existing pipeline:

  • Vector
  • Fluent Bit
  • Logstash
  • OTLP collector
  • Kafka
  • Pub/Sub
  • NATS

Then write them into your warehouse:

  • BigQuery
  • ClickHouse
  • Snowflake
  • Postgres
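
If Postgres is the warehouse, a minimal loader could look like the sketch below. The table name, column list, and a unique constraint on request_id are assumptions; deduping on request_id at write time is what makes replays and retried writes safe.

# JSONL-to-Postgres loader sketch. Assumes an openai_events table with a UNIQUE
# constraint on request_id; ON CONFLICT DO NOTHING keeps duplicate events from
# double-counting cost. Using now() for the timestamp is a simplification; carry
# the request timestamp in the event if you need exact times.
import json
import psycopg2

INSERT_SQL = """
INSERT INTO openai_events
  (request_id, timestamp, feature, route, customer_id, environment, model,
   prompt_tokens, completion_tokens, reasoning_tokens, cached_tokens,
   latency_ms, cost_usd, error_code)
VALUES
  (%(request_id)s, now(), %(feature)s, %(route)s, %(customer_id)s, %(environment)s,
   %(model)s, %(prompt_tokens)s, %(completion_tokens)s, %(reasoning_tokens)s,
   %(cached_tokens)s, %(latency_ms)s, %(cost_usd)s, %(error_code)s)
ON CONFLICT (request_id) DO NOTHING;
"""

def load_events(jsonl_path, dsn):
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur, open(jsonl_path) as f:
        for line in f:
            cur.execute(INSERT_SQL, json.loads(line))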

For Node.js, use the same shape: a wrapper function around the OpenAI SDK that accepts metadata, captures response.usage, computes cost, and writes a JSON event.

Wire up cost tracking and test it with Apidog

1. Replace direct OpenAI calls

Search your codebase for direct SDK calls:

grep -R "client.chat.completions.create" .
grep -R "OpenAI(" .

Replace every direct call with your attribution wrapper.

Do not default missing metadata to "unknown". Fail fast:

if not feature:
    raise ValueError("feature is required")

Bad tags create silent attribution errors.

2. Emit structured logs

Log one JSON event per request:

{
  "event": "openai.request",
  "request_id": "7a91...",
  "feature": "support-chat",
  "route": "/api/v1/chat/answer",
  "customer_id": "cust_4291",
  "environment": "prod",
  "model": "gpt-5.5",
  "prompt_tokens": 15234,
  "completion_tokens": 812,
  "reasoning_tokens": 4500,
  "cached_tokens": 12000,
  "latency_ms": 2341,
  "cost_usd": 0.045672,
  "error_code": null
}

Keep these events clean. Do not mix them with debug logs.
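
One option for keeping cost events on their own stream is to give the llm.cost logger a dedicated handler; the file path below is just a placeholder for whatever your collector tails:

import logging

# Dedicated handler for cost events so they are not interleaved with app/debug logs.
cost_logger = logging.getLogger("llm.cost")
cost_logger.setLevel(logging.INFO)
cost_logger.propagate = False  # keep cost events out of the root/app logger

handler = logging.FileHandler("openai_events.jsonl")  # placeholder path your collector tails
handler.setFormatter(logging.Formatter("%(message)s"))  # the message is already a JSON line
cost_logger.addHandler(handler)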

3. Aggregate spend in SQL

Once events are in your warehouse, start with feature-level spend:

SELECT
  feature,
  DATE_TRUNC(timestamp, DAY) AS day,
  COUNT(*) AS requests,
  SUM(cost_usd) AS spend_usd,
  SUM(prompt_tokens + completion_tokens + reasoning_tokens) AS tokens,
  AVG(latency_ms) AS avg_latency_ms,
  SUM(cached_tokens) / NULLIF(SUM(prompt_tokens), 0) AS cache_hit_rate
FROM openai_events
WHERE environment = 'prod'
  AND timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
GROUP BY feature, day
ORDER BY day DESC, spend_usd DESC;

Then add customer-level spend:

SELECT
  customer_id,
  DATE_TRUNC(timestamp, MONTH) AS month,
  COUNT(*) AS requests,
  SUM(cost_usd) AS spend_usd
FROM openai_events
WHERE environment = 'prod'
GROUP BY customer_id, month
ORDER BY spend_usd DESC;

And route-level spend:

SELECT
  route,
  COUNT(*) AS requests,
  SUM(cost_usd) AS spend_usd,
  AVG(cost_usd) AS avg_cost_per_request
FROM openai_events
WHERE environment = 'prod'
  AND timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
GROUP BY route
ORDER BY spend_usd DESC
LIMIT 20;

4. Build the dashboard

Create three operational views:

  • Spend per feature over time
  • Spend per customer over time
  • Top routes by daily spend

Use whatever BI layer you already have:

  • Grafana
  • Metabase
  • Looker
  • Superset
  • Mode

5. Test the wrapper with Apidog

Before shipping, verify that the wrapper logs the metadata you expect.

Use Apidog to create an end-to-end scenario:

  1. Send a request to your AI endpoint with a known customer_id.
  2. Verify the API response succeeds.
  3. Capture the side-channel log event through your logging endpoint, stdout collector, or OTLP/log pipeline.
  4. Assert the event contains:
    • feature
    • route
    • customer_id
    • environment
    • model
    • prompt_tokens > 0
    • cost_usd > 0
  5. Run the same scenario against staging and production using Apidog environments.
  6. Replay the request and verify retries do not double-count cost.

For broader testing workflows, see API testing tools for QA engineers. For contract-first coverage, see contract-first API development.

6. Set budget caps and alerts

Use OpenAI project keys to isolate risk:

  • prod-support-chat
  • prod-summarization
  • staging-all
  • dev-all

Set hard caps in the OpenAI dashboard so one runaway workload cannot drain the whole organization budget.

Then add warehouse-driven alerts. Example: page if any feature exceeds 3x its seven-day average hourly spend.

WITH hourly AS (
  SELECT
    feature,
    TIMESTAMP_TRUNC(timestamp, HOUR) AS hour,
    SUM(cost_usd) AS spend_usd
  FROM openai_events
  WHERE environment = 'prod'
    AND timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 8 DAY)
  GROUP BY feature, hour
),
baseline AS (
  SELECT
    feature,
    AVG(spend_usd) AS avg_hourly_spend
  FROM hourly
  WHERE hour < TIMESTAMP_SUB(TIMESTAMP_TRUNC(CURRENT_TIMESTAMP(), HOUR), INTERVAL 1 HOUR)
  GROUP BY feature
),
current_hour AS (
  SELECT
    feature,
    spend_usd
  FROM hourly
  WHERE hour = TIMESTAMP_TRUNC(CURRENT_TIMESTAMP(), HOUR)
)
SELECT
  c.feature,
  c.spend_usd,
  b.avg_hourly_spend
FROM current_hour c
JOIN baseline b USING (feature)
WHERE c.spend_usd > b.avg_hourly_spend * 3;

Send the result to:

  • PagerDuty
  • Opsgenie
  • Slack
  • Email
  • Incident.io

Native caps protect you from catastrophic burn. Warehouse alerts catch slow drift earlier.

Advanced techniques

Prompt caching

GPT-5.5 charges less for cached input tokens. Structure prompts so stable content appears first:

[Stable system prompt]
[Stable policy/instructions]
[Stable examples]
[Per-request user data]
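
A minimal sketch of that ordering, reusing the wrapper from earlier; the prompt_cache_key argument is an assumption (it would pass through **openai_kwargs), so drop it if your SDK version does not accept it:

# Keep the stable prefix byte-identical across requests so it stays cache-eligible;
# only the final user message changes per request.
STABLE_SYSTEM_PROMPT = (
    "You are a support assistant.\n"
    "Answer only from the provided documentation."  # versioned with the prompt, rarely changes
)

def build_messages(user_question):
    return [
        {"role": "system", "content": STABLE_SYSTEM_PROMPT},  # stable content first
        {"role": "user", "content": user_question},           # volatile content last
    ]

response = call_with_attribution(
    feature="support-chat",
    route="/api/v1/chat/answer",
    customer_id="cust_4291",
    environment="prod",
    model="gpt-5.5",
    messages=build_messages("How do I reset my password?"),
    prompt_cache_key="system-v3",  # assumption: forwarded via **openai_kwargs; remove if unsupported
)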

Track this per feature:

SELECT
  feature,
  SUM(cached_tokens) / NULLIF(SUM(prompt_tokens), 0) AS cache_hit_rate
FROM openai_events
WHERE environment = 'prod'
GROUP BY feature
ORDER BY cache_hit_rate ASC;

If a prompt change drops cache hit rate, your input cost can rise silently.

See the official OpenAI prompt caching docs for eligibility rules.

Batch API for offline workloads

Use the Batch API for workloads that do not need synchronous responses:

  • Nightly summarization
  • Evaluation runs
  • Embedding backfills
  • Document re-processing

Tag these events with a batch_job_id so you can attribute cost back to the source workload.
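
A sketch of carrying those tags on the batch itself, assuming the current Batch API shape (a JSONL file upload plus batches.create with a metadata dict); copy the tags onto each cost event when you process the output file:

# Upload the JSONL request file, then attach attribution tags as batch metadata.
# When you parse the batch output, copy these tags onto every cost event you write.
input_file = client.files.create(
    file=open("nightly_summaries.jsonl", "rb"),
    purpose="batch",
)

batch = client.batches.create(
    input_file_id=input_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
    metadata={
        "feature": "nightly-summarization",
        "route": "cron:nightly-summarize",
        "batch_job_id": "batch_20260506_nightly",  # hypothetical job identifier
        "environment": "prod",
    },
)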

Reasoning effort tuning

Reasoning-heavy calls can multiply output tokens. Audit features that use higher reasoning effort:

  • Can medium become low?
  • Does quality remain acceptable?
  • What is the cost delta?

Track cost and quality side by side before changing production defaults.
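
One way to run that audit is to tag each effort level as its own feature variant and compare cost_usd per tag in the warehouse; this sketch assumes the model accepts a reasoning_effort parameter (forwarded through **openai_kwargs):

eval_messages = [
    {"role": "user", "content": "Summarize the key obligations in the attached contract."}
]

# Tag each effort level separately so spend and quality can be compared side by side
# before changing the production default.
for effort in ("medium", "low"):
    call_with_attribution(
        feature=f"contract-review-effort-{effort}",
        route="eval:reasoning-effort-audit",
        customer_id="internal",
        environment="staging",
        model="gpt-5.5",
        messages=eval_messages,
        reasoning_effort=effort,  # assumption: supported by the model/SDK version
    )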

For more details, see how to use the GPT-5.5 API.

Context-window discipline

Long prompts are expensive. Prefer tight retrieval over stuffing large context windows.

Track prompt size by feature:

SELECT
  feature,
  AVG(prompt_tokens) AS avg_prompt_tokens,
  APPROX_QUANTILES(prompt_tokens, 100)[OFFSET(95)] AS p95_prompt_tokens
FROM openai_events
WHERE environment = 'prod'
GROUP BY feature
ORDER BY p95_prompt_tokens DESC;

If prompt size grows without a product reason, investigate.

Watch the 272K-token cliff

OpenAI applies higher pricing on GPT-5.5 requests above 272K tokens. Add a guardrail:

if prompt_tokens > 250_000:
    logger.warning(json.dumps({
        "event": "openai.prompt_size_warning",
        "request_id": request_id,
        "feature": feature,
        "route": route,
        "customer_id": customer_id,
        "prompt_tokens": prompt_tokens,
    }))

For pricing details, see the GPT-5.5 pricing post.

Per-customer spend caps

For B2B SaaS, enforce spend limits before making the OpenAI call.

Example flow:

  1. Query current monthly spend for customer_id.
  2. Compare it to the customer’s quota.
  3. If under quota, call OpenAI.
  4. If over quota, return 429.

Example response:

{
  "error": "monthly_ai_quota_exceeded",
  "message": "Your monthly AI quota has been exceeded. Upgrade your plan or contact billing."
}

This turns AI from a margin risk into a controllable product cost.
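
A minimal pre-call gate might look like the sketch below; get_monthly_spend_usd and get_quota_usd are hypothetical helpers backed by your warehouse and billing plan data:

class MonthlyQuotaExceeded(Exception):
    """Map this to an HTTP 429 with the error payload shown above."""

def answer_with_quota(customer_id, messages, get_monthly_spend_usd, get_quota_usd):
    spent = get_monthly_spend_usd(customer_id)  # hypothetical helper: month-to-date spend
    quota = get_quota_usd(customer_id)          # hypothetical helper: plan quota in USD

    if spent >= quota:
        raise MonthlyQuotaExceeded("monthly_ai_quota_exceeded")

    return call_with_attribution(
        feature="support-chat",
        route="/api/v1/chat/answer",
        customer_id=customer_id,
        environment="prod",
        model="gpt-5.5",
        messages=messages,
    )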

Common mistakes

Avoid these:

  • Counting reasoning tokens as input. They are output.
  • Trusting the OpenAI dashboard for real-time alerts.
  • Adding tags globally instead of at the call site.
  • Forgetting background jobs and queue workers.
  • Sampling logs. Log every request.
  • Allowing customer_id to be null.
  • Computing historical cost with today’s pricing.
  • Retrying successful requests with a new request_id.

For background jobs, use synthetic routes:

cron:nightly-summarize
queue:image-caption
webhook:crm-sync

For unknown internal usage, use explicit values:

customer_id = "internal"
customer_id = "system"

Never use null as an attribution bucket.

Alternatives and tooling

You do not have to build all of this yourself.

| Approach | What it does well | What it costs | When to use |
| --- | --- | --- | --- |
| OpenAI usage API | Native, no setup, accurate to the cent | Free | One project, one feature, no per-customer attribution |
| Helicone | Drop-in proxy, dashboards, caching, per-user costs | Free tier; paid from $20/mo | You want a hosted dashboard quickly and accept a proxy |
| Langfuse | Open source, self-host or cloud, traces plus cost | Free self-hosted; cloud from $29/mo | You want traces and cost in one tool |
| LangSmith | LangChain integration, evals, cost tracking | Paid from $39/user/mo | You already use LangChain heavily |
| Custom warehouse | Full control, no proxy, custom dimensions | Engineering time | Large workloads, strict residency, custom attribution |

Tradeoffs:

  • A proxy adds another hop in the critical path.
  • A self-hosted observability stack gives control but adds ops work.
  • A custom warehouse integrates well with your data stack but requires you to own queries and alerts.
  • The native usage API is useful for reconciliation, not product-level attribution.

For more on hosted LLM cost monitoring, see Helicone’s guide on tracking LLM costs. For open-source cost tracking, see the Langfuse cost tracking docs.

If you operate at platform scale, these patterns also fit service-mesh and platform-engineering workflows. See API platforms for microservices architecture.

Real-world use cases

B2B SaaS with per-customer LLM spend

A sales-intelligence product spends $80,000/month on OpenAI. After adding per-customer attribution, the team learns that 12% of customers drive 71% of AI spend.

The company can then:

  • Add tiered pricing
  • Apply soft quotas to lower tiers
  • Charge overages
  • Improve gross margin per account

Internal developer tooling

An engineering org gives developers access to an internal GPT-5.5 assistant. By tagging requests with developer identity, platform engineering sees that three developers account for 50% of internal spend.

Two are running abandoned agent loops. Turning them off saves $1,800/month. The third is doing legitimate high-value work, so the team increases their quota.

AI feature forecasting

A product team wants to ship summarization. Historical events give them:

  • Average input tokens per call
  • Average output tokens per call
  • Calls per active user
  • Active user forecast

They estimate cost at $0.04 per active user per day, or about $1.20/month. Pricing can then set a $5/month feature price with visible unit economics.

Conclusion

OpenAI’s billing dashboard answers an accounting question. Request-level attribution answers the engineering and product question: where is the money going?

Implementation checklist:

  • Tag every request with feature, route, customer_id, and environment.
  • Compute cost at write time.
  • Log every request as structured data.
  • Store events in your warehouse.
  • Build feature, route, and customer dashboards.
  • Set OpenAI project/key caps.
  • Add warehouse-driven anomaly alerts.
  • Test the wrapper with Apidog.
  • Audit reasoning effort, prompt size, and cache hit rate regularly.

Download Apidog and use it to verify your cost-attribution wrapper end to end. Drive AI endpoints with tagged requests, assert the log payload shape, and replay scenarios across environments before your warehouse depends on the data.

For related cost-management reading, see the GPT-5.5 pricing breakdown and GitHub Copilot usage billing for API teams.

FAQ

Do reasoning tokens count as input or output for billing?

Reasoning tokens are billed at the output rate. The OpenAI API returns them under:

usage.completion_tokens_details.reasoning_tokens

Add them to completion_tokens when computing cost. For per-effort pricing details, see the GPT-5.5 pricing breakdown.

How accurate is response.usage compared to the OpenAI dashboard?

Token counts in response.usage should match dashboard usage. Cost drift usually comes from stale pricing tables. Pin your rate table per model and update it when OpenAI changes pricing.
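
A dated rate table is one way to keep historical events honest. This sketch reuses the May 2026 rates from earlier; the January entry is purely illustrative:

# Rate tables keyed by effective date, so events keep the rate in force when they
# were written. The 2026-01-01 entry is illustrative, not a real historical price.
PRICING_HISTORY = [
    ("2026-01-01", {"gpt-5.5": {"input": 6.00, "cached": 3.00, "output": 36.00}}),  # illustrative
    ("2026-05-01", {"gpt-5.5": {"input": 5.00, "cached": 2.50, "output": 30.00}}),  # rates used above
]

def rates_for(model, event_date):
    """Return the rate row in force on event_date (ISO date string, e.g. '2026-05-06')."""
    chosen = None
    for effective_from, table in PRICING_HISTORY:
        if effective_from <= event_date and model in table:
            chosen = table[model]
    if chosen is None:
        raise KeyError(f"no pricing for {model} on {event_date}")
    return chosen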

Can I do attribution with OpenAI project keys alone?

Only partially. Project keys give you one dimension of attribution. They do not give you per-feature, per-customer, or per-route visibility. Use project keys for isolation and budget caps; use application metadata for product attribution.

What about retries and rate-limit errors?

If a request fails before the model runs, there is no usage object and no cost to log.

If a request succeeds and your app retries it, you can double-count unless you reuse the same request_id and dedupe on write.
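
A minimal retry sketch that keeps the request_id stable across attempts, so a unique key on request_id in the warehouse can drop duplicate events:

import uuid

request_id = str(uuid.uuid4())  # generated once, reused on every attempt

for attempt in range(3):
    try:
        response = call_with_attribution(
            feature="support-chat",
            route="/api/v1/chat/answer",
            customer_id="cust_4291",
            environment="prod",
            model="gpt-5.5",
            messages=[{"role": "user", "content": "How do I reset my password?"}],
            request_id=request_id,  # same id on every attempt -> dedupe on write
        )
        break
    except Exception:
        if attempt == 2:
            raise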

How fast does the OpenAI usage API return data?

The usage API can lag by tens of minutes. Use it for reconciliation. Use your own event stream and warehouse for alerts and kill switches.

Should I sample requests?

No. One JSON line per request is small, and sampling breaks customer and route attribution. Log every request.

Can this work for other LLM providers?

Yes. Add a provider column:

openai
anthropic
google
deepseek

Then maintain provider-specific pricing logic. The warehouse schema and dashboards can stay mostly the same.

For a comparison point, see DeepSeek V4 API pricing.

Does this work for embeddings and image generation?

Yes, but the cost math changes.

Add an endpoint column:

chat
embeddings
image

Then branch cost computation by endpoint. Embeddings are usually billed per input token. Images are usually billed per image or resolution.
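
A sketch of that branching; the embedding and image rates below are placeholders, not real prices:

# Placeholder rates; pin real values per model from your provider's price list.
EMBEDDING_RATE_PER_1M_TOKENS = 0.10
IMAGE_RATE_PER_IMAGE = {"1024x1024": 0.04, "1536x1024": 0.06}

def compute_event_cost(endpoint, **kw):
    if endpoint == "chat":
        return compute_cost_usd(
            kw["model"], kw["prompt_tokens"], kw["cached_tokens"],
            kw["completion_tokens"], kw["reasoning_tokens"],
        )
    if endpoint == "embeddings":
        return round(kw["prompt_tokens"] * EMBEDDING_RATE_PER_1M_TOKENS / 1_000_000, 6)
    if endpoint == "image":
        return round(kw["image_count"] * IMAGE_RATE_PER_IMAGE[kw["size"]], 6)
    raise ValueError(f"unknown endpoint: {endpoint}")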
