DEV Community: tutorial

A Beginner's Guide to Essential Git Commands

바람의평온 — Tue, 12 May 2026 02:40:13 +0000

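## What Is Git?

Hello! In this tutorial we'll go over the basic commands of Git, a tool that nearly every developer ends up using. Git is a distributed version control system that records the history of changes to your code and helps you merge work and resolve conflicts when several developers work on the same project. Collaboration is central to modern development, so understanding Git properly is essential. This tutorial covers the core commands quickly and simply.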

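## Initializing a Git Repository

Whether you are starting a new project or putting an existing one under Git, the first step is to initialize a repository. In a terminal, change to the project folder and run git init. The command creates a hidden folder named .git in the current folder and gets it ready for version control. Note that Git does not track files automatically: new files stay untracked until you add them with git add.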

## Checking the Working State

Git notices when you write code or edit files. To see which files have changed and which ones Git is tracking, run git status. It lists files Git does not manage yet (untracked files), tracked files that have changed (modified files), and changes already staged for the next commit. Make a habit of checking the status before you commit.

## Staging Changes

Git does not commit every change at once. You first choose which changes go into the commit, a step called staging. The command git add . stages every new and modified file in the current directory and its subdirectories. To stage a single file, name it: git add <filename>. Only staged changes are included in the next commit.

## Committing Changes

This step records the staged changes as a single commit. Use git commit -m "<commit message>". The text after the -m option describes what the commit changes. You will rely on these messages later when reading the history, so keep them clear and concise. For example, 'Initial commit' is the conventional message for a project's first commit.

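## Pushing to a Remote Repository

This command uploads your local work to a remote repository such as GitHub. The form is git push <remote> <branch>, for example git push origin main. By convention, origin is the name of the default remote and main is the default branch. After the push, your local commits are on the remote, where they serve as a backup and can be shared with others.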

## Pulling from a Remote Repository

This command brings the latest changes that teammates have pushed to the remote into your local repository. The form is git pull <remote> <branch>. By default, git pull runs git fetch followed by git merge: it downloads the remote changes and then merges them into the branch you are working on. When collaborating, pull regularly to stay up to date.

## Git Command Summary

We have now covered the basic Git commands: git init creates a repository, git status shows what changed, git add . selects what to commit, and git commit -m "message" records the change. Finally, git push and git pull keep the local and remote repositories in sync. Practice these commands regularly and you will soon feel confident using Git.
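To try the local part of this workflow without touching a real project, the commands can be scripted. The following is a minimal sketch in Python that runs init, status, add, and commit in a throwaway folder. The helper names `run_git` and `demo_workflow`, the file name, and the demo identity are all made up for illustration, and it assumes git is installed. Push and pull are left out because they need a remote.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def run_git(args, cwd):
    """Run one git command in cwd and return its standard output."""
    result = subprocess.run(
        ["git", *args], cwd=cwd, check=True, capture_output=True, text=True
    )
    return result.stdout


def demo_workflow():
    """Walk through init -> status -> add -> commit in a temporary folder."""
    if shutil.which("git") is None:
        return None  # git is not installed; nothing to demonstrate
    with tempfile.TemporaryDirectory() as repo:
        run_git(["init"], repo)
        Path(repo, "hello.txt").write_text("hello\n")
        before = run_git(["status", "--short"], repo)  # new file is untracked
        run_git(["add", "."], repo)
        after = run_git(["status", "--short"], repo)   # new file is staged
        # Identity is passed inline so the demo ignores any global config.
        run_git(["-c", "user.name=Demo", "-c", "user.email=demo@example.com",
                 "-c", "commit.gpgsign=false", "commit", "-m", "Initial commit"], repo)
        log = run_git(["log", "--oneline"], repo)
    return before, after, log
```

In the short status format, an untracked file is shown with `??` and a newly staged file with `A`, which makes the effect of git add easy to see.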

How to Track OpenAI API Spend by Feature: A Cost Attribution Guide

Roobia — Tue, 12 May 2026 02:39:34 +0000

Your OpenAI invoice says you spent $4,237 last month. It does not say that $3,100 came from a single runaway summarization endpoint, $700 from a customer who pays $50 a month, and $437 from a feature nobody uses. The dashboard hides the information you need to decide on pricing, capacity, and roadmap.

This guide shows how to implement cost attribution for the OpenAI API: tag every request with metadata, compute spend by feature, route, and customer, set per-key budget limits, and design prompts so that cost stops being an opaque line item.

💡 Apidog helps you validate tagged requests before production. Use it to replay calls, check the shape of the log, and confirm that every request includes the metadata your warehouse expects.

In short

Tag every OpenAI API call with structured metadata:

- feature
- route
- customer_id
- environment
- model

Then emit one structured log per request with tokens, latency, and computed cost. Aggregate those events in your data warehouse and query spend by feature, customer, or route.

You should also:

- set per-key budget limits in OpenAI;
- create alerts on hourly spend anomalies;
- validate the wrapper end to end with Apidog.

Introduction

You ship an AI feature on Tuesday. On Friday, your CFO asks why OpenAI spend went up 40%. You open the OpenAI dashboard. You can see total spend rising, but not which feature, customer, or route caused it.

That is the gap teams hit when they run LLMs in production. The OpenAI billing interface is built for accounts payable, not for technical attribution. You see daily totals and a per-model breakdown, but not the request, customer, route, or feature that triggered the spend.

The fix is straightforward:

- wrap every call to OpenAI;
- add mandatory metadata;
- log every request in a structured format;
- compute the cost when the event is written;
- aggregate by tag;
- alert on deviations.

For the pricing context that feeds the cost calculation, see the GPT-5.5 pricing breakdown. For a related billing attribution problem in developer tools, see GitHub Copilot usage billing for API teams. For the basics, check the official OpenAI API reference.

Why the OpenAI billing dashboard is not enough

The OpenAI dashboard shows daily spend, a per-model breakdown, and your usage limit. That works if you have one app, one customer, and one feature.

It stops being enough once you have:

- multiple features;
- multiple customers;
- multiple environments;
- multiple developers;
- background jobs;
- workers or queues;
- different products sharing the same organization.

What is missing:

Total spend without context

The dashboard can tell you that you spent $312 on GPT-5.5 yesterday. It does not tell you whether that came from one customer using support chat 50,000 times or from a nightly job that summarized your entire knowledge base by mistake.

No per-feature breakdown

OpenAI labels usage by API key and model. It does not label by feature, route, customer, or environment. If you need those dimensions, you have to capture them in your application.

Reporting delay

Usage data can lag by tens of minutes or a few hours. By the time a runaway loop shows up in the dashboard, it has already burned budget. Operational alerts need your own data in near real time.

No per-feature alerts

OpenAI offers general limits and notifications. There is no native alert along the lines of:

"notify me if /api/v1/chat/answer exceeds $50 in an hour".

You have to build that from your own logs and queries.

No per-customer attribution

If you sell B2B SaaS with AI features, you need to answer:

"how much does customer X cost me this month?"

Without that number you cannot calculate gross margin per customer or decide on quotas, pricing, or upsells.

Project keys help, but they do not solve everything

Project keys let you separate usage by project, but not by feature, customer, or route. The OpenAI usage API returns data aggregated by project, not per request.

The pattern is clear: the native dashboard answers a finance question. You need to answer a product question.

Data model for cost attribution

Every request to OpenAI should produce one tagged event. That event is your unit of analysis.

Recommended minimal schema:

| Column | Type | Example | Why it matters |
| --- | --- | --- | --- |
| `request_id` | uuid | `7a91...` | Idempotency, deduplication, retries |
| `timestamp` | timestamptz | `2026-05-06T14:23:01Z` | Time series and anomalies |
| `feature` | text | `soporte-chat` | Product feature |
| `route` | text | `/api/v1/chat/answer` | HTTP route or job |
| `customer_id` | text | `cliente_4291` | Spend per customer |
| `environment` | text | `prod`, `staging`, `dev` | Separate internal cost from customer cost |
| `model` | text | `gpt-5.5`, `gpt-5.4-mini` | Price depends on the model |
| `prompt_tokens` | int | `15234` | Input tokens |
| `completion_tokens` | int | `812` | Output tokens |
| `reasoning_tokens` | int | `4500` | Reasoning tokens, billed as output |
| `cached_tokens` | int | `12000` | Cached tokens |
| `latency_ms` | int | `2341` | Correlate cost with user experience |
| `cost_usd` | numeric(10,6) | `0.205530` | Computed cost |
| `prompt_cache_key` | text | `sistema-v3` | Cache tracking |
| `error_code` | text | `null`, `429` | Avoid double counting on retries |

Compute the cost when you write the event, not when you query it. Prices can change; the event should keep the rate that applied when the request happened.

Example in Python:

PRICING = {  # USD per 1M tokens, as of May 2026
    "gpt-5.5":      {"input": 5.00,  "cached": 2.50,  "output": 30.00},
    "gpt-5.5-pro":  {"input": 30.00, "cached": 15.00, "output": 180.00},
    "gpt-5.4":      {"input": 2.50,  "cached": 1.25,  "output": 15.00},
    "gpt-5.4-mini": {"input": 0.25,  "cached": 0.125, "output": 2.00},
}

def compute_cost_usd(
    model,
    prompt_tokens,
    cached_tokens,
    completion_tokens,
    reasoning_tokens
):
    rates = PRICING[model]

    uncached = max(0, prompt_tokens - cached_tokens)

    input_cost = (uncached * rates["input"]) / 1_000_000
    cache_cost = (cached_tokens * rates["cached"]) / 1_000_000
    output_cost = (
        (completion_tokens + reasoning_tokens) * rates["output"]
    ) / 1_000_000

    return round(input_cost + cache_cost + output_cost, 6)

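Reasoning tokens are billed as output. The API returns them in:

usage.completion_tokens_details.reasoning_tokens

The function above adds them to completion_tokens, which is correct only if completion_tokens excludes them. The Chat Completions usage object documents reasoning_tokens as a breakdown of completion_tokens, so check a real response first: if completion_tokens already contains the reasoning tokens, adding them again counts them twice, and you should pass reasoning_tokens=0. Billing them at the input rate, or dropping them when they are reported separately, underestimates spend on reasoning calls. For the full math, see the GPT-5.5 pricing breakdown.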
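How reasoning tokens are folded into the output count changes the number noticeably. The sketch below compares the interpretations at the gpt-5.5 output rate from the table above. `billable_output_tokens` and its `reasoning_included` flag are hypothetical names, and whether your model reports reasoning tokens inside completion_tokens is an assumption to verify against a real response.

```python
def billable_output_tokens(completion_tokens, reasoning_tokens, reasoning_included=True):
    """Return the token count to bill at the output rate.

    reasoning_included=True assumes completion_tokens already contains the
    reasoning tokens (verify this on a real response for your model).
    """
    if reasoning_included:
        return completion_tokens
    return completion_tokens + reasoning_tokens


def output_cost_usd(output_tokens, rate_per_million):
    """Cost of output tokens at a USD-per-1M-tokens rate."""
    return round(output_tokens * rate_per_million / 1_000_000, 6)


RATE = 30.00  # gpt-5.5 output rate from the pricing table above

# 812 visible tokens plus 4,500 reasoning tokens, reported either way:
included = output_cost_usd(billable_output_tokens(5312, 4500, True), RATE)
disjoint = output_cost_usd(billable_output_tokens(812, 4500, False), RATE)
# The mistake to avoid: adding reasoning tokens to a total that already has them.
double_counted = output_cost_usd(5312 + 4500, RATE)
```

Both correct readings give the same cost; the double-counted one is almost twice as high, which is the kind of drift that only shows up when you reconcile against the invoice.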

OpenAI wrapper with attribution

Every call to OpenAI should go through a single function.

Example:

import time
import uuid
import json
import logging
from openai import OpenAI

client = OpenAI()
logger = logging.getLogger("llm.cost")

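def call_with_attribution(
    *,
    feature,
    route,
    customer_id,
    environment,
    model,
    messages,
    request_id=None,
    **openai_kwargs
):
    # Callers pass the same request_id on idempotent retries so events can be
    # deduplicated downstream; otherwise a fresh one is generated.
    request_id = request_id or str(uuid.uuid4())
    started = time.time()
    error_code = None
    response = None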

    try:
        response = client.chat.completions.create(
            model=model,
            messages=messages,
            **openai_kwargs
        )
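    except Exception as e:
        # Prefer the HTTP status (e.g. 429), then the API error code; fall back
        # to the exception class so a failure never logs error_code = None.
        error_code = (
            getattr(e, "status_code", None)
            or getattr(e, "code", None)
            or type(e).__name__
        )
        raise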
    finally:
        latency_ms = int((time.time() - started) * 1000)

        u = response.usage if response else None

        prompt_tokens = getattr(u, "prompt_tokens", 0)
        completion_tokens = getattr(u, "completion_tokens", 0)

        cached_tokens = (
            getattr(
                getattr(u, "prompt_tokens_details", None),
                "cached_tokens",
                0
            ) or 0
        )

        reasoning_tokens = (
            getattr(
                getattr(u, "completion_tokens_details", None),
                "reasoning_tokens",
                0
            ) or 0
        )

        cost_usd = compute_cost_usd(
            model,
            prompt_tokens,
            cached_tokens,
            completion_tokens,
            reasoning_tokens
        )

        logger.info(json.dumps({
            "event": "openai.request",
            "request_id": request_id,
            "feature": feature,
            "route": route,
            "customer_id": customer_id,
            "environment": environment,
            "model": model,
            "prompt_tokens": prompt_tokens,
            "completion_tokens": completion_tokens,
            "reasoning_tokens": reasoning_tokens,
            "cached_tokens": cached_tokens,
            "latency_ms": latency_ms,
            "cost_usd": cost_usd,
            "error_code": error_code,
        }))

    return response

That wrapper is your control point.

Every product feature calls it. Every call emits one JSON line. From there, ship the logs to BigQuery, ClickHouse, Snowflake, or Postgres through your existing pipeline: Vector, Fluent Bit, Logstash, an OTLP collector, or an equivalent.

For Node.js, the pattern is the same:

- wrap the SDK;
- take the mandatory metadata as arguments;
- capture response.usage;
- compute cost_usd;
- emit a JSON event;
- publish it to stdout, Kafka, NATS, Pub/Sub, or your logging system.

Step-by-step implementation

1. Replace direct calls to OpenAI

Search your codebase for:

OpenAI(
client.chat.completions.create

Every direct call should become:

call_with_attribution(
    feature="soporte-chat",
    route="/api/v1/chat/answer",
    customer_id=customer.id,
    environment="prod",
    model="gpt-5.5",
    messages=messages,
)

Make feature, route, customer_id, and environment mandatory. Do not use unknown as a default value. If a field is missing, raise an error.
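One way to enforce this is to validate the tags before the wrapper calls the API. A minimal sketch follows; the `AttributionTags` class and the `ALLOWED_ENVIRONMENTS` set are hypothetical, and it assumes IDs are strings, as in the schema above.

```python
from dataclasses import dataclass, fields

ALLOWED_ENVIRONMENTS = {"prod", "staging", "dev"}


@dataclass(frozen=True)
class AttributionTags:
    feature: str
    route: str
    customer_id: str
    environment: str

    def __post_init__(self):
        for f in fields(self):
            value = getattr(self, f.name)
            # Reject missing values and the "unknown" placeholder alike.
            if (
                not isinstance(value, str)
                or not value.strip()
                or value.strip().lower() == "unknown"
            ):
                raise ValueError(f"attribution tag {f.name!r} is required, got {value!r}")
        if self.environment not in ALLOWED_ENVIRONMENTS:
            raise ValueError(f"unexpected environment {self.environment!r}")
```

Failing at the call site keeps untagged spend out of the warehouse, where it would be much harder to trace back to its source.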

2. Emit structured logs

Log one JSON line per request:

{
  "event": "openai.request",
  "feature": "soporte-chat",
  "route": "/api/v1/chat/answer",
  "customer_id": "cliente_4291",
  "environment": "prod",
  "model": "gpt-5.5",
  "prompt_tokens": 15234,
  "completion_tokens": 812,
  "reasoning_tokens": 4500,
  "cached_tokens": 12000,
  "latency_ms": 2341,
  "cost_usd": 0.20553
}

Use the INFO level for these events and keep them separate from debug logs.

3. Aggregate by feature in your data warehouse

Example query (BigQuery syntax):

SELECT
  feature,
  DATE_TRUNC(timestamp, DAY) AS day,
  COUNT(*) AS requests,
  SUM(cost_usd) AS spend_usd,
  SUM(prompt_tokens + completion_tokens) AS tokens,
  AVG(latency_ms) AS avg_latency_ms,
  SUM(cached_tokens) / NULLIF(SUM(prompt_tokens), 0) AS cache_hit_rate
FROM openai_events
WHERE environment = 'prod'
  AND timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
GROUP BY feature, day
ORDER BY day DESC, spend_usd DESC;

This shows which feature consumes the most budget and how that changes over time.

4. Chart spend by route and customer

Connect Grafana, Metabase, Looker, or Superset to the table and create three views:

- spend by feature over time;
- spend by customer over time;
- top 20 routes by yesterday's spend.

That dashboard should be part of your daily operations.

5. Test the wrapper with Apidog

Before deploying, verify that the wrapper produces correct events. If the schema is wrong, your dashboards will lie.

Use Apidog to test the flow end to end:

1. Create a scenario that calls your AI endpoint with a known customer_id and feature.
2. Capture the response and the log emitted to stdout, OTLP, or your logging endpoint.
3. Add assertions to verify that the log includes:
   - feature;
   - route;
   - customer_id;
   - cost_usd > 0;
   - prompt_tokens > 0.
4. Run the same scenario in staging and production using environment variables.
5. Replay tagged requests and check that retries do not duplicate cost.

For broader approaches to API testing, see API testing tools for QA engineers. To combine this with a contract-first approach, see contract-first API development.

6. Set budget limits and alerts

Create separate project keys per environment or feature:

prod-support-chat
prod-summarization
staging-all

Set hard limits in OpenAI so that one feature cannot exhaust the whole organization's budget.

Then add your own alerts on top of your data warehouse. For example:

alert if a feature exceeds 3 times its 7-day rolling average hourly spend.

You can route the alert to PagerDuty, Opsgenie, or Slack. The trigger should come from your data, not from the OpenAI dashboard.
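The rule above can be sketched as a small pure function that a scheduled job calls with numbers read from the warehouse. `should_alert` and the `min_baseline_usd` floor are hypothetical; the floor is an added assumption so that a near-zero baseline does not fire on a few cents of spend.

```python
from statistics import mean


def should_alert(current_hour_usd, trailing_hourly_usd, factor=3.0, min_baseline_usd=0.01):
    """Return True when the current hour exceeds `factor` times the trailing average.

    trailing_hourly_usd: hourly spend for the same feature over the previous
    7 days (up to 168 values).
    """
    if not trailing_hourly_usd:
        return False  # no baseline yet; a brand-new feature needs its own rule
    baseline = max(mean(trailing_hourly_usd), min_baseline_usd)
    return current_hour_usd > factor * baseline


history = [2.0] * 168                 # a steady $2 per hour for a week
quiet = should_alert(3.5, history)    # below 3 x $2
spike = should_alert(9.0, history)    # above 3 x $2
```

Keeping the decision separate from the query and from the notification call makes the threshold easy to unit test and to tune per feature.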

Advanced techniques

Prompt caching

GPT-5.5 charges 50% of the input rate for cached tokens. To take advantage of that:

- keep the system prompt stable;
- put per-request variables at the end;
- track cache_hit_rate per feature;
- alert if a prompt change lowers the hit rate.

The official OpenAI prompt caching documentation explains the eligibility rules.
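The first two points come down to message ordering. Here is a minimal sketch, assuming a chat-style message list; `SYSTEM_PROMPT_V3`, its wording, and `build_messages` are hypothetical names for illustration.

```python
SYSTEM_PROMPT_V3 = "You are a support assistant. Answer using the provided context."


def build_messages(context_docs, user_question):
    """Put the unchanging system prompt first and per-request content last."""
    context = "\n\n".join(context_docs)
    return [
        # Stable prefix: identical across requests, so it is eligible for caching.
        {"role": "system", "content": SYSTEM_PROMPT_V3},
        # Variable suffix: retrieved context and the question change per request.
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {user_question}"},
    ]


a = build_messages(["Doc A"], "How do I reset my password?")
b = build_messages(["Doc B"], "Where is my invoice?")
shared_prefix = a[0] == b[0]  # both requests start with the same message
```

Anything interpolated into the system prompt (a timestamp, a customer name) changes the prefix and defeats the cache, so such values belong in the later messages.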

Batch API for offline work

Anything that does not need a synchronous response should go through the batch API:

- nightly summaries;
- evaluations;
- document reprocessing;
- backfills;
- offline embeddings.

Tag those events with batch_job_id so you can attribute them to the original workload.

Tuning reasoning effort

GPT-5.5 Thinking uses reasoning.effort levels. More effort means more output tokens.

Audit your features:

- are you using medium where low would be enough?
- does quality improve enough to justify the cost?
- can you A/B test between levels?

For more detail, see how to use the GPT-5.5 API.

Context window discipline

Long prompts are expensive. RAG with a tight retrieval budget usually beats stuffing the whole knowledge base into the context.

Track:

AVG(prompt_tokens) BY feature

If it rises week after week with no functional changes, your prompt is bloating.

Watch the 272K-token threshold

OpenAI applies a 2x input multiplier and a 1.5x output multiplier to requests that exceed 272K tokens.

Add a guard:

if prompt_tokens > 250_000:
    logger.warning("Prompt is near the 272K-token threshold")

For pricing details, see the post on GPT-5.5 pricing.
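If requests can cross the threshold, the cost calculation needs the multipliers too, or logged cost will undershoot the invoice. The sketch below assumes the multipliers apply to the whole request once the prompt exceeds 272,000 tokens, and that cached input is scaled like regular input; both are assumptions to check against the pricing page. `apply_long_context_multiplier` is a hypothetical helper.

```python
LONG_CONTEXT_THRESHOLD = 272_000  # tokens, as stated above
INPUT_MULTIPLIER = 2.0
OUTPUT_MULTIPLIER = 1.5


def apply_long_context_multiplier(prompt_tokens, input_cost_usd, output_cost_usd):
    """Scale input and output cost when the prompt exceeds the threshold."""
    if prompt_tokens > LONG_CONTEXT_THRESHOLD:
        total = input_cost_usd * INPUT_MULTIPLIER + output_cost_usd * OUTPUT_MULTIPLIER
    else:
        total = input_cost_usd + output_cost_usd
    return round(total, 6)


# 272,000 input tokens at $5 per 1M cost $1.36; 1,000 output tokens at $30 per 1M cost $0.03.
below = apply_long_context_multiplier(272_000, 1.36, 0.03)
above = apply_long_context_multiplier(272_001, 1.36, 0.03)
```

One extra token roughly doubles the bill for the request, which is why the warning fires well before the limit.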

Per-customer spend limits

If you sell B2B, you need per-customer quotas.

Recommended flow:

1. compute monthly spend per customer_id;
2. look up that spend before each call;
3. if it exceeds the quota, return 429;
4. include a clear monthly-limit message;
5. offer an upgrade or contact-sales action.

Example:

{
  "error": "ai_quota_exceeded",
  "message": "Monthly AI quota exceeded"
}

This turns AI features from a margin risk into a controllable unit of the product.
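The flow can be sketched as a check that runs before the wrapper is called. `check_ai_quota` and the `upgrade_url` field are hypothetical; where the month-to-date figure comes from (a warehouse query, a cached counter) is left open.

```python
def check_ai_quota(month_to_date_usd, monthly_quota_usd):
    """Return (http_status, body) for a request, before any model call is made."""
    if month_to_date_usd >= monthly_quota_usd:
        return 429, {
            "error": "ai_quota_exceeded",
            "message": "Monthly AI quota exceeded",
            "upgrade_url": "/billing/upgrade",  # hypothetical path
        }
    return 200, None


allowed = check_ai_quota(12.40, 50.00)
blocked = check_ai_quota(50.00, 50.00)
```

Because warehouse data lags slightly, a customer can overshoot the quota by a few requests; if that matters, keep a running counter closer to the request path.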

Common mistakes

Avoid these patterns:

- counting reasoning tokens as input;
- relying on the OpenAI dashboard for real-time alerts;
- tagging globally at the SDK level instead of at the call site;
- forgetting cron jobs, workers, and webhooks;
- sampling requests;
- allowing customer_id = null;
- computing costs from an outdated price table;
- not deduplicating retries;
- mixing cost logs with debug logs.

For internal jobs, use synthetic routes:

cron:nightly-summarize
queue:image-caption
worker:document-indexing

Alternatives and tools

You do not always have to build everything from scratch.

| Approach | What it does well | What it costs | When to use it |
| --- | --- | --- | --- |
| OpenAI usage API | Native, no setup, accurate for reconciliation | Free | One project, one feature, no per-customer attribution |
| Helicone | Easy proxy, dashboards, caching, per-user costs | Free tier; paid from $20/month | You want a hosted dashboard quickly |
| Langfuse | Open source, traces plus cost, self-hosted or cloud | Self-hosted free; cloud from $29/month | You want open source observability |
| LangSmith | LangChain integration, evaluation plus cost | Paid from $39/user/month | You already use LangChain |
| Your own data warehouse | Full control, no proxy, custom dimensions | Engineering time | Large workloads, compliance, or data residency |

Considerations:

- A proxy adds a hop on the critical path.
- A self-hosted stack gives you control, but you have to operate it.
- Your own data warehouse integrates best with your stack, but you maintain the queries and alerts.
- The native usage API is good for reconciliation, not for granular attribution.

For more context, Helicone's guide to LLM cost tracking explains the proxy-based approach. The Langfuse documentation on cost tracking covers the open source route.

If you operate this at platform scale, see API platforms for microservices architecture.

Real-world use cases

B2B SaaS with per-customer spend

A company sells sales intelligence. Each customer triggers GPT-5.5 calls when generating reports.

Without attribution, the company only knows that it spends $80,000 a month on OpenAI.

With per-customer attribution, it finds that 12% of customers generate 71% of the spend. With that data it can introduce:

- tiered pricing;
- soft quotas on lower plans;
- overage charges;
- upsells based on real usage.

Internal developer tools

An organization gives every developer access to a private GPT-5.5 assistant.

Using customer_id = dev_email, the platform team finds that three developers account for 50% of the spend. Two of them had automated agents running in a loop. Turning those off saves $1,800 a month.

The third had legitimate usage, so that developer gets a larger quota backed by data.

Forecasting new AI features

A product team wants to launch a summarization feature. To estimate the cost, it uses historical data:

- average prompt tokens;
- average output tokens;
- expected calls per active user;
- expected active users.

Result:

- $0.04 per active user per day
- $1.20 per active user per month

With that information, the team can price the feature at $5 per user per month and justify the margin.
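The arithmetic behind such a forecast is short enough to keep in code next to the pricing table. In the sketch below, the inputs (1,000 prompt tokens, 100 output tokens, 5 calls per user per day) are hypothetical values chosen so that the result matches the figures above at the gpt-5.5 rates; they are not from the original case. `forecast_per_user` is likewise a made-up helper.

```python
def forecast_per_user(avg_prompt_tokens, avg_output_tokens, calls_per_day,
                      input_rate, output_rate, days=30):
    """Estimate daily and monthly model cost per active user.

    Rates are in USD per 1M tokens; caching discounts are ignored.
    """
    per_call = (avg_prompt_tokens * input_rate + avg_output_tokens * output_rate) / 1_000_000
    daily = per_call * calls_per_day
    return round(daily, 4), round(daily * days, 2)


daily_usd, monthly_usd = forecast_per_user(
    1000, 100, 5, input_rate=5.00, output_rate=30.00
)
# Gross margin on model cost alone at a $5 per-user monthly price.
gross_margin = round((5.00 - monthly_usd) / 5.00, 2)
```

Multiplying the monthly figure by expected active users gives the budget line for the feature; the margin here covers model cost only, not infrastructure or support.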

Conclusion

You cannot manage what you cannot measure. The OpenAI billing dashboard answers a finance question. Attribution by feature, customer, and route answers the product question.

Implement the flow like this:

1. tag every request;
2. compute cost when the event is written;
3. use separate keys per environment or feature;
4. set native limits in OpenAI;
5. add alerts on top of your data warehouse;
6. validate the wrapper with Apidog;
7. audit prompts, caching, and reasoning effort periodically.

Download Apidog and use it to verify your cost attribution wrapper end to end. Send tagged requests, validate the log payload, and replay scenarios across environments before you trust the dashboards.

For related reading, see the GPT-5.5 pricing breakdown and GitHub Copilot usage billing for API teams.

Frequently asked questions

Do reasoning tokens count as input or output?

As output. The API returns them in:

usage.completion_tokens_details.reasoning_tokens

Bill them at the output rate. Before adding them to completion_tokens, confirm on a real response that completion_tokens does not already include them; if it does, adding them again counts them twice. For multipliers and prices, see the GPT-5.5 pricing breakdown.

How accurate is `response.usage` compared with the OpenAI dashboard?

The token counts in response.usage match the dashboard. Drift appears when you compute costs from an outdated rate table. Version your price table and update it whenever OpenAI changes rates.
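A versioned price table can be as simple as a list of dated rows per model. In this sketch, only the later gpt-5.5 row comes from the pricing table in this guide; the earlier row is a made-up placeholder to show the mechanism, the effective dates are assumptions, and `PRICE_HISTORY` and `rates_for` are hypothetical names.

```python
from datetime import date

# (effective_from, rates in USD per 1M tokens), sorted by effective_from.
PRICE_HISTORY = {
    "gpt-5.5": [
        (date(2026, 1, 1), {"input": 6.00, "cached": 3.00, "output": 36.00}),  # hypothetical row
        (date(2026, 5, 1), {"input": 5.00, "cached": 2.50, "output": 30.00}),  # assumed start date
    ],
}


def rates_for(model, on):
    """Return the rate row in effect for `model` on the date `on`."""
    applicable = [rates for start, rates in PRICE_HISTORY[model] if start <= on]
    if not applicable:
        raise LookupError(f"no price recorded for {model} on {on}")
    return applicable[-1]  # last row whose start date is not after `on`


current = rates_for("gpt-5.5", date(2026, 5, 12))
earlier = rates_for("gpt-5.5", date(2026, 3, 1))
```

Because cost is computed at write time, a new row only affects events logged after its start date; historical events keep the rate they were written with.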

Can I do attribution with project keys alone?

Only partially. Project keys give you one dimension: project. They do not give you feature, customer, or route. Use them to separate environments and set limits; use application metadata for granular attribution.

What about retries and rate-limit errors?

If a request fails before the model runs, there is normally no usage and no cost is recorded. If the request does run and your application then retries, you can duplicate the cost if you do not deduplicate.

Use the same request_id on idempotent retries and deduplicate on write.
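Deduplication by request_id can be sketched as follows, assuming events are dictionaries with the fields from the schema. `dedupe_events` is a hypothetical helper; it keeps the highest-cost event per ID so that a failed attempt logged at zero cost is replaced by the attempt that actually ran.

```python
def dedupe_events(events):
    """Keep one event per request_id, preferring the one that carries cost."""
    kept = {}
    for event in events:
        previous = kept.get(event["request_id"])
        # A failed attempt logs cost 0; a later success with the same id replaces it.
        if previous is None or event["cost_usd"] > previous["cost_usd"]:
            kept[event["request_id"]] = event
    return list(kept.values())


events = [
    {"request_id": "a", "cost_usd": 0.0, "error_code": "429"},
    {"request_id": "a", "cost_usd": 0.07, "error_code": None},
    {"request_id": "a", "cost_usd": 0.07, "error_code": None},  # same line delivered twice
    {"request_id": "b", "cost_usd": 0.02, "error_code": None},
]
total = round(sum(e["cost_usd"] for e in dedupe_events(events)), 6)
```

This removes duplicate log lines and zero-cost failures. If a retry really ran the model a second time, both runs are billed, so give that attempt its own ID (or an attempt counter) rather than hiding it.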

How quickly does the OpenAI usage API return data?

It lags by tens of minutes. Use it for monthly reconciliation. For alerts or kill switches, use your own events.

Should I sample requests to reduce log volume?

No. The volume is small: one JSON line per request. Sampling breaks attribution by customer and route.

Does this work with other LLM providers?

Yes. Add a column:

provider

Examples:

openai
anthropic
google
deepseek

The wrapper changes per provider, but the data warehouse and dashboards can stay the same. To compare prices, see DeepSeek V4 API pricing.

Does this work for embeddings and images?

Yes, but the formula changes.

- Embeddings: cost per input token.
- Images: cost per image and resolution.

Add an endpoint column:

chat
embeddings
image

Then branch the cost calculation on the endpoint.
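The branching might look like the sketch below. The embedding and image rates are hypothetical placeholders, not real prices, and `cost_by_endpoint` is a made-up helper; the chat branch simply passes through a cost computed by the chat formula.

```python
# Hypothetical rates for illustration only; replace with your provider's price list.
EMBEDDING_RATE_PER_1M = 0.02
IMAGE_RATE_BY_SIZE = {"1024x1024": 0.04, "1792x1024": 0.08}


def cost_by_endpoint(endpoint, *, input_tokens=0, images=0, size=None, chat_cost_usd=0.0):
    """Pick the cost formula from the `endpoint` column."""
    if endpoint == "chat":
        return chat_cost_usd  # computed separately by the chat cost function
    if endpoint == "embeddings":
        return round(input_tokens * EMBEDDING_RATE_PER_1M / 1_000_000, 6)
    if endpoint == "image":
        return round(images * IMAGE_RATE_BY_SIZE[size], 6)
    raise ValueError(f"unknown endpoint {endpoint!r}")


embedding_cost = cost_by_endpoint("embeddings", input_tokens=1_000_000)
image_cost = cost_by_endpoint("image", images=3, size="1024x1024")
```

Raising on an unknown endpoint follows the same rule as the tags: an unpriced call should fail loudly rather than be logged at zero cost.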

Tracking OpenAI API Usage by Feature: A Cost Attribution Guide

Rihpig — Tue, 12 May 2026 02:38:49 +0000

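Your OpenAI invoice tells you that you spent $4,237 last month. It does not tell you that $3,100 of it came from a runaway summarization endpoint, $700 from a customer who pays $50 a month, and $437 from a feature nobody uses. The default dashboard alone is not enough to make pricing, capacity planning, or roadmap decisions.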

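This article covers how to attribute OpenAI API cost to features, routes, and customers. The core idea is to attach metadata to every request, record token counts and cost as structured logs, aggregate them in a data warehouse, and then set per-key budget caps and alerts.

💡 Apidog provides request-level visibility and scenario testing before you deploy a cost-tracking wrapper to production. You can use it to replay tagged requests, inspect the log shape, and verify that every call includes the metadata your data warehouse expects.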

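Summary (TL;DR)

What you need to build is simple.

- Route every OpenAI API call through a single wrapper function.
- Require feature, route, customer_id, and environment on every call.
- Read token counts from response.usage and compute cost_usd at write time.
- Emit one JSON log line per request.
- Aggregate in a warehouse such as BigQuery, ClickHouse, Snowflake, or Postgres.
- Combine per-project-key caps in OpenAI with your own alerts.
- Validate the wrapper and log schema with Apidog scenario tests.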

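Why the OpenAI billing dashboard is not enough

The OpenAI billing page shows daily spend, usage by model, and organization-level limits. With one application, one customer, and one feature, that may be enough.

In real production, though, you usually need to answer questions like these:

- Which feature generated the cost?
- Which customer generates the most cost?
- Which API route is running away?
- Are staging and production costs separated?
- Has a feature's hourly spend spiked above normal?

The default dashboard cannot answer them.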

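Limits of the default dashboard

Totals without context

The dashboard can tell you that you spent $312 on GPT-5.5 yesterday. It cannot tell you whether that came from one customer's support chat calls or from a faulty batch job that re-summarized the entire knowledge base.

No per-feature analysis

OpenAI shows usage by model and key, but it does not provide application-level dimensions such as product feature, HTTP route, customer ID, or environment.

Reporting delay

Usage data can lag by tens of minutes to a few hours. By the time you see a runaway loop in the dashboard, the cost has already been incurred.

No fine-grained alerts

Organization-level budget caps and email notifications are available, but a condition like "notify Slack if the support chat endpoint exceeds $50 in an hour" is something you have to build yourself.

No per-customer attribution

If you offer AI features in a B2B SaaS product, you need to know the LLM cost per customer. Only then can you handle pricing, usage limits, upsells, and gross margin calculations.

Project keys alone are not enough

OpenAI project keys separate usage by project. Attribution by feature, customer, and route still has to be handled in your application. The OpenAI usage API also returns aggregated data, not per-request data.

Most teams that run LLM features hit this problem repeatedly. That is why the Dev.to post "OpenAI tells you how much you spent. Not where. So I built a dashboard" resonated.

For the pricing context needed for cost calculation, see the GPT-5.5 pricing analysis. For a related problem on the developer-tools side, see GitHub Copilot usage billing for API teams. For OpenAI API basics, check the official OpenAI API reference.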

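Designing the cost attribution data model

The basic unit of cost attribution is one OpenAI request.

Every request should be recorded as an event with the following fields.

| Column | Type | Example | Purpose |
| --- | --- | --- | --- |
| `request_id` | uuid | `7a91...` | Idempotency, deduplication, retry tracking |
| `timestamp` | timestamptz | `2026-05-06T14:23:01Z` | Time-series analysis, anomaly detection |
| `feature` | text | `support-chat` | Product feature that made the call |
| `route` | text | `/api/v1/chat/answer` | HTTP route or background job ID |
| `customer_id` | text | `cust_4291` | Spend per customer, gross margin |
| `environment` | text | `prod`, `staging`, `dev` | Separate development and production cost |
| `model` | text | `gpt-5.5`, `gpt-5.4-mini` | Apply per-model pricing |
| `prompt_tokens` | int | `15234` | Input tokens |
| `completion_tokens` | int | `812` | Output tokens |
| `reasoning_tokens` | int | `4500` | Reasoning tokens, billed at the output rate |
| `cached_tokens` | int | `12000` | Prompt cache hit tokens |
| `latency_ms` | int | `2341` | Correlate cost with performance |
| `cost_usd` | numeric | `0.205530` | Cost computed at write time |
| `prompt_cache_key` | text | `system-v3` | Track cache hit rate |
| `error_code` | text | `null`, `429` | Failure and retry analysis |

The key principle is to compute cost at write time, not at query time. Prices can change. Past events should stay fixed at the rate in effect on the day the request happened.

Building the cost calculation function

The following example is based on GPT-5.5 family pricing.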

PRICING = {  # USD per 1M tokens, as of May 2026
    "gpt-5.5":      {"input": 5.00,  "cached": 2.50,  "output": 30.00},
    "gpt-5.5-pro":  {"input": 30.00, "cached": 15.00, "output": 180.00},
    "gpt-5.4":      {"input": 2.50,  "cached": 1.25, "output": 15.00},
    "gpt-5.4-mini": {"input": 0.25,  "cached": 0.125, "output": 2.00},
}

def compute_cost_usd(
    model,
    prompt_tokens,
    cached_tokens,
    completion_tokens,
    reasoning_tokens
):
    rates = PRICING[model]

    uncached = max(0, prompt_tokens - cached_tokens)

    input_cost = (uncached * rates["input"]) / 1_000_000
    cache_cost = (cached_tokens * rates["cached"]) / 1_000_000
    output_cost = (
        (completion_tokens + reasoning_tokens) * rates["output"]
    ) / 1_000_000

    return round(input_cost + cache_cost + output_cost, 6)

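The field to watch is reasoning_tokens. The OpenAI API returns it in usage.completion_tokens_details.reasoning_tokens, and it is billed at the output rate. Treating it as input miscalculates the cost of Thinking-mode calls. The function above adds it to completion_tokens, so first confirm on a real response that completion_tokens does not already include the reasoning tokens; if it does, pass reasoning_tokens=0 to avoid counting them twice.

For the full pricing context, see the GPT-5.5 pricing analysis.

Implementing the OpenAI client wrapper

Now make every OpenAI call go through a single function.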

import time
import uuid
import json
import logging
from openai import OpenAI

client = OpenAI()
logger = logging.getLogger("llm.cost")

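def call_with_attribution(
    *,
    feature,
    route,
    customer_id,
    environment,
    model,
    messages,
    request_id=None,
    **openai_kwargs
):
    # Callers pass the same request_id on application-level retries so events
    # can be deduplicated downstream; otherwise a fresh one is generated.
    request_id = request_id or str(uuid.uuid4())
    started = time.time()
    error_code = None
    response = None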

    try:
        response = client.chat.completions.create(
            model=model,
            messages=messages,
            **openai_kwargs
        )
        return response

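    except Exception as e:
        # Prefer the HTTP status (e.g. 429), then the API error code; fall back
        # to the exception class so a failure never logs error_code = None.
        error_code = (
            getattr(e, "status_code", None)
            or getattr(e, "code", None)
            or type(e).__name__
        )
        raise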

    finally:
        latency_ms = int((time.time() - started) * 1000)

        u = response.usage if response else None

        prompt_tokens = getattr(u, "prompt_tokens", 0)
        completion_tokens = getattr(u, "completion_tokens", 0)

        cached_tokens = (
            getattr(
                getattr(u, "prompt_tokens_details", None),
                "cached_tokens",
                0
            ) or 0
        )

        reasoning_tokens = (
            getattr(
                getattr(u, "completion_tokens_details", None),
                "reasoning_tokens",
                0
            ) or 0
        )

        cost_usd = compute_cost_usd(
            model,
            prompt_tokens,
            cached_tokens,
            completion_tokens,
            reasoning_tokens
        )

        logger.info(json.dumps({
            "event": "openai.request",
            "request_id": request_id,
            "feature": feature,
            "route": route,
            "customer_id": customer_id,
            "environment": environment,
            "model": model,
            "prompt_tokens": prompt_tokens,
            "completion_tokens": completion_tokens,
            "reasoning_tokens": reasoning_tokens,
            "cached_tokens": cached_tokens,
            "latency_ms": latency_ms,
            "cost_usd": cost_usd,
            "error_code": error_code,
        }))

This wrapper is the single entry point for cost attribution.

Search your codebase for the following patterns and replace every occurrence.

OpenAI(
client.chat.completions.create

Every call must pass its tags explicitly, like this.

response = call_with_attribution(
    feature="support-chat",
    route="/api/v1/chat/answer",
    customer_id=current_user.customer_id,
    environment="prod",
    model="gpt-5.5",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_question},
    ],
)

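It is best not to give feature, route, customer_id, or environment a default value. If one is missing, raise an error instead of recording "unknown". An "unknown" bucket later becomes an attribution black hole.

The structure is the same in Node.js.

- Do not call the OpenAI SDK directly.
- The wrapper function takes the metadata.
- Read response.usage.
- Compute the cost.
- Send the JSON event to stdout, Kafka, NATS, Pub/Sub, or OTLP.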

Shipping structured logs to the data warehouse

One JSON line per request is enough.

Example log:

{
  "event": "openai.request",
  "request_id": "7a91d2d3-1d7f-4a21-91b5-f29b7f12a111",
  "feature": "support-chat",
  "route": "/api/v1/chat/answer",
  "customer_id": "cust_4291",
  "environment": "prod",
  "model": "gpt-5.5",
  "prompt_tokens": 15234,
  "completion_tokens": 812,
  "reasoning_tokens": 4500,
  "cached_tokens": 12000,
  "latency_ms": 2341,
  "cost_usd": 0.20553,
  "error_code": null
}

Reuse your existing log pipeline.

- Vector
- Fluent Bit
- Logstash
- OpenTelemetry Collector
- Cloud Logging
- Datadog
- A Kafka-based event pipeline

The destination can be BigQuery, ClickHouse, Snowflake, Postgres, or anything similar. You do not necessarily need a separate service.
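For a log shipper to parse these events, each line should be bare JSON with no logger prefix. A minimal sketch using Python's standard logging module is shown below; `configure_cost_logger` is a hypothetical helper, and the in-memory buffer stands in for sys.stdout so the result can be inspected.

```python
import io
import json
import logging


def configure_cost_logger(stream):
    """Send "llm.cost" records to `stream` as bare JSON lines."""
    logger = logging.getLogger("llm.cost")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep cost events out of the application's other logs
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(message)s"))  # message only, no prefix
    logger.handlers = [handler]
    return logger


buffer = io.StringIO()  # stands in for sys.stdout
cost_logger = configure_cost_logger(buffer)
cost_logger.info(json.dumps({"event": "openai.request", "cost_usd": 0.07}))
parsed = json.loads(buffer.getvalue())
```

Disabling propagation also keeps cost events from being duplicated by handlers on the root logger.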

Query for aggregating cost by feature

Once the events are in the warehouse, the dashboard becomes a SQL problem. The example uses BigQuery syntax.

SELECT
  feature,
  DATE_TRUNC(timestamp, DAY) AS day,
  COUNT(*) AS requests,
  SUM(cost_usd) AS spend_usd,
  SUM(prompt_tokens + completion_tokens) AS tokens,
  AVG(latency_ms) AS avg_latency_ms,
  SUM(cached_tokens) / NULLIF(SUM(prompt_tokens), 0) AS cache_hit_rate
FROM openai_events
WHERE environment = 'prod'
  AND timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
GROUP BY feature, day
ORDER BY day DESC, spend_usd DESC;

Build at least these three views into your operations dashboard.

- Spend by feature over time
- Spend by customer over time
- Top 20 routes by yesterday's spend

Any BI tool works: Grafana, Metabase, Looker, Superset, and so on.

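Validating with Apidog before deployment

Many teams build the wrapper but skip validation. The schema then drifts silently, and the dashboard shows plausible lies.

Build the following scenario in Apidog.

Send a request to the AI endpoint with a known customer_id and feature.
Check the response together with the emitted log.
Verify that the log payload contains the following fields.
- feature
- route
- customer_id
- environment
- model
- prompt_tokens
- completion_tokens
- cost_usd
Check that cost_usd > 0.
Check that prompt_tokens > 0.
Use Apidog environment variables to run the same scenario against staging and prod.
Replay the same request to confirm that retries do not double-count cost.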

request_id matters for retry handling. If the application retries the same operation, pass the same request_id and deduplicate at warehouse load or aggregation time.
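The deduplication step can be sketched in a few lines of Python. This is a minimal illustration, assuming events arrive as dictionaries shaped like the example log; the function name dedupe_by_request_id and the rule of preferring a successful attempt over a failed one are assumptions, not part of the original pipeline.

```python
from typing import Iterable


def dedupe_by_request_id(events: Iterable[dict]) -> list[dict]:
    """Keep one event per request_id, preferring a successful attempt.

    Hypothetical helper: assumes each event has "request_id" and an
    "error_code" that is None when the call succeeded.
    """
    kept: dict[str, dict] = {}
    for event in events:
        rid = event["request_id"]
        current = kept.get(rid)
        if current is None:
            # First time this request_id is seen.
            kept[rid] = event
        elif current.get("error_code") is not None and event.get("error_code") is None:
            # Replace a failed attempt with the successful one.
            kept[rid] = event
    return list(kept.values())


if __name__ == "__main__":
    sample = [
        {"request_id": "a", "cost_usd": 0.0, "error_code": "429"},
        {"request_id": "a", "cost_usd": 0.02, "error_code": None},
        {"request_id": "a", "cost_usd": 0.02, "error_code": None},
        {"request_id": "b", "cost_usd": 0.01, "error_code": None},
    ]
    for event in dedupe_by_request_id(sample):
        print(event["request_id"], event["cost_usd"])
```

In a warehouse the same rule would usually be expressed as a unique key or a window function over request_id; the Python version only shows the intent.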

For API testing approaches, see API testing tools for QA engineers. To run this alongside contract-based API development, see also contract-first API development.

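Project keys and budget caps

Independently of application-level attribution, use OpenAI project keys as a line of defense.

Example:

prod-support-chat
prod-summarization
prod-agent
staging-all
dev-all

Set a hard cap on each key so that one runaway feature cannot exhaust the organization-wide budget.

Layer your own alerts on top.

For example, run the following logic every 10 minutes.

Compute each feature's spend over the last hour
Compute the 7-day moving average of hourly spend for the same feature
If current spend exceeds 3x the average, alert Slack/PagerDuty/Opsgenie

The trigger should come from your data warehouse, not the OpenAI dashboard. That reduces delay and lets you detect anomalies on whichever dimension you need.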
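The three-step alert rule above can be sketched as a small Python check. This is an illustration under stated assumptions: the hourly spend values are expected to come from a warehouse query, and the name spend_is_anomalous and the default multiplier of 3 are chosen here to mirror the text.

```python
from statistics import mean
from typing import Sequence


def spend_is_anomalous(
    current_hour_spend: float,
    hourly_history: Sequence[float],
    multiplier: float = 3.0,
) -> bool:
    """Return True when the last hour exceeds multiplier x the baseline.

    hourly_history is assumed to hold one feature's hourly spend for
    the previous 7 days (up to 168 values), excluding the current hour.
    """
    if not hourly_history:
        # No baseline yet, for example a brand-new feature: do not alert.
        return False
    baseline = mean(hourly_history)
    return current_hour_spend > baseline * multiplier


if __name__ == "__main__":
    history = [1.0] * 168  # a flat $1/hour baseline over 7 days
    print(spend_is_anomalous(3.5, history))
    print(spend_is_anomalous(2.0, history))
```

A job scheduled every 10 minutes would call this once per feature and forward positive results to the paging tool.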

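Advanced optimization patterns

Prompt caching

GPT-5.5 bills cached tokens at 50% of the input rate.

To raise the cache hit rate, structure prompts as follows.

Put the stable system prompt first.
Put frequently changing user input last.
For prompts that change between versions, record prompt_cache_key explicitly.

Track cache_hit_rate per feature on the dashboard. If the cache hit rate drops after a prompt change, input cost can rise silently.

For the official rules, see the OpenAI prompt caching documentation.

Using the Batch API

Send work that does not need a synchronous response to the Batch API.

Typical examples:

Nightly summaries
Evaluation runs
Embedding backfills
Document reprocessing
Bulk classification jobs

Apply the same attribution schema to batch jobs. Adding batch_job_id lets you link events back to the original workload.

Tuning reasoning effort

For the GPT-5.5 Thinking family, output tokens grow with reasoning.effort.

Check the following for each feature.

Is medium really necessary?
Does low pass the quality bar?
What is the quality per unit of cost at each effort level?

Record quality and cost together through A/B tests, and ship the lower setting if quality holds. For implementation context, see How to use the GPT-5.5 API.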

Context window management

Long prompts increase cost quickly.

If you use RAG, do not put whole documents into the context window; cap the retrieval budget.

Track the following metrics per feature.

SELECT
  feature,
  AVG(prompt_tokens) AS avg_prompt_tokens,
  APPROX_QUANTILES(prompt_tokens, 100)[OFFSET(95)] AS p95_prompt_tokens
FROM openai_events
WHERE environment = 'prod'
GROUP BY feature
ORDER BY avg_prompt_tokens DESC;

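If prompt_tokens grows week over week with no feature change, the prompt is bloating.

Detecting the GPT-5.5 272K token cliff

For GPT-5.5, requests exceeding 272K tokens incur a 2x multiplier on input and a 1.5x multiplier on output.

Add the following guard to the wrapper.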

if prompt_tokens > 250_000:
    logger.warning(json.dumps({
        "event": "openai.large_context_warning",
        "request_id": request_id,
        "feature": feature,
        "route": route,
        "customer_id": customer_id,
        "prompt_tokens": prompt_tokens,
    }))
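To make the size of the cliff concrete, the multipliers can be folded into the cost calculation. The sketch below is a minimal illustration, assuming the threshold is checked against prompt_tokens and that the multipliers apply to the whole request once it is crossed; both points are assumptions that should be checked against the pricing page, and the function name is hypothetical.

```python
LONG_CONTEXT_THRESHOLD = 272_000  # tokens, as stated in the text
INPUT_MULTIPLIER = 2.0
OUTPUT_MULTIPLIER = 1.5


def apply_long_context_multiplier(
    input_cost: float, output_cost: float, prompt_tokens: int
) -> float:
    """Return the total cost after the long-context surcharge.

    Assumption: the surcharge applies to the entire request, not only
    to the tokens above the threshold.
    """
    if prompt_tokens > LONG_CONTEXT_THRESHOLD:
        return input_cost * INPUT_MULTIPLIER + output_cost * OUTPUT_MULTIPLIER
    return input_cost + output_cost


if __name__ == "__main__":
    # The same base costs, just below and just above the threshold.
    print(apply_long_context_multiplier(1.0, 2.0, 272_000))
    print(apply_long_context_multiplier(1.0, 2.0, 272_001))
```

Under these assumptions a request that crosses the threshold by a single token costs noticeably more, which is why the guard warns at 250K rather than at the limit itself.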

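For pricing details, see the GPT-5.5 pricing post.

Per-customer spend caps

B2B SaaS needs per-customer limits on AI usage.

A simple pattern looks like this.

Compute month-to-date cost_usd per customer.
Check the customer's usage before each OpenAI call.
If the limit is exceeded, return 429 without calling OpenAI.
Include a "monthly AI quota exceeded" message and an upgrade CTA in the response.

Example: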

def ensure_customer_budget(customer_id):
    spend = get_month_to_date_llm_spend(customer_id)
    limit = get_customer_llm_limit(customer_id)

    if spend >= limit:
        raise AiQuotaExceeded(
            f"Monthly AI quota exceeded for customer {customer_id}"
        )

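This is what turns an AI feature from a margin risk into a product capability you can price.

Mistakes to avoid

Counting reasoning tokens as input tokens
Relying on the OpenAI dashboard for real-time alerts
Tagging only inside the SDK and losing the feature context of the call site
Leaving background work such as cron jobs, queue workers, and webhooks untagged
Sampling requests
Leaving customer_id null
Not distinguishing failed requests from requests retried after success
Hard-coding the price table without a process for updating it

For background jobs, use synthetic routes like these.

cron:nightly-summarize
queue:image-caption
webhook:crm-sync

Even when the customer_id is not known, use an explicit value such as internal, system, or unknown_customer instead of null.

Alternatives and tool comparison

You do not have to build this yourself. The options are as follows.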

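Approach	Strengths	Cost	Best fit
OpenAI usage API	Built in, no setup, highly accurate	Free	Few projects or features and no need for per-customer attribution
Helicone	Drop-in proxy, dashboards, caching, per-user cost	Free tier, paid from $20/month	You need a hosted dashboard quickly and can accept a proxy
Langfuse	Open source, self-hosted or cloud, tracing + cost	Free self-hosted, cloud from $29/month	You want tracing and cost observability in one tool
LangSmith	LangChain integration, evaluation + cost	From $39 per user per month	You already use LangChain
Custom data warehouse	Full control, integrates with existing stack, no proxy	Engineering time	Large workloads, custom dimensions, data residency requirements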

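Proxy-based tools are quick to start with but add an extra hop to the request path. Self-hosted observability tools give you more control at the price of operational overhead. The custom warehouse approach requires up-front implementation, but large teams often end up choosing it.

For the proxy-based approach, see the Helicone team's guide to LLM cost tracking. For the open-source approach, see the Langfuse cost tracking documentation.

If you operate this pattern at platform scale, see also API platform for microservices architecture.

Real-world use cases

B2B SaaS that needs per-customer LLM spend

A sales intelligence product calls GPT-5.5 for every customer summary request.

Before attribution, the team knew only that the monthly OpenAI bill was $80,000. After applying per-customer attribution, they saw that 12% of customers generated 71% of the spend.

The team then introduced the following.

Tiered pricing
Soft quotas on lower plans
Per-seat overage charges

As a result, gross margin on the AI feature rose from 41% to 73% within one quarter.

Cost tracking for internal developer tools

An engineering organization gives every developer a personal GPT-5.5 chat assistant.

Once developer emails were recorded in customer_id, it turned out that three developers accounted for 50% of internal LLM spend.

Two of them were running automated agent loops they had forgotten to turn off; stopping those saved $1,800 a month. The third genuinely needed high usage for work, and the data became the justification for a higher quota.

Forecasting cost before launching an AI feature

When a product team is about to launch a new summarization feature, it can use historical per-feature data to forecast the following.

Average input tokens per call
Average output tokens per call
Expected calls per active user
Expected number of active users
Daily and monthly cost per user

If the forecast comes to $0.04 per active user per day, or $1.20 per month, the pricing team can price the feature at $5 per user per month. Once the unit economics are visible, finance approval also becomes easier.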
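The forecast arithmetic can be written out as a short function. This is a sketch under stated assumptions: the token counts and call volume in the usage example are hypothetical inputs, the rates are passed in as USD per 1M tokens, and a 30-day month is assumed.

```python
def forecast_cost_per_user(
    avg_input_tokens: float,
    avg_output_tokens: float,
    calls_per_user_per_day: float,
    input_rate_per_m: float,
    output_rate_per_m: float,
    days_per_month: int = 30,
) -> dict:
    """Estimate per-call, daily, and monthly cost for one active user.

    Rates are USD per 1M tokens. Caching discounts are ignored, so the
    result is an upper bound when prompts are cache friendly.
    """
    per_call = (
        avg_input_tokens * input_rate_per_m
        + avg_output_tokens * output_rate_per_m
    ) / 1_000_000
    daily = per_call * calls_per_user_per_day
    return {
        "per_call": per_call,
        "daily": daily,
        "monthly": daily * days_per_month,
    }


if __name__ == "__main__":
    # Hypothetical workload: 2,000 input and 400 output tokens per call,
    # 2 calls per user per day, at $5 input and $30 output per 1M tokens.
    estimate = forecast_cost_per_user(2_000, 400, 2, 5.00, 30.00)
    print({name: round(value, 4) for name, value in estimate.items()})
```

Multiplying the monthly figure by the expected number of active users gives the feature-level budget to compare against the planned price.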

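Conclusion

The OpenAI billing dashboard answers "how much did we spend?" Product and engineering teams need to know "where, by whom, and why?"

The implementation checklist is as follows.

Attach feature, route, customer_id, and environment to every request as required fields.
Route OpenAI calls through a single wrapper.
Read token counts from response.usage and compute cost at write time.
Emit a structured JSON log per request.
Aggregate by feature, customer, and route in the warehouse.
Set a hard cap per project key.
Add warehouse-based alerts.
Validate the log schema and scenarios with Apidog before deployment.
Audit reasoning effort, prompt size, and cache hit rate regularly.

You can download Apidog to validate the cost-attribution wrapper end to end. Call the AI endpoint with tagged requests, check the shape of the log payload, and replay scenarios across environments to confirm that the data warehouse is receiving data you can trust.

For related cost-management material, see the GPT-5.5 pricing breakdown and GitHub Copilot usage billing for API teams.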

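Frequently asked questions (FAQ)

Are reasoning tokens input or output?

They are billed at the output rate. The OpenAI API reports them in usage.completion_tokens_details.reasoning_tokens. That field is a breakdown of completion_tokens, which already includes reasoning tokens, so price completion_tokens at the output rate and do not add reasoning_tokens on top. For details, see the GPT-5.5 pricing breakdown.

How closely does `response.usage` match the OpenAI dashboard?

Token counts match the dashboard token for token. Cost can still differ if you compute it from an outdated price table after a price change. Pin the per-model rates, and update the version when OpenAI announces a price change.

Can I do attribution with OpenAI project keys alone?

Only partially. Project keys give you attribution at the project level. Attribution by feature, customer, or route requires application metadata.

Are retried requests double counted?

A request that fails before the model runs usually returns no usage, so no cost is recorded. A retry at the application layer after a success can be recorded twice. Use the same request_id for retries of the same operation and deduplicate at the storage or aggregation stage.

Is the OpenAI usage API real-time?

No. It can lag by tens of minutes. Use your own event log and warehouse for real-time alerts, kill switches, and per-customer limits. The usage API is a good fit for monthly reconciliation.

Can I sample to reduce log volume?

Not recommended. At one JSON line per request the data volume is small. Sampling breaks the accuracy of per-customer and per-route attribution. Record every request.

Can I use the same approach with other LLM providers?

Yes. Add a provider column to the schema.

Example:

openai
anthropic
google
deepseek

Only the per-provider wrapper and price table differ; the warehouse and dashboards can be reused. For comparison, see DeepSeek V4 API pricing.

Does this apply to embeddings and image generation?

Yes. Only the cost formula differs.

Embeddings: billed by input tokens
Image generation: billed by image count, resolution, and quality
Chat: billed by input, cached, output, and reasoning tokens

Add endpoint to the schema.

chat
embeddings
image

Then branch the cost function by endpoint.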
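One way to branch by endpoint is a lookup table of cost functions. The sketch below is illustrative only: the function names, the image_count and per_image keys, and every rate passed in are hypothetical, and real image pricing would select the per-image rate from resolution and quality.

```python
def chat_cost(usage: dict, rates: dict) -> float:
    # Uncached input, cached input, and output tokens, priced per 1M.
    cached = usage.get("cached_tokens", 0)
    uncached = max(0, usage["prompt_tokens"] - cached)
    return (
        uncached * rates["input"]
        + cached * rates["cached"]
        + usage["completion_tokens"] * rates["output"]
    ) / 1_000_000


def embeddings_cost(usage: dict, rates: dict) -> float:
    # Embeddings are billed on input tokens only.
    return usage["prompt_tokens"] * rates["input"] / 1_000_000


def image_cost(usage: dict, rates: dict) -> float:
    # Assumes the caller already picked the per-image rate for the
    # requested resolution and quality.
    return usage["image_count"] * rates["per_image"]


COST_FUNCTIONS = {
    "chat": chat_cost,
    "embeddings": embeddings_cost,
    "image": image_cost,
}


def compute_cost_usd_for(endpoint: str, usage: dict, rates: dict) -> float:
    """Dispatch to the cost function registered for the endpoint."""
    if endpoint not in COST_FUNCTIONS:
        # Fail loudly so an unpriced endpoint never logs a silent zero.
        raise ValueError(f"unsupported endpoint: {endpoint}")
    return round(COST_FUNCTIONS[endpoint](usage, rates), 6)
```

Raising on an unknown endpoint follows the same principle as rejecting missing metadata: a loud failure is easier to fix than a row that quietly records zero cost.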

How to track OpenAI API spend per feature: a cost-allocation playbook

Akira — Tue, 12 May 2026 02:37:48 +0000

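Your OpenAI invoice says you spent $4,237 last month. It does not say that $3,100 of it came from a runaway summarization endpoint, $700 from a customer paying $50 a month, and $437 from a feature nobody uses. The dashboard hides the information you need for pricing, capacity, and roadmap decisions.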


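This guide explains how to allocate OpenAI API cost by feature, route, customer, and environment. You attach metadata to every request, emit token counts and cost as structured logs, aggregate them in a warehouse, and set budget caps and alerts.

💡 Apidog gives you request-level visibility and scenario testing before the cost-tracking wrapper reaches production. Use it to replay tagged requests, assert on log shape, and verify that every call carries the metadata the warehouse expects.

TL;DR

For every OpenAI API call, always record the following.

feature
route
customer_id
environment
model
Token counts
The computed cost_usd

Then aggregate by tag in the warehouse and set per-key budget caps on the OpenAI side. In addition, detect hourly spend anomalies and validate the wrapper with Apidog scenario tests before release.

Introduction

You shipped a new AI feature on Tuesday. On Friday morning the CFO sends a DM: "Why did our OpenAI bill jump 40%?" The OpenAI dashboard shows that total spend is up. It does not show which feature, customer, or route caused it.

Every team running LLM workloads in production hits this problem. The OpenAI billing interface is built for accounting, not for engineering or product attribution analysis.

This article implements the following.

A wrapper around the OpenAI client
An event schema for cost attribution
Cost calculation from token counts
Structured log output
Warehouse aggregation SQL
End-to-end validation with Apidog
Budget caps and anomaly detection

For the assumptions behind the price calculation, see the GPT-5.5 pricing breakdown. For billing attribution on the developer-tools side, GitHub Copilot usage billing for API teams is also useful. For OpenAI API basics, check the official OpenAI API reference.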

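Why the OpenAI billing dashboard is not enough

The OpenAI billing page mainly shows the following.

Daily spend
Usage by model
Organization-level usage limits

That is enough for a single app, a single feature, and a single customer. In a real product, multiple features, customers, environments, and developers share the same OpenAI organization.

The missing information is as follows.

Total spend without context

When the dashboard says "you spent $312 on GPT-5.5 yesterday", you cannot tell whether that was a burst of support-chat calls or a runaway background summarization job.

No per-feature breakdown

OpenAI can aggregate by API key or model, but not by your product's feature, route, customer_id, or environment.

Reports are delayed

Usage data appears tens of minutes to hours late. That is too slow to catch a runaway loop.

Alerts are coarse

OpenAI-side budget notifications alone cannot express a control such as "notify me when the chat feature exceeds $50 in an hour".

No customer attribution

If you offer AI features in a B2B SaaS, you cannot compute gross margin without knowing the AI cost of each customer.

Project keys alone are not granular enough

OpenAI project keys are useful but insufficient for attribution by feature, customer, or route. The OpenAI usage API also returns essentially pre-aggregated data, so request-level metadata has to live on the application side.

Many teams share this problem. On Dev.to it is discussed in terms of "OpenAI tells you how much you spent. It does not tell you where."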

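The cost-attribution data model

First, record one event per OpenAI request. This event is the unit of analysis.

The minimal schema is as follows.

Column	Type	Example	Purpose
`request_id`	uuid	`7a91...`	Idempotency, deduplication, retries
`timestamp`	timestamptz	`2026-05-06T14:23:01Z`	Time-series analysis, anomaly detection
`feature`	text	`support-chat`	Product feature that made the call
`route`	text	`/api/v1/chat/answer`	HTTP route or job ID
`customer_id`	text	`cust_4291`	Per-customer spend
`environment`	text	`prod`	Separating production, staging, and development
`model`	text	`gpt-5.5`	Per-model price calculation
`prompt_tokens`	int	`15234`	Input token count
`completion_tokens`	int	`812`	Output token count
`reasoning_tokens`	int	`4500`	Reasoning tokens
`cached_tokens`	int	`12000`	Cached input tokens
`latency_ms`	int	`2341`	Latency analysis
`cost_usd`	numeric	`0.045672`	Cost computed at write time
`prompt_cache_key`	text	`system-v3`	Cache hit rate tracking
`error_code`	text	`null` / `429`	Error and retry analysis

The key point is to compute cost_usd at write time, not at query time. Prices change, so each historical event should store a value fixed at the rate in effect when it was written.

Implementing the cost calculation

Pin the price table for the GPT-5.5 family in code.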

PRICING = {  # USD per 1M tokens, as of May 2026
    "gpt-5.5":      {"input": 5.00,  "cached": 2.50,  "output": 30.00},
    "gpt-5.5-pro":  {"input": 30.00, "cached": 15.00, "output": 180.00},
    "gpt-5.4":      {"input": 2.50,  "cached": 1.25, "output": 15.00},
    "gpt-5.4-mini": {"input": 0.25,  "cached": 0.125, "output": 2.00},
}

def compute_cost_usd(
    model,
    prompt_tokens,
    cached_tokens,
    completion_tokens,
    reasoning_tokens
):
    rates = PRICING[model]

    uncached = max(0, prompt_tokens - cached_tokens)

    input_cost = (uncached * rates["input"]) / 1_000_000
    cache_cost = (cached_tokens * rates["cached"]) / 1_000_000
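    # completion_tokens already includes reasoning tokens, so pricing it at
    # the output rate covers them; reasoning_tokens is kept for reporting only.
    output_cost = (completion_tokens * rates["output"]) / 1_000_000

    return round(input_cost + cache_cost + output_cost, 6)

Reasoning tokens are treated as output. The OpenAI API returns them as usage.completion_tokens_details.reasoning_tokens, a breakdown of completion_tokens, and bills them at the output rate. Pricing them as input underestimates the cost of Thinking calls, while adding them to completion_tokens again counts them twice.

For the detailed price calculation, see the GPT-5.5 pricing breakdown.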

Wrapping the OpenAI client

Funnel every OpenAI call through a single function.

import time
import uuid
import json
import logging
from openai import OpenAI

client = OpenAI()
logger = logging.getLogger("llm.cost")

def call_with_attribution(
    *,
    feature,
    route,
    customer_id,
    environment,
    model,
    messages,
    **openai_kwargs
):
    if not feature or not route or not customer_id or not environment:
        raise ValueError("feature, route, customer_id, environment are required")

    request_id = str(uuid.uuid4())
    started = time.time()
    error_code = None
    response = None

    try:
        response = client.chat.completions.create(
            model=model,
            messages=messages,
            **openai_kwargs
        )
        return response

    except Exception as e:
        error_code = getattr(e, "code", "unknown_error")
        raise

    finally:
        latency_ms = int((time.time() - started) * 1000)

        u = response.usage if response else None

        prompt_tokens = getattr(u, "prompt_tokens", 0)
        completion_tokens = getattr(u, "completion_tokens", 0)

        cached_tokens = (
            getattr(
                getattr(u, "prompt_tokens_details", None),
                "cached_tokens",
                0
            ) or 0
        )

        reasoning_tokens = (
            getattr(
                getattr(u, "completion_tokens_details", None),
                "reasoning_tokens",
                0
            ) or 0
        )

        cost_usd = compute_cost_usd(
            model,
            prompt_tokens,
            cached_tokens,
            completion_tokens,
            reasoning_tokens
        )

        logger.info(json.dumps({
            "event": "openai.request",
            "request_id": request_id,
            "feature": feature,
            "route": route,
            "customer_id": customer_id,
            "environment": environment,
            "model": model,
            "prompt_tokens": prompt_tokens,
            "completion_tokens": completion_tokens,
            "reasoning_tokens": reasoning_tokens,
            "cached_tokens": cached_tokens,
            "latency_ms": latency_ms,
            "cost_usd": cost_usd,
            "error_code": error_code,
        }))

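Make this wrapper the single entry point for cost attribution.

The to-do list is clear.

Search the codebase for OpenAI(
Prohibit direct calls to client.chat.completions.create
Replace them all with call_with_attribution(...)
Make feature, route, customer_id, and environment required
Fail at call time instead of filling missing values with unknown

The structure is the same in Node.js. Wrap the OpenAI SDK in a function, read response.usage, and write a JSON event. If you have an event bus such as Kafka, NATS, or Pub/Sub, you can publish there instead of stdout.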

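Build cost tracking and test it with Apidog

The implementation takes six steps.

1. Replace direct OpenAI calls with the wrapper

Search the codebase for the following.

grep -R "OpenAI(" .
grep -R "chat.completions.create" .

Replace every call you find with call_with_attribution(...).

Example call: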

response = call_with_attribution(
    feature="support-chat",
    route="/api/v1/chat/answer",
    customer_id=current_user.customer_id,
    environment="prod",
    model="gpt-5.5",
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_question},
    ],
)

2. Emit structured logs

Emit each event as one line of JSON.

{
  "event": "openai.request",
  "request_id": "7a91...",
  "feature": "support-chat",
  "route": "/api/v1/chat/answer",
  "customer_id": "cust_4291",
  "environment": "prod",
  "model": "gpt-5.5",
  "prompt_tokens": 15234,
  "completion_tokens": 812,
  "reasoning_tokens": 4500,
  "cached_tokens": 12000,
  "latency_ms": 2341,
  "cost_usd": 0.045672,
  "error_code": null
}

Ship these logs through your existing pipeline to BigQuery, ClickHouse, Snowflake, Postgres, or similar.

3. Aggregate by feature in the warehouse

SELECT
  feature,
  DATE_TRUNC(timestamp, DAY) AS day,
  COUNT(*) AS requests,
  SUM(cost_usd) AS spend_usd,
  SUM(prompt_tokens + completion_tokens) AS tokens,
  AVG(latency_ms) AS avg_latency_ms,
  SUM(cached_tokens) / NULLIF(SUM(prompt_tokens), 0) AS cache_hit_rate
FROM openai_events
WHERE environment = 'prod'
  AND timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
GROUP BY feature, day
ORDER BY day DESC, spend_usd DESC;

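The following views make operations easier.

Daily spend per feature
Daily spend per customer
Top spend by route
Spend by model
Cache hit rate
Average prompt tokens
Average output tokens

4. Chart spend by route

Visualize the data in Grafana, Metabase, Looker, Superset, or similar.

At a minimum, build these three.

Time series of spend by feature
Time series of spend by customer
Top 20 routes by yesterday's spend

This becomes the operations dashboard you look at every day in place of the OpenAI dashboard.

5. Test the wrapper with Apidog before release

Wrapper bugs break the dashboard silently. The most dangerous state is when logs appear to be flowing but customer_id or feature is missing.

Verify the following with Apidog.

Send a request with a known customer_id and feature to the AI endpoint
Validate the response
Check the side channel, such as stdout, OTLP, or a log endpoint
Assert that the log payload contains feature, route, and customer_id
Assert cost_usd > 0 and prompt_tokens > 0
Run the same scenario in staging and production
Confirm that cost is not double counted on retry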
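The payload assertions in this checklist can also be expressed as a small Python check, for example inside a unit test that runs next to the Apidog scenario. This is a minimal sketch: validate_cost_event and the list of rejected placeholder values are assumptions introduced here, not part of Apidog or the wrapper.

```python
REQUIRED_TAGS = ("feature", "route", "customer_id")
PLACEHOLDERS = (None, "", "unknown")


def validate_cost_event(event: dict) -> list[str]:
    """Return a list of problems found in one logged cost event.

    An empty list means the event passed every check.
    """
    problems = []
    for field in REQUIRED_TAGS:
        if event.get(field) in PLACEHOLDERS:
            problems.append(f"missing or placeholder field: {field}")
    for field in ("cost_usd", "prompt_tokens"):
        value = event.get(field)
        # bool is excluded because True/False are ints in Python.
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if not is_number or value <= 0:
            problems.append(f"{field} must be a number greater than 0")
    return problems


if __name__ == "__main__":
    good = {
        "feature": "support-chat",
        "route": "/api/v1/chat/answer",
        "customer_id": "cust_4291",
        "cost_usd": 0.045672,
        "prompt_tokens": 15234,
    }
    print(validate_cost_event(good))
    print(validate_cost_event({**good, "customer_id": None, "cost_usd": 0}))
```

Returning a list of problems rather than raising on the first one makes a failed scenario report every missing field at once.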

For API testing in general, see API testing tools for QA engineers. If you design APIs contract first, contract-first API development is also useful.

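6. Set per-key budget caps and alerts

On the OpenAI side, split project keys by environment and by major feature.

Example:

prod-support-chat
prod-summarization
prod-agent
staging-all

Set a cap on each one in the OpenAI dashboard.

Native caps alone are not enough, though. Detect anomalies on the warehouse side as well.

Example: monitoring SQL that runs every 10 minutes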

WITH hourly AS (
  SELECT
    feature,
    TIMESTAMP_TRUNC(timestamp, HOUR) AS hour,
    SUM(cost_usd) AS spend_usd
  FROM openai_events
  WHERE environment = 'prod'
    AND timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 8 DAY)
  GROUP BY feature, hour
),
baseline AS (
  SELECT
    feature,
    AVG(spend_usd) AS avg_hourly_spend
  FROM hourly
  WHERE hour < TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR)
  GROUP BY feature
),
current_hour AS (
  SELECT
    feature,
    SUM(cost_usd) AS current_spend
  FROM openai_events
  WHERE environment = 'prod'
    AND timestamp >= TIMESTAMP_TRUNC(CURRENT_TIMESTAMP(), HOUR)
  GROUP BY feature
)
SELECT
  c.feature,
  c.current_spend,
  b.avg_hourly_spend
FROM current_hour c
JOIN baseline b USING (feature)
WHERE c.current_spend > b.avg_hourly_spend * 3;

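When the query returns rows, notify Slack, PagerDuty, Opsgenie, or similar.

Native caps are the last line of defense; warehouse alerts are the early warning.

Advanced techniques

Design prompts around prompt caching

With GPT-5.5, cached tokens are billed at 50% of the input rate. Place the system prompt as a stable prefix and put per-request variables at the end.

Metric to track: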

SELECT
  feature,
  SUM(cached_tokens) / NULLIF(SUM(prompt_tokens), 0) AS cache_hit_rate
FROM openai_events
WHERE environment = 'prod'
GROUP BY feature
ORDER BY cache_hit_rate ASC;

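Also check the official OpenAI prompt caching documentation.

Move offline processing to the Batch API

Send work that does not need a synchronous response to the Batch API.

Candidates:

Nightly summaries
Evaluation runs
Embedding backfills
Document reprocessing

Apply the same cost attribution to batch calls and add batch_job_id to the event.

Tune reasoning effort

With GPT-5.5 Thinking, the number of reasoning tokens varies with reasoning.effort. Check whether features running at medium still meet the quality bar at low.

What to do:

A/B test by reasoning.effort
Compare quality metrics
Compare cost_usd
Adopt the cheapest setting that maintains quality

For details, see How to use the GPT-5.5 API.

Manage the context window

The longer the prompt, the higher the cost. In RAG, explicitly cap the number of retrieved items and the token budget instead of passing in the whole knowledge base.

Monitoring SQL: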

SELECT
  feature,
  DATE_TRUNC(timestamp, WEEK) AS week,
  AVG(prompt_tokens) AS avg_prompt_tokens
FROM openai_events
WHERE environment = 'prod'
GROUP BY feature, week
ORDER BY week DESC, avg_prompt_tokens DESC;

If avg_prompt_tokens rises without a feature change, the prompt is bloating.
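Capping retrieval in a RAG pipeline can be sketched as a greedy selection under two limits. This is an illustration, assuming chunks arrive already ranked by relevance with a precomputed token count; the function name and the skip-and-continue policy for oversized chunks are choices made here, not something the article prescribes.

```python
from typing import Sequence


def select_chunks_within_budget(
    chunks: Sequence[tuple[str, int]],
    max_chunks: int,
    max_tokens: int,
) -> tuple[list[str], int]:
    """Pick ranked chunks until the count or token budget is reached.

    chunks is a sequence of (text, token_count) ordered by relevance.
    A chunk that would overflow the token budget is skipped so that a
    smaller, lower-ranked chunk can still fit.
    """
    selected: list[str] = []
    used_tokens = 0
    for text, token_count in chunks:
        if len(selected) >= max_chunks:
            break
        if used_tokens + token_count > max_tokens:
            continue
        selected.append(text)
        used_tokens += token_count
    return selected, used_tokens


if __name__ == "__main__":
    ranked = [("a", 500), ("b", 800), ("c", 300), ("d", 100)]
    print(select_chunks_within_budget(ranked, max_chunks=3, max_tokens=1000))
```

Logging used_tokens alongside prompt_tokens makes it easier to tell whether prompt growth comes from retrieval or from the prompt template itself.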

Watch for the GPT-5.5 272K token cliff

With GPT-5.5, requests exceeding 272K tokens incur a 2x multiplier on input and a 1.5x multiplier on output.

Add a guard to the wrapper.

if prompt_tokens > 250_000:
    logger.warning(json.dumps({
        "event": "openai.prompt_token_warning",
        "request_id": request_id,
        "feature": feature,
        "route": route,
        "customer_id": customer_id,
        "prompt_tokens": prompt_tokens,
    }))

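For pricing details, see the GPT-5.5 pricing post.

Set per-customer spend caps

In B2B SaaS you need to control the AI cost of each customer.

Implementation outline:

Aggregate monthly spend per customer_id in the warehouse or a fast store
Check the cap before each OpenAI call
Return 429 when the cap is exceeded
Include a billing CTA in the response

Example: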

def assert_customer_budget(customer_id):
    spend = get_monthly_ai_spend(customer_id)
    limit = get_customer_ai_limit(customer_id)

    if spend >= limit:
        raise AIQuotaExceeded(
            "Monthly AI quota exceeded. Consider upgrading your plan."
        )

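Mistakes to avoid

Billing reasoning tokens as input
Using only the OpenAI dashboard for real-time monitoring
Tagging loosely at the SDK level instead of at the call site
Forgetting to tag cron jobs, queue workers, and webhooks
Sampling request logs
Leaving customer_id null
Double counting on retry because request_id is not reused

Give background jobs synthetic routes like these.

cron:nightly-summarize
queue:image-caption
webhook:customer-import

For internal processing with no customer_id, use internal or system instead of null.

Alternatives and tools

There are options other than building it yourself.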

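Approach	Strengths	Cost	Best fit
OpenAI usage API	Native, no setup	Free	One project, one feature, no customer attribution needed
Helicone	Drop-in proxy, dashboards, caching	Free tier, from $20/month	You want visibility fast and can accept a proxy
Langfuse	OSS, self-hosted, tracing + cost	Free self-hosted, cloud from $29/month	You want to manage tracing and cost together
LangSmith	LangChain integration, evaluation + cost	From $39/user/month	You already use LangChain
Custom warehouse	Full control, integrates with existing stack	Engineering time	Large scale, custom dimensions, data residency requirements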

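Proxy-based Helicone is quick to adopt but adds a hop to the critical path. Langfuse is easier to control but needs operations work if you self-host. A custom warehouse has an implementation cost, but large teams often converge on it in the end.

As implementation examples of LLM cost observability, the Helicone team's guide to LLM cost tracking and the Langfuse cost tracking documentation are also useful.

If you operate this pattern at platform scale, see also API platform for microservices architecture.

Real-world use cases

B2B SaaS with per-customer LLM spend

In one sales intelligence product, every briefing a customer requests triggers a GPT-5.5 call. Without attribution, all you know is that OpenAI spend is $80,000 a month.

With per-customer attribution, it became clear that 12% of customers account for 71% of spend. That makes it possible to introduce tiered pricing, soft quotas, and overage charges and improve gross margin on the AI feature.

Tracking internal developer tools

The same applies to an internal GPT assistant for engineers. Put the developer's email in customer_id and you can see who uses how much.

Finding abnormal spend lets you stop abandoned automated agent loops. It also lets you decide to give legitimate heavy users a higher quota.

Forecasting spend for AI features

Before shipping a new summarization feature, estimate the following from historical per-feature data.

Average input tokens per call
Average output tokens per call
Expected calls per active user
Expected number of active users

This lets you compute the per-feature cost in advance. Pricing and go/no-go decisions stop being guesswork.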

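Conclusion

You cannot manage what you cannot measure. The OpenAI billing dashboard shows the financial total, but running a product requires attribution by feature, customer, and route.

What you need to implement is simple.

Attach feature, route, customer_id, and environment to every request
Route all OpenAI client usage through the wrapper
Emit token counts and cost_usd as structured logs
Aggregate in the warehouse
Set a cap per OpenAI project key
Detect anomalies on the warehouse side
Validate the wrapper with Apidog before release

Download Apidog and use it for end-to-end validation of the cost-attribution wrapper. You can run the AI endpoint with tagged requests, assert on the shape of the log payload, and replay scenarios across environments.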

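Frequently asked questions

Are reasoning tokens billed as input or as output?

They are billed at the output rate. The OpenAI API returns them as usage.completion_tokens_details.reasoning_tokens, a breakdown of completion_tokens, which already includes them. Price completion_tokens at the output rate and do not add reasoning_tokens on top. For details, see the GPT-5.5 pricing breakdown.

Does `response.usage` match the OpenAI dashboard?

Token counts match the dashboard. Cost will drift, however, if you compute it from an old price table after a price change. Pin per-model rates in code or configuration and update them on the day a price change takes effect.

Can I do attribution with OpenAI project keys alone?

In part. They work for project-level attribution and budget caps. Attribution by feature, customer, or route requires application-level metadata.

Will retries double count cost?

A request that fails before the model runs usually returns no usage, so no cost is recorded. If the application retries after a success, the cost is double counted unless request_id is reused. For idempotent retries, use the same request_id and deduplicate at write time.

Can the OpenAI usage API be used for real-time monitoring?

It is not suited to real-time monitoring; it lags by tens of minutes. The practical split is to use your own logs and warehouse for alerts and kill switches and the usage API for monthly reconciliation.

Can I sample to reduce log volume?

No. One line of JSON per request keeps the data volume small. Sampling breaks accurate per-customer and per-route attribution. Record everything.

Can this be used with other LLM providers?

Yes. Add a provider column holding values such as openai, anthropic, google, or deepseek. The price table and wrapper differ per provider, but the warehouse schema and dashboards can be shared. For comparison, see DeepSeek V4 API pricing.

Can this be used for embeddings and image generation?

Yes, although the cost calculation branches by endpoint. Embeddings are priced per input token, and image generation per image count and resolution. Add endpoint to the schema and separate events into chat, embeddings, image, and so on.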

OIDC The Hard way - Mirecloud Home lab Part 3

Stevens Emmanuel Ledoux — Tue, 12 May 2026 02:36:27 +0000

Eliminating password databases: OpenID Connect, front-channel vs. back-channel, role mapping, and the end of local authentication.
Overview

Parts 1 and 2 built the foundation: Vault manages all credentials, External Secrets Operator bridges them into Kubernetes, cert-manager automates TLS, and Keycloak runs as a production-grade identity provider with clustered session state.
Part 3 is where that infrastructure proves its value: integrating Grafana with Keycloak via OpenID Connect to eliminate Grafana's native login form entirely. By the end, there is no Grafana password database. No local admin account. Every login redirects to Keycloak, authenticates against the central identity layer, and maps realm roles to Grafana permissions automatically.
The deliverables:
Understanding the OIDC Authorization Code Flow
Configuring Keycloak as an Identity Provider (IdP)
Configuring Grafana as a Relying Party (RP)
Managing the client secret through Vault and ESO
Front-channel vs. back-channel URL configuration (the detail most guides get wrong)
Role mapping via JMESPath expressions

A Primer on OpenID Connect

Before diving into YAML, it is worth understanding what OpenID Connect actually does - because every configuration decision that follows is a direct consequence of how the protocol works.
The Problem It Solves

Without SSO, every service in your cluster has its own user database, its own password policy, its own session management. Add a user, you add them five times. Rotate a password, you rotate it five times. An employee leaves, you hope you remembered to revoke access in all five places.
OpenID Connect (OIDC) is an identity layer built on top of OAuth 2.0. It defines a standard protocol by which an application (the Relying Party, e.g., Grafana) can delegate authentication to a trusted external service (the Identity Provider, e.g., Keycloak). The application never handles passwords. It only receives a verified identity token.
The Authorization Code Flow

This is the flow used by Grafana when a user attempts to log in:
Step-by-step breakdown:
User navigates to Grafana → GET /
Grafana redirects to Keycloak → 302 with auth_url
Browser follows redirect to Keycloak → GET /auth/realms/mirecloud/protocol/openid-connect/auth
Keycloak renders login form → User sees username/password fields
User submits credentials → POST to Keycloak (Grafana never sees this)
Keycloak redirects back to Grafana → 302 with code=AUTH_CODE
Browser follows redirect to Grafana callback → GET /login/generic_oauth?code=...

From here, the flow switches to back-channel (server-to-server, no browser involved):
Grafana exchanges code for tokens (back-channel) → POST /token
Keycloak returns tokens → { access_token, id_token }
Grafana requests user info (back-channel) → GET /userinfo
Keycloak returns user claims → { sub, email, realm_access.roles }
Grafana creates session → Sets grafana_session cookie

Front-Channel vs. Back-Channel

The diagram reveals a critical distinction that most tutorials ignore:
Front-channel calls travel through the user's browser as HTTP redirects. The auth_url is a front-channel URL - the browser navigates to it directly. It must be publicly reachable: https://keycloak.mirecloud.com/...
Back-channel calls are made directly between Grafana's pod and Keycloak's pod, inside the Kubernetes cluster. The browser is not involved. These are the token exchange ( token_url) and user info ( api_url) calls.
This is why token_url in the Grafana configuration uses the internal Kubernetes service DNS name ( keycloak-keycloakx-http.keycloak.svc.cluster.local) rather than the public hostname.
Key Concepts

Step 1 - Configure Keycloak (One-Time Setup)

Create a Realm

Navigate to the Keycloak admin console → Create Realm.
Create a Client for Grafana

Inside the mirecloud realm, navigate to Clients → Create Client.
General Settings: Capability config:
Client authentication: ON
Authentication flow: Enable "Standard flow"

Navigate to Clients → grafana → Credentials tab. Copy the Client Secret and store it in Vault:
kubectl -n vault exec -ti vault-0 -- vault kv put secret/grafana/sso \
  client_secret=''

Step 2 - ExternalSecret for Grafana

apiVersion: external-secrets.io/v1
kind: ExternalSecret
metadata:
  name: grafana-keycloak-es
  namespace: monitoring
spec:
  refreshInterval: 1m
  secretStoreRef:
    name: vault-backend
    kind: ClusterSecretStore
  target:
    name: grafana-keycloak-secret
  data:
    - secretKey: client_secret
      remoteRef:
        key: secret/grafana/sso
        property: client_secret

Step 3 - Grafana OIDC Configuration

kube-prometheus-stack:
  grafana:
    enabled: true
    envFromSecret: grafana-keycloak-secret
    grafana.ini:
      auth.generic_oauth:
        enabled: true
        name: "Keycloak"
        client_id: "grafana"
        client_secret: $__env{client_secret}
        # Front-channel URL (browser navigates here)
        auth_url: "https://keycloak.mirecloud.com/auth/realms/mirecloud/protocol/openid-connect/auth"
        # Back-channel URLs (pod-to-pod)
        token_url: "http://keycloak-keycloakx-http.keycloak.svc.cluster.local:80/auth/realms/mirecloud/protocol/openid-connect/token"
        api_url: "http://keycloak-keycloakx-http.keycloak.svc.cluster.local:80/auth/realms/mirecloud/protocol/openid-connect/userinfo"
        scopes: "openid profile email"
        allow_sign_up: true
        # Role mapping
        role_attribute_path: "contains(realm_access.roles[*], 'admin') && 'Admin' || 'Viewer'"

Configuration Breakdown

auth_url uses the public DNS name: https://keycloak.mirecloud.com/...
token_url and api_url use internal cluster DNS to avoid DNS hairpin issues in homelab environments.
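The role_attribute_path expression is JMESPath, evaluated by Grafana against the user claims. Its decision logic can be mirrored in plain Python to make the behavior easier to reason about. This is only an equivalent sketch of the rule, not Grafana's implementation, and the function name map_grafana_role is hypothetical.

```python
def map_grafana_role(claims: dict) -> str:
    """Mirror contains(realm_access.roles[*], 'admin') && 'Admin' || 'Viewer'.

    Returns "Admin" when the realm roles include "admin"; otherwise,
    including when the claim is missing, falls back to "Viewer".
    """
    roles = (claims.get("realm_access") or {}).get("roles") or []
    return "Admin" if "admin" in roles else "Viewer"


if __name__ == "__main__":
    print(map_grafana_role({"realm_access": {"roles": ["admin", "offline_access"]}}))
    print(map_grafana_role({"realm_access": {"roles": ["offline_access"]}}))
    print(map_grafana_role({"email": "user@example.com"}))
```

The fallback to Viewer matters: a user whose token carries no realm_access claim still gets a session, just with the least privileged role.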
Test the OIDC Flow

Navigate to https://grafana.mirecloud.com.
Click Sign in with Keycloak.
The browser redirects to Keycloak. Enter credentials for a user in the mirecloud realm.
Grafana exchanges the authorization code for tokens (back-channel, invisible to you) and creates a session.
You land on the Grafana dashboard. Your role (Admin or Viewer) is determined by the admin realm role assignment.
Security Posture

No Grafana password database - all authentication delegated to Keycloak
Client secret managed through Vault and ESO - never visible in Git
OIDC tokens transmitted securely (TLS on front-channel, internal service mesh for back-channel)
Role assignment driven by Keycloak realm roles - access control changes do not require Grafana restarts

What's Next: Part 4

Part 4 will cover GitLab OIDC configuration with discovery: false, explicit OAuth endpoint definition, and CA injection.
The complete repository is available at github.com/mirecloud/home_lab.
Emmanuel Catin - Senior Platform Engineer | Kubernetes, GitOps, Zero Trust
CKA (90%) | CKS in preparation | Montréal, QC


Originally published at https://emmanuel-steven.blogspot.com on February 17, 2026.

How to Track OpenAI API spend per feature: a cost-attribution playbook

Hassann — Tue, 12 May 2026 02:35:48 +0000

Your OpenAI invoice says you spent $4,237 last month. It does not tell you that $3,100 came from one runaway summarization endpoint, $700 came from a customer paying $50/month, and $437 came from a feature nobody uses. If you want pricing, capacity, or roadmap decisions to be grounded in data, you need request-level cost attribution.


This guide shows how to implement OpenAI API cost attribution in production: tag every request, log token usage and computed cost, aggregate spend by feature/route/customer, set budget caps, and test the wrapper before shipping.

💡 Apidog gives you the request-level visibility and scenario testing you need to verify your cost-tracking wrapper works before it ships to production. Use Apidog to replay tagged requests, assert log shape, and validate that every call carries the metadata your warehouse expects.

TL;DR

Implement this pipeline:

Wrap every OpenAI API call.
Require metadata: feature, route, customer_id, and environment.
Capture response.usage.
Compute cost_usd at write time.
Emit one structured log event per request.
Aggregate by tag in your warehouse.
Set OpenAI project/key budget caps.
Alert on hourly spend anomalies.
Validate the wrapper with Apidog scenario tests.

Introduction

You ship a new AI feature on Tuesday. By Friday, your CFO asks why the OpenAI line item jumped 40%. The OpenAI dashboard shows total spend and model usage, but not which feature, customer, or endpoint caused the spike.

That is the core problem: OpenAI billing is useful for invoices, not engineering attribution.

The fix is straightforward:

Add metadata at the call site.
Log every request as structured data.
Compute cost from token usage.
Store the event in your warehouse.
Build dashboards and alerts from that table.

By the end of this guide, you will have:

A cost-attribution event schema
Python wrapper code
SQL aggregation queries
A verification workflow with Apidog
A build-vs-buy tooling comparison

For pricing context, see the GPT-5.5 pricing breakdown. For a related billing-attribution problem, see GitHub Copilot usage billing for API teams. For API basics, see the official OpenAI API reference.

Why OpenAI’s billing dashboard is not enough

The OpenAI billing dashboard typically gives you:

Daily spend
Model breakdown
Usage limits

That works for a simple setup. It breaks down when you have:

Multiple AI features
Multiple customers
Multiple environments
Multiple developers
Background jobs
Internal tools

What is missing

Total spend without context

The dashboard can tell you that you spent $312 yesterday. It cannot tell you whether that came from a customer hammering your support-chat endpoint or from a background job reprocessing your knowledge base.

No per-feature breakdown

OpenAI usage is grouped around account/project/model dimensions. It does not know your product concepts: feature, route, customer_id, or environment.

Reporting lag

Usage data may lag by tens of minutes or hours. That is too slow for runaway loops or hourly burn alerts.

No feature-level alerts

There is no native primitive for: “Page me if /api/v1/chat/answer exceeds $50/hour.”

No customer attribution

If you run B2B SaaS, you need to know which customer generated which spend. Without that, you cannot compute gross margin per customer.

Project keys help, but only partially

OpenAI project keys can separate workloads at a coarse level. They do not give you per-feature, per-route, or per-customer attribution. The OpenAI usage API returns aggregated data, not request-level product metadata.

The pattern is common enough that the Dev.to thread “OpenAI Tells You What You Spent. Not Where. So I Built a Dashboard” resonated with developers: you cannot manage what you cannot measure.

The cost-attribution data model

Treat every OpenAI request as a cost event. That event is the unit you query, alert on, and reconcile.

Use a schema like this:

Column	Type	Example	Why it matters
`request_id`	uuid	`7a91...`	Idempotency, deduplication, retries
`timestamp`	timestamptz	`2026-05-06T14:23:01Z`	Time-series queries and anomaly detection
`feature`	text	`support-chat`	Product surface that triggered the call
`route`	text	`/api/v1/chat/answer`	HTTP route or background job ID
`customer_id`	text	`cust_4291`	Per-customer spend and gross margin
`environment`	text	`prod`, `staging`, `dev`	Separate production from internal usage
`model`	text	`gpt-5.5`, `gpt-5.4-mini`	Pricing differs per model
`prompt_tokens`	int	`15234`	Input token count
`completion_tokens`	int	`812`	Output token count
`reasoning_tokens`	int	`4500`	Reasoning tokens billed as output
`cached_tokens`	int	`12000`	Cached input tokens
`latency_ms`	int	`2341`	Cost/performance correlation
`cost_usd`	numeric(10,6)	`0.045672`	Cost computed at write time
`prompt_cache_key`	text	`system-v3`	Cache hit tracking
`error_code`	text	`null`, `429`	Retry and failure analysis
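One way to keep every producer aligned with this schema is to model the event as a typed record before serializing it. The sketch below is an illustration only: the `CostEvent` class and its validation rule are hypothetical names, not part of any SDK, and the fields simply mirror the columns above.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


# Hypothetical record type mirroring the schema columns.
@dataclass(frozen=True)
class CostEvent:
    request_id: str
    timestamp: str            # ISO 8601, UTC
    feature: str
    route: str
    customer_id: str
    environment: str
    model: str
    prompt_tokens: int
    completion_tokens: int
    reasoning_tokens: int
    cached_tokens: int
    latency_ms: int
    cost_usd: float
    prompt_cache_key: Optional[str] = None
    error_code: Optional[str] = None

    def __post_init__(self):
        # Attribution fields must never be empty or None.
        for name in ("feature", "route", "customer_id", "environment"):
            if not getattr(self, name):
                raise ValueError(f"{name} is required")

    def to_json(self) -> str:
        """Serialize to one JSON line with stable key order."""
        return json.dumps(asdict(self), sort_keys=True)
```

Constructing the record at the call site means a missing tag fails immediately instead of producing a row that cannot be attributed later.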

Compute cost when you write the event, not later in a dashboard query. Pricing changes over time, so historical events should preserve the rate used at the time.

Example pricing function:

PRICING = {  # USD per 1M tokens, as of May 2026
    "gpt-5.5":      {"input": 5.00,  "cached": 2.50,  "output": 30.00},
    "gpt-5.5-pro":  {"input": 30.00, "cached": 15.00, "output": 180.00},
    "gpt-5.4":      {"input": 2.50,  "cached": 1.25, "output": 15.00},
    "gpt-5.4-mini": {"input": 0.25,  "cached": 0.125, "output": 2.00},
}

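def compute_cost_usd(model, prompt_tokens, cached_tokens, completion_tokens, reasoning_tokens):
    """Return the USD cost of one request, rounded to 6 decimal places."""
    rates = PRICING[model]

    # cached_tokens is a subset of prompt_tokens: bill only the remainder at the full input rate.
    uncached = max(0, prompt_tokens - cached_tokens)

    input_cost = (uncached * rates["input"]) / 1_000_000
    cache_cost = (cached_tokens * rates["cached"]) / 1_000_000

    # usage.completion_tokens already includes reasoning tokens, so adding
    # reasoning_tokens again would bill them twice. The parameter stays in the
    # signature so callers can keep passing the breakdown they log.
    output_cost = (completion_tokens * rates["output"]) / 1_000_000

    return round(input_cost + cache_cost + output_cost, 6)

Reasoning tokens are returned under:

usage.completion_tokens_details.reasoning_tokens

They are billed at the output rate and are already included in usage.completion_tokens. Bill completion_tokens once and store reasoning_tokens as a breakdown; adding the two together overcounts cost for reasoning-heavy calls.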

For more pricing details, see the GPT-5.5 pricing breakdown.
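To preserve the rate that applied when an event was written, the rate table can carry effective dates. This is a minimal sketch under stated assumptions: `RATE_HISTORY` and `rates_for` are hypothetical names, both effective dates are made up, and the earlier row of rates is invented purely to show a price change. Only the later row matches the table above.

```python
from datetime import date

# USD per 1M tokens, keyed by model, with the date each row took effect.
RATE_HISTORY = {
    "gpt-5.4-mini": [
        # Hypothetical earlier price, for illustration only.
        (date(2026, 1, 1), {"input": 0.30, "cached": 0.15, "output": 2.40}),
        # Matches the PRICING table; the effective date itself is an assumption.
        (date(2026, 5, 1), {"input": 0.25, "cached": 0.125, "output": 2.00}),
    ],
}


def rates_for(model: str, on: date) -> dict:
    """Return the rates in effect for `model` on the given date."""
    current = None
    for effective_from, rates in sorted(RATE_HISTORY[model], key=lambda row: row[0]):
        if effective_from <= on:
            current = rates
    if current is None:
        raise LookupError(f"no rates recorded for {model} on {on}")
    return current
```

Because `cost_usd` is stored on the event, this lookup only matters at write time and when backfilling events that were logged without a cost.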

Wrap the OpenAI client

Every OpenAI call should go through one wrapper. The wrapper should:

Require product metadata.
Generate or receive a request_id.
Call OpenAI.
Capture token usage.
Compute cost.
Emit a structured event.

import time
import uuid
import json
import logging
from openai import OpenAI

client = OpenAI()
logger = logging.getLogger("llm.cost")

def call_with_attribution(
    *,
    feature,
    route,
    customer_id,
    environment,
    model,
    messages,
    request_id=None,
    **openai_kwargs
):
    if not feature or not route or not customer_id or not environment:
        raise ValueError("feature, route, customer_id, and environment are required")

    request_id = request_id or str(uuid.uuid4())
    started = time.time()
    error_code = None
    response = None

    try:
        response = client.chat.completions.create(
            model=model,
            messages=messages,
            **openai_kwargs
        )
        return response

    except Exception as e:
        error_code = getattr(e, "code", "unknown_error")
        raise

    finally:
        latency_ms = int((time.time() - started) * 1000)

        u = response.usage if response else None

        prompt_tokens = getattr(u, "prompt_tokens", 0) if u else 0
        completion_tokens = getattr(u, "completion_tokens", 0) if u else 0

        cached_tokens = (
            getattr(getattr(u, "prompt_tokens_details", None), "cached_tokens", 0)
            if u else 0
        ) or 0

        reasoning_tokens = (
            getattr(getattr(u, "completion_tokens_details", None), "reasoning_tokens", 0)
            if u else 0
        ) or 0

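        # Price only calls that returned usage. A failed call has nothing to bill,
        # and skipping the lookup keeps an unknown model name from raising inside
        # `finally` and masking the original API error.
        cost_usd = compute_cost_usd(
            model,
            prompt_tokens,
            cached_tokens,
            completion_tokens,
            reasoning_tokens
        ) if u else 0.0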

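        logger.info(json.dumps({
            "event": "openai.request",
            # UTC write time; the schema's timestamp column and the time-window queries rely on it.
            "timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
            "request_id": request_id,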
            "feature": feature,
            "route": route,
            "customer_id": customer_id,
            "environment": environment,
            "model": model,
            "prompt_tokens": prompt_tokens,
            "completion_tokens": completion_tokens,
            "reasoning_tokens": reasoning_tokens,
            "cached_tokens": cached_tokens,
            "latency_ms": latency_ms,
            "cost_usd": cost_usd,
            "error_code": error_code,
        }))

Usage example:

response = call_with_attribution(
    feature="support-chat",
    route="/api/v1/chat/answer",
    customer_id="cust_4291",
    environment="prod",
    model="gpt-5.5",
    messages=[
        {"role": "system", "content": "You are a support assistant."},
        {"role": "user", "content": "How do I reset my password?"}
    ],
)

Ship these logs to your existing pipeline:

Vector
Fluent Bit
Logstash
OTLP collector
Kafka
Pub/Sub
NATS

Then write them into your warehouse:

BigQuery
ClickHouse
Snowflake
Postgres

For Node.js, use the same shape: a wrapper function around the OpenAI SDK that accepts metadata, captures response.usage, computes cost, and writes a JSON event.

Wire up cost tracking and test it with Apidog

1. Replace direct OpenAI calls

Search your codebase for direct SDK calls:

grep -R "client.chat.completions.create" .
grep -R "OpenAI(" .

Replace every direct call with your attribution wrapper.

Do not default missing metadata to "unknown". Fail fast:

if not feature:
    raise ValueError("feature is required")

Bad tags create silent attribution errors.

2. Emit structured logs

Log one JSON event per request:

{
  "event": "openai.request",
  "timestamp": "2026-05-06T14:23:01Z",
  "request_id": "7a91...",
  "feature": "support-chat",
  "route": "/api/v1/chat/answer",
  "customer_id": "cust_4291",
  "environment": "prod",
  "model": "gpt-5.5",
  "prompt_tokens": 15234,
  "completion_tokens": 812,
  "reasoning_tokens": 4500,
  "cached_tokens": 12000,
  "latency_ms": 2341,
  "cost_usd": 0.045672,
  "error_code": null
}

Keep these events clean. Do not mix them with debug logs.
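One way to keep cost events separate is to give the `llm.cost` logger its own handler and stop it from propagating to the root logger. This is a sketch using only the standard `logging` module; `configure_cost_logger` is a hypothetical helper name, and the in-memory stream stands in for stdout or a file that your collector tails.

```python
import io
import json
import logging


def configure_cost_logger(stream) -> logging.Logger:
    """Send llm.cost events to their own stream as bare JSON lines."""
    logger = logging.getLogger("llm.cost")
    logger.setLevel(logging.INFO)
    logger.propagate = False      # keep events out of root and debug handlers
    logger.handlers.clear()       # avoid duplicate lines if configured twice

    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(message)s"))  # message only, no prefix
    logger.addHandler(handler)
    return logger


# Example: write one event to an in-memory stream.
buffer = io.StringIO()
cost_logger = configure_cost_logger(buffer)
cost_logger.info(json.dumps({"event": "openai.request", "feature": "support-chat"}))
```

With a message-only formatter, every line in the stream parses as JSON, so the shipper does not need to strip log prefixes.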

3. Aggregate spend in SQL

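Once events are in your warehouse, start with feature-level spend. The queries in this guide use BigQuery syntax (TIMESTAMP_SUB, TIMESTAMP_TRUNC, APPROX_QUANTILES), so adapt the date functions if you run ClickHouse, Snowflake, or Postgres: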

SELECT
  feature,
  DATE_TRUNC(timestamp, DAY) AS day,
  COUNT(*) AS requests,
  SUM(cost_usd) AS spend_usd,
  SUM(prompt_tokens + completion_tokens) AS tokens,  -- completion_tokens already includes reasoning tokens
  AVG(latency_ms) AS avg_latency_ms,
  SUM(cached_tokens) / NULLIF(SUM(prompt_tokens), 0) AS cache_hit_rate
FROM openai_events
WHERE environment = 'prod'
  AND timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
GROUP BY feature, day
ORDER BY day DESC, spend_usd DESC;

Then add customer-level spend:

SELECT
  customer_id,
  DATE_TRUNC(timestamp, MONTH) AS month,
  COUNT(*) AS requests,
  SUM(cost_usd) AS spend_usd
FROM openai_events
WHERE environment = 'prod'
GROUP BY customer_id, month
ORDER BY spend_usd DESC;

And route-level spend:

SELECT
  route,
  COUNT(*) AS requests,
  SUM(cost_usd) AS spend_usd,
  AVG(cost_usd) AS avg_cost_per_request
FROM openai_events
WHERE environment = 'prod'
  AND timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
GROUP BY route
ORDER BY spend_usd DESC
LIMIT 20;

4. Build the dashboard

Create three operational views:

Spend per feature over time
Spend per customer over time
Top routes by daily spend

Use whatever BI layer you already have:

Grafana
Metabase
Looker
Superset
Mode

5. Test the wrapper with Apidog

Before shipping, verify that the wrapper logs the metadata you expect.

Use Apidog to create an end-to-end scenario:

Send a request to your AI endpoint with a known customer_id.
Verify the API response succeeds.
Capture the side-channel log event through your logging endpoint, stdout collector, or OTLP/log pipeline.
Assert the event contains:
- feature
- route
- customer_id
- environment
- model
- prompt_tokens > 0
- cost_usd > 0
Run the same scenario against staging and production using Apidog environments.
Replay the request and verify retries do not double-count cost.
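The same assertions can also live in a unit test next to the wrapper, so a broken event shape is caught before the end-to-end scenario runs. The sketch below is plain Python, not Apidog scripting; `event_problems` is a hypothetical helper that checks exactly the fields listed above.

```python
REQUIRED_TEXT_FIELDS = ("feature", "route", "customer_id", "environment", "model")
POSITIVE_NUMBER_FIELDS = ("prompt_tokens", "cost_usd")


def event_problems(event: dict) -> list:
    """Return human-readable problems; an empty list means the event passes."""
    problems = []
    for field in REQUIRED_TEXT_FIELDS:
        value = event.get(field)
        if not isinstance(value, str) or not value:
            problems.append(f"{field} is missing or empty")
    for field in POSITIVE_NUMBER_FIELDS:
        value = event.get(field)
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)) or value <= 0:
            problems.append(f"{field} must be greater than 0")
    return problems
```

Returning a list rather than raising on the first failure makes the test report every missing tag in one run.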

For broader testing workflows, see API testing tools for QA engineers. For contract-first coverage, see contract-first API development.

6. Set budget caps and alerts

Use OpenAI project keys to isolate risk:

prod-support-chat
prod-summarization
staging-all
dev-all

Set hard caps in the OpenAI dashboard so one runaway workload cannot drain the whole organization budget.

Then add warehouse-driven alerts. Example: page if any feature exceeds 3x its seven-day average hourly spend.

WITH hourly AS (
  SELECT
    feature,
    TIMESTAMP_TRUNC(timestamp, HOUR) AS hour,
    SUM(cost_usd) AS spend_usd
  FROM openai_events
  WHERE environment = 'prod'
    AND timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 8 DAY)
  GROUP BY feature, hour
),
baseline AS (
  SELECT
    feature,
    AVG(spend_usd) AS avg_hourly_spend
  FROM hourly
  WHERE hour < TIMESTAMP_SUB(TIMESTAMP_TRUNC(CURRENT_TIMESTAMP(), HOUR), INTERVAL 1 HOUR)
  GROUP BY feature
),
current_hour AS (
  SELECT
    feature,
    spend_usd
  FROM hourly
  WHERE hour = TIMESTAMP_TRUNC(CURRENT_TIMESTAMP(), HOUR)
)
SELECT
  c.feature,
  c.spend_usd,
  b.avg_hourly_spend
FROM current_hour c
JOIN baseline b USING (feature)
WHERE c.spend_usd > b.avg_hourly_spend * 3;

Send the result to:

PagerDuty
Opsgenie
Slack
Email
Incident.io

Native caps protect you from catastrophic burn. Warehouse alerts catch slow drift earlier.
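To unit-test the alert rule without a warehouse, the 3x comparison can be expressed in a few lines of Python. This is a simplified sketch, not a port of the query: `features_over_baseline` is a hypothetical name, hours can be any comparable value, and the baseline here averages every earlier hour in the input rather than applying the query's exact window.

```python
from collections import defaultdict


def features_over_baseline(hourly_spend, current_hour, multiplier=3.0):
    """Flag features whose current-hour spend exceeds multiplier x their baseline.

    hourly_spend: iterable of (feature, hour, spend_usd) rows.
    Returns {feature: (current_spend, baseline_average)}.
    """
    history = defaultdict(list)
    current = {}
    for feature, hour, spend in hourly_spend:
        if hour == current_hour:
            current[feature] = spend
        elif hour < current_hour:
            history[feature].append(spend)

    flagged = {}
    for feature, spend in current.items():
        past = history.get(feature)
        if not past:
            continue  # no baseline yet, so there is nothing to compare against
        baseline = sum(past) / len(past)
        if spend > baseline * multiplier:
            flagged[feature] = (spend, baseline)
    return flagged
```

A new feature with no history is skipped rather than flagged, which mirrors the inner join between the current hour and the baseline.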

Advanced techniques

Prompt caching

GPT-5.5 charges less for cached input tokens. Structure prompts so stable content appears first:

[Stable system prompt]
[Stable policy/instructions]
[Stable examples]
[Per-request user data]

Track this per feature:

SELECT
  feature,
  SUM(cached_tokens) / NULLIF(SUM(prompt_tokens), 0) AS cache_hit_rate
FROM openai_events
WHERE environment = 'prod'
GROUP BY feature
ORDER BY cache_hit_rate ASC;

If a prompt change drops cache hit rate, your input cost can rise silently.

See the official OpenAI prompt caching docs for eligibility rules.
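The stable-first layout can be enforced in code by building the prefix from constants and appending per-request data last. This is a minimal sketch: `build_messages` is a hypothetical helper, and the policy and example strings are placeholders rather than recommended prompt text.

```python
STABLE_SYSTEM_PROMPT = "You are a support assistant."
STABLE_POLICY = "Answer only from the provided help-center articles."   # placeholder text
STABLE_EXAMPLES = "Q: How do I log in?\nA: Use the Sign in button."     # placeholder text


def build_messages(user_data: str) -> list:
    """Put identical content first so successive requests share a prefix."""
    stable_prefix = "\n\n".join([STABLE_SYSTEM_PROMPT, STABLE_POLICY, STABLE_EXAMPLES])
    return [
        {"role": "system", "content": stable_prefix},
        {"role": "user", "content": user_data},   # per-request data goes last
    ]
```

Keeping timestamps, customer names, and other per-request values out of the prefix is what lets consecutive requests reuse the same leading tokens.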

Batch API for offline workloads

Use the Batch API for workloads that do not need synchronous responses:

Nightly summarization
Evaluation runs
Embedding backfills
Document re-processing

Tag these events with a batch_job_id so you can attribute cost back to the source workload.
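One way to carry the tag through a batch is to encode it in each request's `custom_id`, which the Batch API echoes back with the result. The sketch below builds one JSONL input line for the chat completions endpoint; the helper names and the `job:index` encoding are assumptions for illustration, not a convention defined by OpenAI.

```python
import json


def batch_line(batch_job_id: str, index: int, model: str, messages: list) -> str:
    """Build one JSONL line for a Batch API input file."""
    return json.dumps({
        # custom_id comes back with the result, so it can carry the attribution tag.
        "custom_id": f"{batch_job_id}:{index}",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {"model": model, "messages": messages},
    })


def split_custom_id(custom_id: str) -> tuple:
    """Recover (batch_job_id, index) from a custom_id built by batch_line."""
    batch_job_id, _, index = custom_id.rpartition(":")
    return batch_job_id, int(index)
```

Splitting from the right keeps job IDs such as `cron:nightly-summarize`, which contain a colon themselves, intact.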

Reasoning effort tuning

Reasoning-heavy calls can multiply output tokens. Audit features that use higher reasoning effort:

Can medium become low?
Does quality remain acceptable?
What is the cost delta?

Track cost and quality side by side before changing production defaults.

For more details, see how to use the GPT-5.5 API.

Context-window discipline

Long prompts are expensive. Prefer tight retrieval over stuffing large context windows.

Track prompt size by feature:

SELECT
  feature,
  AVG(prompt_tokens) AS avg_prompt_tokens,
  APPROX_QUANTILES(prompt_tokens, 100)[OFFSET(95)] AS p95_prompt_tokens
FROM openai_events
WHERE environment = 'prod'
GROUP BY feature
ORDER BY p95_prompt_tokens DESC;

If prompt size grows without a product reason, investigate.

Watch the 272K-token cliff

OpenAI applies higher pricing on GPT-5.5 requests above 272K tokens. Add a guardrail:

if prompt_tokens > 250_000:
    logger.warning(json.dumps({
        "event": "openai.prompt_size_warning",
        "request_id": request_id,
        "feature": feature,
        "route": route,
        "customer_id": customer_id,
        "prompt_tokens": prompt_tokens,
    }))

For pricing details, see the GPT-5.5 pricing post.

Per-customer spend caps

For B2B SaaS, enforce spend limits before making the OpenAI call.

Example flow:

Query current monthly spend for customer_id.
Compare it to the customer’s quota.
If under quota, call OpenAI.
If over quota, return 429.

Example response:

{
  "error": "monthly_ai_quota_exceeded",
  "message": "Your monthly AI quota has been exceeded. Upgrade your plan or contact billing."
}

This turns AI from a margin risk into a controllable product cost.
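The flow above can be sketched as a guard that runs before the model call. Everything here is illustrative: `QuotaExceeded`, `enforce_quota`, and `handle_request` are hypothetical names, the spend and quota lookups are plain dicts standing in for a warehouse query or cache, and a customer exactly at quota is treated as over it.

```python
class QuotaExceeded(Exception):
    """Raised when month-to-date spend has reached the customer's quota."""
    status_code = 429


def enforce_quota(customer_id: str, month_spend_usd: dict, quota_usd: dict) -> None:
    """Raise QuotaExceeded if the customer is at or over quota."""
    spent = month_spend_usd.get(customer_id, 0.0)
    limit = quota_usd[customer_id]   # a missing quota is a config error, so fail loudly
    if spent >= limit:
        raise QuotaExceeded("monthly_ai_quota_exceeded")


def handle_request(customer_id, month_spend_usd, quota_usd, call_model):
    """Return (status, body); call_model runs only when the customer is under quota."""
    try:
        enforce_quota(customer_id, month_spend_usd, quota_usd)
    except QuotaExceeded as exc:
        return exc.status_code, {"error": str(exc)}
    return 200, call_model()
```

Passing the model call in as a function keeps the guard testable and guarantees that no tokens are spent on a rejected request.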

Common mistakes

Avoid these:

Counting reasoning tokens as input. They are output.
Trusting the OpenAI dashboard for real-time alerts.
Adding tags globally instead of at the call site.
Forgetting background jobs and queue workers.
Sampling logs. Log every request.
Allowing customer_id to be null.
Computing historical cost with today’s pricing.
Retrying successful requests with a new request_id.

For background jobs, use synthetic routes:

cron:nightly-summarize
queue:image-caption
webhook:crm-sync

For unknown internal usage, use explicit values:

customer_id = "internal"
customer_id = "system"

Never use null as an attribution bucket.

Alternatives and tooling

You do not have to build all of this yourself.

Approach	What it does well	What it costs	When to use
OpenAI usage API	Native, no setup, accurate to the cent	Free	One project, one feature, no per-customer attribution
Helicone	Drop-in proxy, dashboards, caching, per-user costs	Free tier; paid from $20/mo	You want a hosted dashboard quickly and accept a proxy
Langfuse	Open source, self-host or cloud, traces plus cost	Free self-hosted; cloud from $29/mo	You want traces and cost in one tool
LangSmith	LangChain integration, evals, cost tracking	Paid from $39/user/mo	You already use LangChain heavily
Custom warehouse	Full control, no proxy, custom dimensions	Engineering time	Large workloads, strict residency, custom attribution

Tradeoffs:

A proxy adds another hop in the critical path.
A self-hosted observability stack gives control but adds ops work.
A custom warehouse integrates well with your data stack but requires you to own queries and alerts.
The native usage API is useful for reconciliation, not product-level attribution.

For more on hosted LLM cost monitoring, see Helicone’s guide on tracking LLM costs. For open-source cost tracking, see the Langfuse cost tracking docs.

If you operate at platform scale, these patterns also fit service-mesh and platform-engineering workflows. See API platforms for microservices architecture.

Real-world use cases

B2B SaaS with per-customer LLM spend

A sales-intelligence product spends $80,000/month on OpenAI. After adding per-customer attribution, the team learns that 12% of customers drive 71% of AI spend.

The company can then:

Add tiered pricing
Apply soft quotas to lower tiers
Charge overages
Improve gross margin per account

Internal developer tooling

An engineering org gives developers access to an internal GPT-5.5 assistant. By tagging requests with developer identity, platform engineering sees that three developers account for 50% of internal spend.

Two are running abandoned agent loops. Turning them off saves $1,800/month. The third is doing legitimate high-value work, so the team increases their quota.

AI feature forecasting

A product team wants to ship summarization. Historical events give them:

Average input tokens per call
Average output tokens per call
Calls per active user
Active user forecast

They estimate cost at $0.04 per active user per day, or about $1.20/month. Pricing can then set a $5/month feature price with visible unit economics.
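The arithmetic behind that estimate is short enough to keep next to the forecast. This sketch assumes a 30-day month, which is what the $1.20 figure implies, and counts only LLM cost against the feature price; `unit_economics` is a hypothetical helper name.

```python
def unit_economics(cost_per_user_day: float, price_per_month: float, days: int = 30) -> dict:
    """Monthly cost, dollar margin, and gross margin ratio per active user."""
    monthly_cost = cost_per_user_day * days
    margin = price_per_month - monthly_cost
    return {
        "monthly_cost_usd": round(monthly_cost, 2),
        "margin_usd": round(margin, 2),
        "gross_margin": round(margin / price_per_month, 3),
    }


result = unit_economics(cost_per_user_day=0.04, price_per_month=5.00)
# result: monthly cost 1.2, margin 3.8, gross margin 0.76
```

Re-running the function with the observed cost per user each month shows whether the $5 price still leaves the margin the team planned for.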

Conclusion

OpenAI’s billing dashboard answers an accounting question. Request-level attribution answers the engineering and product question: where is the money going?

Implementation checklist:

Tag every request with feature, route, customer_id, and environment.
Compute cost at write time.
Log every request as structured data.
Store events in your warehouse.
Build feature, route, and customer dashboards.
Set OpenAI project/key caps.
Add warehouse-driven anomaly alerts.
Test the wrapper with Apidog.
Audit reasoning effort, prompt size, and cache hit rate regularly.

Download Apidog and use it to verify your cost-attribution wrapper end to end. Drive AI endpoints with tagged requests, assert the log payload shape, and replay scenarios across environments before your warehouse depends on the data.

For related cost-management reading, see the GPT-5.5 pricing breakdown and GitHub Copilot usage billing for API teams.

FAQ

Do reasoning tokens count as input or output for billing?

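Reasoning tokens are billed at the output rate. The OpenAI API returns them under:

usage.completion_tokens_details.reasoning_tokens

That field is a breakdown of usage.completion_tokens, which already includes reasoning tokens. Bill completion_tokens at the output rate and do not add reasoning_tokens on top, or you will count them twice. For per-effort pricing details, see the GPT-5.5 pricing breakdown.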

How accurate is `response.usage` compared to the OpenAI dashboard?

Token counts in response.usage should match dashboard usage. Cost drift usually comes from stale pricing tables. Pin your rate table per model and update it when OpenAI changes pricing.

Can I do attribution with OpenAI project keys alone?

Only partially. Project keys give you one dimension of attribution. They do not give you per-feature, per-customer, or per-route visibility. Use project keys for isolation and budget caps; use application metadata for product attribution.

What about retries and rate-limit errors?

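If a request fails before the model runs, there is no usage object and no cost to record. Still log the event with zero tokens and the error_code so failed calls stay visible in retry analysis.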

If a request succeeds and your app retries it, you can double-count unless you reuse the same request_id and dedupe on write.
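Dedupe on write can be as simple as a uniqueness constraint on `request_id`. The sketch below uses an in-memory SQLite table as a stand-in for the warehouse, with a reduced column set; `open_store` and `write_event` are hypothetical names, and the exact upsert or merge syntax will differ per warehouse.

```python
import sqlite3


def open_store() -> sqlite3.Connection:
    """Create an in-memory events table keyed by request_id."""
    conn = sqlite3.connect(":memory:")
    # The primary key turns a repeated write into a no-op instead of a second row.
    conn.execute(
        "CREATE TABLE openai_events ("
        "request_id TEXT PRIMARY KEY, customer_id TEXT, cost_usd REAL)"
    )
    return conn


def write_event(conn: sqlite3.Connection, event: dict) -> bool:
    """Insert the event; return False if this request_id was already recorded."""
    cursor = conn.execute(
        "INSERT OR IGNORE INTO openai_events (request_id, customer_id, cost_usd) "
        "VALUES (?, ?, ?)",
        (event["request_id"], event["customer_id"], event["cost_usd"]),
    )
    conn.commit()
    return cursor.rowcount == 1
```

The retry path then only has to reuse the original `request_id`; the store decides whether the cost has already been counted.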

How fast does the OpenAI usage API return data?

The usage API can lag by tens of minutes. Use it for reconciliation. Use your own event stream and warehouse for alerts and kill switches.

Should I sample requests?

No. One JSON line per request is small, and sampling breaks customer and route attribution. Log every request.

Can this work for other LLM providers?

Yes. Add a provider column:

openai
anthropic
google
deepseek

Then maintain provider-specific pricing logic. The warehouse schema and dashboards can stay mostly the same.

For a comparison point, see DeepSeek V4 API pricing.

Does this work for embeddings and image generation?

Yes, but the cost math changes.

Add an endpoint column:

chat
embeddings
image

Then branch cost computation by endpoint. Embeddings are usually billed per input token. Images are usually billed per image or resolution.
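A branch on the endpoint column might look like the sketch below. The function name, the `usage` keys, and the `rates` keys are all hypothetical, and rates are passed in by the caller rather than hard-coded because this guide does not list embedding or image prices. The chat branch ignores cached tokens for brevity.

```python
def compute_cost_by_endpoint(endpoint: str, usage: dict, rates: dict) -> float:
    """Dispatch cost math on the endpoint column; rates come from the caller."""
    if endpoint == "chat":
        cost = (
            usage["prompt_tokens"] * rates["input_per_1m"]
            + usage["completion_tokens"] * rates["output_per_1m"]
        ) / 1_000_000
    elif endpoint == "embeddings":
        # Embeddings have no output tokens: bill input only.
        cost = usage["prompt_tokens"] * rates["input_per_1m"] / 1_000_000
    elif endpoint == "image":
        # Flat price per generated image at a given size.
        cost = usage["images"] * rates["per_image"]
    else:
        raise ValueError(f"unknown endpoint: {endpoint}")
    return round(cost, 6)
```

Raising on an unknown endpoint follows the same fail-fast rule as the attribution tags: an unpriced request should be an error, not a zero.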

Ship an app on Ghost + Fly.io for $2/month

ghost — Tue, 12 May 2026 02:34:43 +0000

Putting a real public app on the internet shouldn't cost $25/month for managed Postgres alone — before you've added compute or shipped a feature. Ghost gives you the database, Fly.io gives you the host, and your AI agent does the plumbing.

You can launch a public-facing, sparse-traffic hobby app, backed by Postgres, for roughly the cost of a coffee per month.

Who this is for

This guide is for developers who use an AI coding agent (Claude Code, Cursor, Codex, Windsurf, etc.) and want to ship a small public app fast and cheap. You don't need to know SQL or Docker — the agent handles both — but you should be comfortable approving shell commands the agent runs on your behalf.

You'll need:

An AI coding agent with both MCP support and a Bash/shell tool (Claude Code, Cursor in agent mode, Codex, Windsurf, Gemini CLI, VS Code, Kiro, or Antigravity). The shell tool is what lets the agent run flyctl and npm on your behalf — most modern agents have this.
macOS, Linux, or Windows (WSL recommended on Windows for flyctl).
A Fly.io account with a credit card on file.
An internet connection. The agent will install everything else (flyctl, Node, etc.) on its own.

What is Ghost

Ghost is Postgres for builders and their agents. Unlimited databases, metered by hours of active compute. All via CLI and MCP, no GUI required.

Create one in seconds, fork it like git when you want to experiment safely, share it with a simple link like a Google doc. Graduate to production with one command or throw it away when you're done.

The free tier covers 100 active compute hours per month and 1TB of storage. Compute is metered in 15-minute chunks when something queries the database; an idle database burns no compute. A sparse-traffic hobby app — a handful of human visits a day — comfortably fits the free tier.
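To see how chunked metering adds up, you can estimate active hours from a list of query times. This is a rough sketch, not Ghost's billing logic: it assumes chunks sit on fixed 15-minute boundaries, whereas the real meter may start a chunk at the first query, and `active_compute_hours` is a hypothetical helper name.

```python
from datetime import datetime, timezone

CHUNK_SECONDS = 15 * 60


def active_compute_hours(query_times) -> float:
    """Estimate billed hours as the number of distinct 15-minute chunks with a query."""
    chunks = {int(t.timestamp()) // CHUNK_SECONDS for t in query_times}
    return len(chunks) * 0.25


visits = [
    datetime(2026, 5, 7, 10, 1, tzinfo=timezone.utc),
    datetime(2026, 5, 7, 10, 5, tzinfo=timezone.utc),    # same chunk as 10:01
    datetime(2026, 5, 7, 10, 31, tzinfo=timezone.utc),
    datetime(2026, 5, 7, 14, 0, tzinfo=timezone.utc),
]
hours = active_compute_hours(visits)   # 3 distinct chunks
```

Four visits touch three chunks, so they consume 0.75 of the 100 free hours. Clustered traffic is cheap; evenly spaced traffic is what adds up.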

You can do this with managed-Postgres alternatives like Neon, Supabase, or RDS — but those either charge a flat monthly fee, cap project counts, or push you through a GUI for changes the agent could otherwise make in seconds. Ghost is the cheapest, most agent-native way to ship a public app with real Postgres.

What you will do

In this guide, we'll deploy a public-facing todo app to Fly.io with a Ghost Postgres database. After a one-time bootstrap, the agent does everything else — you just paste prompts.

Bootstrap (you): install the Ghost CLI and flyctl, log in, configure the Ghost MCP server in your agent.
Scaffold the app + create the database + define the schema (agent): generate a small Express todo app, create a Ghost database, define a todos table.
Wire the app to the database and test locally (agent): set DATABASE_URL, run the app on localhost, round-trip a todo through the database.
Deploy to Fly.io (agent): create the Fly app, push the connection string as a secret, deploy to a *.fly.dev URL.
Verify the public app (agent): curl the live URL, add a todo over HTTPS, confirm it landed in Ghost.
Open it in your browser (you): use the live app yourself and share the URL.
Clean up (agent): destroy the Fly app and delete the Ghost database.

Step 1 — Bootstrap (you, one-time)

This is the only part you can't delegate.

Install the ghost CLI:

curl -fsSL https://install.ghost.build | sh

On Windows, run irm https://install.ghost.build/install.ps1 | iex in PowerShell.

Install flyctl:

curl -L https://fly.io/install.sh | sh

On Windows, run pwsh -Command "iwr https://fly.io/install.ps1 -useb | iex".

Log into both:

ghost login
flyctl auth login

Each opens your browser. flyctl auth login prompts you to add a credit card if you haven't yet.

Configure Ghost as an MCP server in your agent. For Claude Code:

ghost mcp install claude-code

Replace claude-code with cursor, codex, windsurf, gemini, vscode, kiro-cli, or antigravity if you use a different agent. Run ghost mcp install with no argument for an interactive picker.

Restart your agent so it picks up the new MCP server.

Confirm both CLIs are installed by printing their versions.

Expected output:

$ ghost --version
ghost version 1.x.x

$ flyctl version
flyctl v0.x.x ...

Once both CLIs are installed, you're logged in, and the agent has been restarted, the rest is the agent.

Step 2 — Scaffold the app, create the database, define the schema (agent)

Tell the agent:

Build me a minimal public-facing todo app I can deploy to Fly.io.

Create a fresh empty directory called `todo-app` and work inside it.

Stack: Node.js with Express and the `pg` package. One server file. Server-rendered HTML — no frontend framework. Read DATABASE_URL from the environment.

Routes:
- GET  /                  render the list of todos with a small form to add a new one
- POST /todos             insert a new todo from form data, then redirect to /
- POST /todos/:id/done    mark a todo done, then redirect to /

Files to write:
- package.json             express + pg + dotenv
- server.js                the app
- Dockerfile               minimal Node runtime, copies package.json + server.js, runs `node server.js`
- fly.toml                 app = "todo-app", primary_region = "iad", [http_service] with internal_port=3000, force_https=true, auto_stop_machines="stop", auto_start_machines=true, min_machines_running=0. No [[services]] block. No [[mounts]]. No managed Postgres.
- .dockerignore            node_modules, .env, .git

Then, using the Ghost MCP:
1. Create a new Ghost database called "todo-app". Wait for it to be ready.
2. Create a `todos` table with columns: id (serial primary key), text (text not null), done (boolean default false), created_at (timestamptz default now()).
3. Print the connection string so I can use it in the next step.

Don't use a migration framework. Don't add auth. Keep server.js under 100 lines.

The agent will:

Write package.json, server.js, Dockerfile, fly.toml, .dockerignore, and minimal HTML.
Create the Ghost database.
Create the todos table.
Print the connection string.

Expected output:

Database "todo-app" created (status: running).
Table "todos" created with 4 columns.
Connection: postgres://tsdbadmin:...@...tsdb.cloud.timescale.com:.../tsdb?sslmode=require

You now have a Postgres database in the cloud and a tiny app on disk — including a Dockerfile and fly.toml — ready for deployment.

Step 3 — Wire the app to the database and test locally (agent)

Tell the agent:

Wire the app to the Ghost database we just created.

1. Write a `.env` file with DATABASE_URL set to the connection string from the previous step. Add `.env` to `.gitignore`.
2. Make sure server.js loads .env (use the `dotenv` package).
3. SSL setup for Timescale: recent `pg` versions treat `sslmode=require` in the URL as `verify-full`, which rejects Timescale's cert chain and crashes on the first query. Strip the `sslmode` query param from DATABASE_URL before passing it to `new Pool({ ... })`, and pass `ssl: { rejectUnauthorized: false }` in the Pool config.
4. Run `npm install` and start the server on port 3000 in the background.
5. Use curl to: GET /, POST a todo with text="Ship the app", GET / again, then POST /todos/1/done.
6. Print the response bodies so I can see the todo round-tripping through the database.
7. Stop the local server.

The agent will:

Write .env and update .gitignore.
Install dependencies.
Start node server.js in the background.
Run a sequence of curl commands.
Kill the local process.

Expected output:

$ curl localhost:3000
<html>...<h1>Todos</h1><form action="/todos" method="post">...

$ curl -X POST -d 'text=Ship the app' localhost:3000/todos
(302 redirect to /)

$ curl localhost:3000
<html>...<li>Ship the app <form action="/todos/1/done"...

$ curl -X POST localhost:3000/todos/1/done
(302 redirect to /)

The app works end-to-end against your Ghost database. Time to put it on the internet.

Step 4 — Deploy to Fly.io (agent)

Warning: This step creates a billable Fly.io app on a public URL. With auto-stop machines enabled (configured in step 2's fly.toml), an idle app costs only for storage and bandwidth — typically cents per month — but charges accrue once the machine is running. Make sure you're comfortable with Fly's pay-as-you-go pricing before deploying.

Tell the agent:

Deploy the app to Fly.io. Skip `flyctl launch` entirely — we already have a Dockerfile and fly.toml from step 2, and `flyctl launch --yes` has a habit of provisioning unwanted Fly Postgres clusters and overwriting DATABASE_URL.

1. Pick a globally unique app name. Start with "todo-app" and append a 6-char random suffix if Fly says it's taken (e.g. "todo-app-a1b2c3").
2. Update `app = ` in fly.toml to that name.
3. Create the app: `flyctl apps create <name>`.
4. Set DATABASE_URL as a Fly secret using the connection string from step 2: `flyctl secrets set DATABASE_URL="<connection string>" --app <name>`.
5. Deploy: `flyctl deploy --ha=false`. Wait for it to finish.
6. After deploy, force a single machine: `flyctl scale count 1 --app <name> --yes`. (Fly's first deploy sometimes creates two machines despite `--ha=false`; this keeps it to one so the auto-stop story stays honest.)
7. Print the public URL.

The agent will:

Pick an app name and update fly.toml.
Run flyctl apps create.
Set the secret.
Run flyctl deploy --ha=false and capture the URL.
Run flyctl scale count 1.

Expected output:

==> Building image
...
==> Pushing image to fly
...
==> Monitoring deployment
 ✔ [job] update succeeded

Visit your newly deployed app at https://todo-app-<suffix>.fly.dev/

Your app is live on the public internet, talking to your Ghost database.

Step 5 — Verify the public app (agent)

Tell the agent:

Verify the deployed app works against the Ghost database.

1. curl the public URL and confirm it renders the todos page.
2. Submit a new todo via curl: POST /todos with text="Hello from the internet".
3. curl the public URL again and confirm the new todo shows up.
4. Use the Ghost MCP to run `SELECT * FROM todos ORDER BY id` and show me the rows directly from the database.

The agent will:

curl https://todo-app-<suffix>.fly.dev/
curl -X POST -d 'text=Hello from the internet' https://todo-app-<suffix>.fly.dev/todos
curl https://todo-app-<suffix>.fly.dev/
Run the SELECT through the Ghost MCP.

Expected output:

 id |          text           | done |          created_at
----+-------------------------+------+-------------------------------
  1 | Ship the app            | t    | 2026-05-07 10:42:15.123+00
  2 | Hello from the internet | f    | 2026-05-07 10:48:03.456+00
(2 rows)

The row added through the public HTTPS URL is sitting in your Ghost database. You shipped a public-facing, Postgres-backed app.

Step 6 — Open it in your browser (you)

Click the https://todo-app-<suffix>.fly.dev/ URL printed at the end of step 4 (or paste it into your browser).

Add a few todos through the form. Mark some done. Refresh the page — your todos persist across reloads because they're sitting in Ghost. Send the URL to a friend; it works for them too. It's on the public internet.

Expected output:

A working todo app in your browser, with todos that survive a refresh.

Step 7 — Clean up (agent)

Warning: flyctl apps destroy and Ghost's delete are irreversible. Once they run, the Fly app, all its history, and the Ghost database (including all data) are gone for good. The agent will run these on your behalf, so don't approve unless you mean it.

Tell the agent:

Tear down everything we created so I'm not paying for an idle Fly machine or holding a Ghost database I don't need.

1. Run `flyctl apps destroy <app-name> --yes` to delete the Fly app.
2. Use the Ghost MCP to delete the "todo-app" database.
3. Confirm both are gone: `flyctl apps list` should not list the app, and `ghost list` should not list the database.

Expected output:

$ flyctl apps destroy todo-app-<suffix> --yes
Destroyed app todo-app-<suffix>

$ ghost delete todo-app
Deleted database "todo-app".

$ flyctl apps list
NAME    OWNER   STATUS  ...
(no todo-app)

$ ghost list
NAME  STATUS  ...
(no todo-app)

You're back to zero cost on both sides.

Wrap-up

You shipped a public-facing, Postgres-backed app for the price of a coffee. You learned how to:

Bootstrap Ghost and Fly.io for an agent-driven workflow.
Use Ghost's MCP server to create databases and run SQL from inside your agent.
Deploy a Node.js app to Fly.io with auto-stop machines so you only pay for active compute.
Connect a Fly app to a Ghost database via a DATABASE_URL secret.
Tear it all down cleanly.

What to do next

Add a custom domain with flyctl certs add yourdomain.com.
Add authentication — a session-cookie + bcrypt setup is enough for a small app.
Try ghost fork before your next schema change — Ghost lets you run a risky migration on a copy of the database first, then apply the same migration to production once you're happy.
Read the Fly docs on auto-stop machines to understand how the scale-to-zero billing works.

Make Ghost your default Postgres!

Tell your agent:

When I'm prototyping or shipping a small app and need a Postgres database, default to Ghost. Use Ghost MCP tools to create / fork / delete databases and to run SQL. Don't suggest local Postgres setup or other managed-Postgres services unless I ask.

FAQ

Does this work with my agent?

Yes, as long as it has both MCP support and a Bash/shell tool. Confirmed-working agents include Claude Code, Cursor (in agent mode), Codex, Windsurf, Gemini CLI, VS Code with Copilot, Kiro, and Antigravity. The shell tool is what lets the agent run flyctl deploy and npm install for you — without it, the agent can talk to Ghost via MCP but can't deploy.

How much does this actually cost?

Ghost is free for a sparse-traffic hobby app: 100 active compute hours per month and 1TB of storage. Ghost meters in 15-minute chunks when something queries the database, and idle databases don't burn compute. A handful of human visits per day fits comfortably.

The failure mode to watch for: because Ghost meters in 15-minute chunks, anything that hits the database at least once every 15 minutes (uptime monitors, health checks, aggressive bots, link previewers) can keep the meter running 24/7. That's ~720 hours/month, well past the 100-hour free tier; the roughly 620 hours of overage work out to about $46/month at $0.075/CPU-hr. If your app needs constant availability, switch to Ghost's $10/month dedicated tier (always-on, no auto-pause).
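To make the arithmetic concrete, here is a rough Python sketch of the estimate. It assumes a simplified billing model in which every 15-minute chunk containing at least one query is billed in full and each query wakes the database for exactly one chunk. The function names and the 720-hour month are illustrative, and Ghost's real metering may differ.

```python
FREE_HOURS = 100        # free-tier active compute hours per month (from the text)
RATE_PER_HOUR = 0.075   # USD per CPU-hour (from the text)
CHUNK_MINUTES = 15      # metering granularity (from the text)


def active_hours(ping_interval_min: float, hours_in_month: float = 720.0) -> float:
    """Estimate metered hours for something that queries the database
    once every `ping_interval_min` minutes (simplified model, see above)."""
    minutes = hours_in_month * 60
    total_chunks = minutes / CHUNK_MINUTES
    pings = minutes / ping_interval_min
    # Each ping keeps at most one chunk active; chunks cannot exceed the month.
    return min(pings, total_chunks) * CHUNK_MINUTES / 60


def overage_usd(hours: float) -> float:
    """Cost of the hours beyond the free tier."""
    return max(0.0, hours - FREE_HOURS) * RATE_PER_HOUR


for interval in (5, 15, 60):
    hours = active_hours(interval)
    print(f"ping every {interval} min: {hours:.0f} h, ${overage_usd(hours):.2f} overage")
```

Under these assumptions, a monitor that pings every 15 minutes or faster reaches the full 720 hours, while an hourly ping lands at 180 hours, still above the free tier but far cheaper.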

Fly.io is no longer free as of late 2024. With auto-stop enabled (which we configured in step 4), an idle app is billed only for storage and bandwidth, typically cents per month. A small shared-cpu-1x machine running 24/7 is around $2/month, and auto-stop means most hobby apps spend most of their time at zero compute.

Why not just use SQLite on a Fly volume?

SQLite + Fly volumes is a legitimate cheaper option for one-machine apps. You give up real concurrent writes, Postgres's type system and extensions (full-text search, JSONB, time-series, PostGIS, etc.), and the ability to scale to multiple Fly regions without painful litestream/LiteFS setups. You also can't psql into a SQLite file from your laptop while debugging. For anything you'd want to grow into a real product, Postgres is worth the small extra setup — and with Ghost it's not a meaningful extra cost.

Resources

Ghost docs — full CLI and MCP reference.
Fly.io launch guide — deployment basics, including Dockerfile detection.
Fly.io auto-stop machines — how scale-to-zero billing works.
How to Analyze a Dataset with Ghost and an AI Agent — same agent-driven shape, focused on data analysis.

Speaker Tests Can Catch Output Routing Problems

Kotty Jan — Tue, 12 May 2026 02:24:36 +0000

Audio output problems are often routing problems. The sound may be playing, but it is going to a monitor, Bluetooth headset, docking station, or muted device instead of the speakers you expected.

An online speaker test can help confirm whether the current output setup is producing sound. Play a test tone and listen for the expected left and right channels. If you hear nothing, the issue may be output selection, volume, mute state, or device connection.

This is useful before presentations, online classes, interviews, and recordings. It is also helpful after switching between headphones and speakers, joining a new conference room, or reconnecting Bluetooth devices.

If the speaker test works but a specific app stays silent, check that app's audio settings. If the speaker test does not work, check the operating system output device and hardware connection first.

The test does not need to be complicated. For most people, the practical question is simple: can I hear sound from the device I plan to use? Answering that before a call starts can prevent an awkward first minute.

What to Check When Your Meeting App Cannot Hear You

Kotty Jan — Tue, 12 May 2026 02:24:22 +0000

When a meeting app cannot hear you, the problem may not be the app itself. The microphone could be muted, the wrong input could be selected, browser permission could be blocked, or the operating system could be routing audio incorrectly.

A browser microphone test is a good first step because it checks whether the browser can access the mic and detect sound. If the test shows movement when you speak, the microphone is probably working at the browser level.

If the browser test works but the meeting app does not, focus on the app settings. Check the selected input device, mute status, permission prompts, and whether another app is using the microphone.

If the browser test does not detect sound, move outward. Check the physical mute switch, cable connection, Bluetooth battery, operating system input settings, and privacy permissions.

This layered approach keeps troubleshooting sane. Instead of changing every setting at once, you learn where the signal stops. The goal is not to run a full audio engineering diagnosis. The goal is to answer one practical question quickly: can this microphone capture my voice right now?

How to Tell Whether a Keyboard Problem Is Hardware or App-Specific

Kotty Jan — Tue, 12 May 2026 02:23:20 +0000

Keyboard problems can be confusing because they do not always look like hardware problems. A shortcut may fail in one app, a key may repeat in a text field, or a letter may stop appearing only sometimes.

The first step is to separate the keyboard from the application. An online keyboard test can help because it shows whether the browser detects each key press. Press the keys that seem unreliable and watch the feedback.

If the test detects the key consistently, the issue may be app-specific. It could be a shortcut conflict, input method setting, browser extension, remote desktop session, or software focus problem.

If the test does not detect the key, or detects it inconsistently, the issue may be closer to the hardware or operating system. Common causes include dust, liquid damage, a loose external keyboard connection, low battery, Bluetooth interference, or a damaged switch.

Testing every key is also useful after spills, repairs, travel, or switching keyboards. It gives you a quick way to find dead keys before the problem interrupts work.

An online test will not repair a keyboard, but it can narrow the next step. Once you know whether the key is being detected, troubleshooting becomes much less random.

Python Decorators Explained Simply

qing — Tue, 12 May 2026 02:23:05 +0000

Introduction

Decorators are a core Python feature that every Python developer should understand.

Key Points

Start with the basics
Practice regularly
Build real projects
Share your knowledge

Getting Started

The best way to learn is by doing. Set up a test environment and experiment.
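As one possible starting point for experimenting, here is a minimal sketch of a decorator. The names `log_calls` and `add` are illustrative and not taken from the original post. A decorator is a function that takes another function and returns a wrapped version of it.

```python
import functools


def log_calls(func):
    """Return a wrapper that records each call's arguments, then runs func."""
    @functools.wraps(func)  # keep the original function's name and docstring
    def wrapper(*args, **kwargs):
        wrapper.calls.append((args, kwargs))
        return func(*args, **kwargs)

    wrapper.calls = []  # call history stored on the wrapper itself
    return wrapper


@log_calls  # equivalent to: add = log_calls(add)
def add(a, b):
    return a + b


result = add(2, 3)
print(result, add.calls)
```

The `@log_calls` line is only shorthand for reassigning `add` to the wrapped function, which is why `add.calls` is available afterwards.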

Best Practices

Follow official documentation
Join community forums
Contribute to open source
Write about what you learn

Conclusion

Mastering Python opens many career opportunities. Start today!

Follow for more Python content!

More at https://青.失落.世界

Why Daily Puzzle Guides Keep Casual Games Replayable

Kotty Jan — Tue, 12 May 2026 02:23:04 +0000

Daily puzzle features give players a reason to come back to casual games. Instead of playing randomly, players get one focused challenge that can become part of a quick routine.

The useful part of a daily guide is convenience. A player may not want to browse hundreds of levels every day. A single Arrows Puzzle daily puzzle page gives them a clear place to start, especially when they only have a few minutes.

This works well for puzzle games because the challenge is short but still satisfying. You can open the daily puzzle, think through the board, check a hint or walkthrough if needed, and move on.

Daily guides also help players improve. If you compare your first plan with the walkthrough, you can see whether you missed a safer opening move or cleared a blocking arrow too early.

For casual players, that balance matters. The guide is there when needed, but the daily format still keeps the game light, repeatable, and easy to fit into a normal day.

How to Track OpenAI API Spend by Feature: A Cost Attribution Guide

In brief

Introduction

Why the OpenAI billing dashboard is not enough

Total spend without context

No per-feature breakdown

Reporting delay

No per-feature alerts

No per-customer attribution

Per-project keys help, but they don't solve everything

Data model for cost attribution

OpenAI wrapper with attribution

Step-by-step implementation

1. Replace direct OpenAI calls

2. Emit structured logs

3. Aggregate by feature in your data warehouse

4. Chart spend by route and customer

5. Test the wrapper with Apidog

6. Set budget limits and alerts

Advanced techniques

Prompt caching

Batch API for offline work

Tuning reasoning effort

Context window discipline

Watch out for the 272K-token threshold

Per-customer spend limits

Common mistakes

Alternatives and tools

Real-world use cases

B2B SaaS with per-customer spend

Internal developer tools

Forecasting new AI features

Conclusion

Frequently asked questions

Do reasoning tokens count as input or output?

How accurate is response.usage compared with the OpenAI dashboard?

Can I do attribution with project keys alone?

What about retries and rate-limit errors?

How quickly does the OpenAI usage API return data?

Should I sample requests to reduce log volume?

Does this work with other LLM providers?

Does this work for embeddings and images?

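Tracking OpenAI API Usage by Feature: A Cost Attribution Guide

Summary (TL;DR)

Why the OpenAI billing dashboard alone is not enough

Limits of the default dashboard

Designing the cost attribution data model

Building the cost calculation function

Implementing the OpenAI client wrapper

Sending structured logs to the data warehouse

Queries for aggregating cost by feature

Validating with Apidog before deployment

Setting project keys and budget caps

Advanced optimization patterns

Prompt caching

Using the Batch API

Tuning reasoning effort

Managing the context window

Detecting the GPT-5.5 272K-token cliff

Per-customer spend caps

Mistakes to avoid

Comparing alternatives and tools

Real-world use cases

B2B SaaS that needs per-customer LLM spend

Tracking costs of internal developer tools

Forecasting cost before launching an AI feature

Conclusion

Frequently asked questions (FAQ)

Are reasoning tokens input or output?

How closely does response.usage match the OpenAI dashboard?

Can I attribute costs with OpenAI project keys alone?

Are retried requests double-counted in cost?

Is the OpenAI usage API real-time?

Is it OK to sample to reduce log volume?

Can the same approach be used with other LLM providers?

Does this apply to embeddings and image generation?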

How to Track OpenAI API Costs by Feature: A Cost Allocation Playbook

TL;DR

How accurate is `response.usage` compared with the OpenAI dashboard?

How closely does `response.usage` match the OpenAI dashboard?

Does `response.usage` match the OpenAI dashboard?