Most indie devs are hemorrhaging money on AI APIs. I was paying $340/month for API calls that should have cost $9. Three days of optimization changed everything.
The Hidden Cost Nobody Warns You About
When I first integrated Claude and GPT-4 into my side projects, I was copy-pasting examples from docs. The calls worked. The bill did not.
Month 1: $47
Month 2: $198
Month 3: $340 (when I finally panicked)
Here's what I was doing wrong — and the fixes that dropped my costs 97%.
Fix 1: Stop Re-sending the Same System Prompt
If your system prompt is 2,000 tokens and you make 1,000 calls/day, you're paying for 2 million tokens of the same text every day.
The solution: prompt caching. Anthropic's Claude supports cache_control: {"type": "ephemeral"} on content blocks. The first call pays full price plus a ~25% cache-write premium on the cached tokens. Every subsequent call within the 5-minute cache window reads those tokens at 10% of the normal input price.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": YOUR_LONG_SYSTEM_CONTEXT,  # big, stable prefix
                "cache_control": {"type": "ephemeral"},  # Cache this!
            },
            {"type": "text", "text": user_query},  # variable part, not cached
        ],
    }
]

response = client.messages.create(
    model="claude-sonnet-4-5", max_tokens=1024, messages=messages
)
This alone cut 60% of my bill.
Fix 2: Use the Right Model for the Right Task
I was using GPT-4 Turbo for everything, including tasks like:
- Classifying text into 5 categories
- Extracting a date from a sentence
- Checking if a URL is valid
These tasks don't need a $0.03/1K token model. They work fine on $0.0015/1K token models.
Model routing rule:
- Simple classification, extraction, validation → Haiku or GPT-3.5
- Reasoning, generation, complex tasks → Sonnet or GPT-4o-mini
- Only for truly hard problems → Opus or GPT-4
I built a simple router that checks task complexity before picking a model. Result: 80% of my calls now hit cheap models.
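Here's the shape of that router. This is a minimal sketch, not my production code: the task types, length cutoff, and model IDs are illustrative stand-ins for whatever tiers you actually use.

def pick_model(task_type: str, prompt: str) -> str:
    # Cheap tier: classification, extraction, validation on short inputs
    if task_type in {"classify", "extract", "validate"} and len(prompt) < 4000:
        return "claude-haiku-4-5"
    # Mid tier: everyday summarization and generation
    if task_type in {"summarize", "generate", "rewrite"}:
        return "claude-sonnet-4-5"
    # Smart tier: only for genuinely hard reasoning
    return "claude-opus-4-5"

The exact heuristics matter less than the default: the cheap model should be where calls land unless something argues otherwise.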
Fix 3: Batch What You Can
If you're making 100 calls to summarize 100 articles, don't make 100 sequential API calls. Use Anthropic's Message Batches API — it's 50% cheaper than individual calls and processes async.
requests = [{
    "custom_id": f"article-{i}",  # ties each result back to its input
    "params": {
        "model": "claude-haiku-4-5",
        "max_tokens": 512,  # required field
        "messages": [{"role": "user", "content": f"Summarize this article:\n\n{article}"}],
    },
} for i, article in enumerate(articles)]

batch = client.messages.batches.create(requests=requests)
# Poll for completion, then fetch results
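Polling is a few more lines. A minimal sketch using the batch object from above (retrieve, processing_status, and results are standard parts of Anthropic's Python SDK):

import time

while True:
    batch = client.messages.batches.retrieve(batch.id)
    if batch.processing_status == "ended":
        break
    time.sleep(30)  # batches finish within minutes to hours

for entry in client.messages.batches.results(batch.id):
    if entry.result.type == "succeeded":
        print(entry.custom_id, entry.result.message.content[0].text)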
Fix 4: Cache at the Application Layer
For any query that might repeat (same user asking the same question, same product description being generated), cache the response in Redis or even a simple SQLite table.
I analyzed my logs: 34% of my API calls were for inputs I'd already processed. Caching those is literally free money.
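The simplest version is a content-addressed lookup: hash the exact request, return the stored response on a hit, and only call the API on a miss. A minimal SQLite sketch (reusing the client from Fix 1; the schema and key scheme here are just one reasonable choice):

import hashlib
import json
import sqlite3

conn = sqlite3.connect("llm_cache.db")
conn.execute("CREATE TABLE IF NOT EXISTS cache (key TEXT PRIMARY KEY, response TEXT)")

def cached_call(model: str, messages: list) -> str:
    # Key on model + exact messages, so any change busts the cache
    key = hashlib.sha256(json.dumps([model, messages], sort_keys=True).encode()).hexdigest()
    row = conn.execute("SELECT response FROM cache WHERE key = ?", (key,)).fetchone()
    if row:
        return row[0]  # cache hit: zero API cost
    text = client.messages.create(model=model, max_tokens=1024, messages=messages).content[0].text
    conn.execute("INSERT OR REPLACE INTO cache VALUES (?, ?)", (key, text))
    conn.commit()
    return text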
The Results After 3 Days
| Before | After |
|---|---|
| $340/month | $10.20/month |
| 100% expensive models | 80% cheap, 20% smart |
| No caching | 90%+ cache hit rate |
| Sequential calls | Batched where possible |
Total reduction: 97%
What This Actually Unlocks
With AI costs at $10/month instead of $340, you can:
- Offer more AI features without worrying about margins
- Run experiments without budget fear
- Actually be profitable at indie scale ($9-12 price points become viable)
I documented the full optimization system — every prompt, every routing decision, every caching layer — in the AI API Cost Optimization Handbook. It's 47 pages of the exact system I use.
What's your current monthly AI API bill? Have you done any cost optimization? Drop it in the comments — I'm curious how bad it gets for others.