Lars Winstand

Posted on • Originally published at standardcompute.com

Why does nobody talk about how expensive idle OpenClaw agents are?


I keep seeing the same OpenClaw cost mistake:

People try to save money by making the assistant write shorter replies.

Meanwhile the real bill is coming from the agent waking up all night, reloading bootstrap context, sending heartbeats, and calling expensive frontier models for work that barely deserves a model invocation.

One OpenClaw user in an r/openclaw thread basically described the nightmare version of this: go to sleep, wake up, and discover the agent spent the night doing housekeeping.

Not shipping code.
Not solving a hard planning problem.
Not even using tools in a meaningful way.

Just background churn.

That thread was here:
https://reddit.com/r/openclaw/comments/1taouqv/how_to_stop_burning_tokens/

And it lines up with what a lot of people running always-on agents eventually learn the hard way:

your OpenClaw bill usually gets ugly before the agent does anything useful.

The fastest way to stop burning tokens in OpenClaw is usually not shorter replies. It is fewer background runs, less repeated bootstrap injection, and better model routing.

The real problem: your agent is "idle" but not actually idle

From the outside, an OpenClaw agent can look quiet.

No visible output.
No major tool calls.
No obvious work.

But under the hood, it may still be doing expensive stuff repeatedly:

  • heartbeat loops
  • bootstrap prompt injection on every run
  • context reconstruction
  • routine status checks
  • maintenance turns on GPT-5.4, Claude Opus 4.6, or Grok 4.20

That last one is the killer.

Using frontier models for maintenance work is like using a senior staff engineer to check whether a cron job is still alive.

Technically possible. Financially dumb.

What the Reddit thread got right

The useful part of the r/openclaw discussion was not the vote count. It was that multiple users kept describing the same pattern from different angles.

Some were looking at prompt caching.
Some were focused on compaction.
Some were confused why "idle" sessions still had meaningful token usage.

But the same root cause kept showing up:

the agent was repeatedly rebuilding context and paying for maintenance cycles.

That means if your setup keeps reloading files like AGENTS.md, SOUL.md, TOOLS.md, and HEARTBEAT.md on every wake-up, your bill can grow even when the final response is tiny.

So if your first move is "make replies shorter," you are probably optimizing the wrong thing.

What usually burns tokens in OpenClaw

Here are the three biggest offenders.

1. Heartbeat loops

Each wake-up looks cheap.

But if the agent wakes up every few minutes, every wake-up can trigger another full prompt assembly.

That compounds fast.

A simple back-of-the-envelope example:

bootstrap context: 12,000 tokens
heartbeat interval: every 5 minutes
runs per day: 288

12,000 * 288 = 3,456,000 input tokens/day

That is before the agent does real work.
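
To see how fast that compounds over a month, here is the same arithmetic as a tiny Python sketch. The per-million-token price is a made-up placeholder, not a quote for any particular model or provider.

# Back-of-the-envelope heartbeat cost; the price is a placeholder, not a real quote.
BOOTSTRAP_TOKENS = 12_000          # static context re-sent on every wake-up
HEARTBEAT_MINUTES = 5
PRICE_PER_MILLION_INPUT = 5.00     # hypothetical $/1M input tokens

runs_per_day = 24 * 60 // HEARTBEAT_MINUTES              # 288
input_tokens_per_day = BOOTSTRAP_TOKENS * runs_per_day   # 3,456,000
cost_per_day = input_tokens_per_day / 1_000_000 * PRICE_PER_MILLION_INPUT
print(f"{input_tokens_per_day:,} input tokens/day, "
      f"~${cost_per_day:.2f}/day, ~${cost_per_day * 30:.0f}/month")

At that hypothetical rate, the heartbeat alone is hundreds of dollars a month before the agent produces anything.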

2. Repeated bootstrap injection

If you keep stuffing the same long instructions into context on every run, you are paying repeatedly for mostly static information.

Typical examples:

  • AGENTS.md
  • SOUL.md
  • TOOLS.md
  • policy and operating instructions
  • large tool schemas
  • repeated environment descriptions

If that content is stable, it should not be blindly re-injected every time.

3. Premium models doing janitorial work

This is the painful one.

If GPT-5.4, Claude Opus 4.6, or Grok 4.20 is handling heartbeat checks, no-change polling, or low-risk status verification, you are paying premium reasoning prices for cheap maintenance tasks.

That is bad routing.

The beginner fix that usually does not work

Most people start here:

  • set max_tokens lower
  • tell the model to be concise
  • compress replies
  • overuse compaction
  • remove useful instructions

Some of that can help at the margins.

But if the agent is still waking up too often and rebuilding a giant prompt every time, shaving 100 tokens off the final reply is basically irrelevant.

Here is the rough shape of the mistake:

# Feels efficient, usually isn't
model: gpt-5.4
max_tokens: 120
heartbeat_minutes: 5
inject_every_run:
  - AGENTS.md
  - SOUL.md
  - TOOLS.md
  - HEARTBEAT.md

That config can still burn money while producing very short outputs.

What to audit first

If your OpenClaw bill is higher than expected, start with an operational audit.

Check wake-up frequency

Look for scheduled runs, heartbeat loops, and polling behavior.

grep -R "heartbeat\|schedule\|poll" ./openclaw-config ./logs

Questions to ask:

  • How often does the agent wake up when nothing changed?
  • Does every wake-up trigger a full prompt rebuild?
  • Are retries too aggressive?
  • Are there multiple overlapping monitors for the same task?
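
If your runtime writes logs, a quick count of wake-ups per hour makes the pattern obvious. This is a hedged sketch: it assumes each run is logged on its own line with an ISO timestamp and the word "heartbeat", which may not match your setup.

# Count heartbeat wake-ups per hour from a plain-text log.
# Assumes one line per run starting with an ISO timestamp, e.g.
#   2026-02-11T03:15:02Z heartbeat run started
from collections import Counter

wakeups = Counter()
with open("logs/agent.log") as f:
    for line in f:
        if "heartbeat" in line.lower():
            wakeups[line[:13]] += 1   # "2026-02-11T03" -> bucket by hour

for hour, count in sorted(wakeups.items()):
    print(hour, count)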

Measure repeated bootstrap size

You want to know how much static context gets re-sent per run.

wc -c AGENTS.md SOUL.md TOOLS.md HEARTBEAT.md

Or if you have token tooling in your stack, estimate token count directly.

python estimate_tokens.py AGENTS.md SOUL.md TOOLS.md HEARTBEAT.md

If the same 8k to 20k tokens are being resent over and over, that is a major cost center.
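
I do not know what estimate_tokens.py looks like in your stack, but a minimal version is a few lines if you have a tokenizer library installed. This sketch assumes the tiktoken package and uses one of its general-purpose encodings as a rough proxy; your provider's tokenizer may count differently.

# estimate_tokens.py -- rough token counts for bootstrap files.
# Assumes tiktoken; counts are approximate for non-OpenAI models.
import sys
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
total = 0
for path in sys.argv[1:]:
    with open(path, encoding="utf-8") as f:
        n = len(enc.encode(f.read()))
    total += n
    print(f"{path}: {n:,} tokens")
print(f"total re-sent per run: {total:,} tokens")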

Verify caching instead of assuming it works

A lot of developers assume repeated prefixes are being cached.

Sometimes they are not.

Sometimes tiny prompt changes break reuse.

Sometimes the provider behavior is not what you thought.

If you do not have evidence from logs or billing traces, assume nothing.
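
One way to stop guessing is to log the usage block from every response and compare cached versus total input tokens over a day. The field names below follow the OpenAI-style usage.prompt_tokens_details.cached_tokens shape; if your provider reports caching differently, adjust accordingly.

# Summarize cache hit rate from logged API responses (one JSON object per line).
# Assumes each log line contains the provider's "usage" block.
import json

prompt_tokens = cached_tokens = 0
with open("logs/llm_responses.jsonl") as f:
    for line in f:
        usage = json.loads(line).get("usage", {})
        prompt_tokens += usage.get("prompt_tokens", 0)
        cached_tokens += usage.get("prompt_tokens_details", {}).get("cached_tokens", 0)

if prompt_tokens:
    print(f"cached: {cached_tokens:,} / {prompt_tokens:,} "
          f"({100 * cached_tokens / prompt_tokens:.1f}% of input tokens)")

If that percentage is near zero while you are re-sending the same bootstrap every run, caching is not doing what you assumed.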

Inspect model routing

List every place where your agent picks a model.

grep -R "gpt-5.4\|claude\|grok" ./openclaw-config ./src

Then ask the uncomfortable question:

Which of these calls actually need a frontier model?

What should use GPT-5.4 vs a cheaper maintenance model?

My rule is simple:

frontier models should do frontier work.

Use GPT-5.4, Claude Opus 4.6, or Grok 4.20 for:

  • difficult code generation
  • multi-step debugging
  • planning across messy constraints
  • ambiguous tool decisions
  • recovery after failures
  • high-stakes user-facing output

Do not use them for:

  • heartbeat checks
  • no-op polling
  • routine status verification
  • simple classification
  • low-risk tool orchestration
  • repeated maintenance turns with nearly identical context

Here is a practical split.

Task | Model tier
Heartbeat / keepalive check | Cheap maintenance model
"Has anything changed?" check | Cheap maintenance model
Simple tool routing | Cheap maintenance model
Summarize recent state | Cheap maintenance model
Multi-file refactor | GPT-5.4 / Claude Opus 4.6 / Grok 4.20
Failure recovery after broken tool chain | GPT-5.4 / Claude Opus 4.6 / Grok 4.20
Hard planning with ambiguous constraints | GPT-5.4 / Claude Opus 4.6 / Grok 4.20

If your stack supports model tiering, use it aggressively.
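
What that tiering looks like in code depends entirely on your stack. This is a hypothetical sketch, not an OpenClaw API; the task flags and model names are placeholders.

# Hypothetical model router: boring work goes to a cheap model,
# escalation happens only when the task is actually hard.
MAINTENANCE_MODEL = "small-fast-model"
REASONING_MODEL = "gpt-5.4"

def pick_model(task: dict) -> str:
    escalate = (
        task.get("is_ambiguous")
        or task.get("code_change_is_nontrivial")
        or task.get("tool_failure_needs_recovery")
        or task.get("user_visible_and_high_stakes")
    )
    return REASONING_MODEL if escalate else MAINTENANCE_MODEL

# Heartbeats and status checks never touch the frontier model.
print(pick_model({"kind": "heartbeat"}))                # small-fast-model
print(pick_model({"code_change_is_nontrivial": True}))  # gpt-5.4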

A better OpenClaw shape

Here is the kind of architecture that usually makes more sense.

maintenance_model: small-fast-model
reasoning_model: gpt-5.4
heartbeat_minutes: 30
inject_every_run:
  - minimal_runtime_instructions
retrieve_on_demand:
  - AGENTS.md
  - SOUL.md
  - TOOLS.md
escalate_to_reasoning_model_when:
  - task_is_ambiguous
  - code_change_is_nontrivial
  - tool_failure_needs_recovery
  - user_visible_output_is_high_stakes

The idea is straightforward:

  • wake up less often
  • inject less static context
  • route boring work to a cheaper model
  • escalate only when the task becomes hard

Compaction is not magic either

Compaction can help.

It can also backfire.

If you compact too aggressively, the model has to:

  1. summarize old context
  2. reread the summary later
  3. spend extra turns recovering missing detail

That can reduce context pressure while increasing total spend.

So yes, compaction is useful.

No, it is not the first lever I would pull if the real issue is repeated wake-ups plus giant bootstrap prompts.

Why OpenClaw users get hit harder than most

OpenClaw is built for persistent agent behavior.

That is the whole point.

Always-on agents need:

  • memory
  • instructions
  • tool definitions
  • retries
  • monitoring
  • background execution

None of that is inherently bad.

The problem is what happens when that architecture collides with per-token pricing.

Per-token billing punishes exactly the behaviors that make persistent agents useful:

  • long-running sessions
  • recursive loops
  • retries
  • monitoring
  • tool-rich prompts
  • continuous execution

So developers start optimizing for survival instead of quality.

They shorten prompts that should stay rich.
They avoid retries that would improve reliability.
They turn off useful monitoring.
They get nervous about one more turn of reasoning.

That is token anxiety.

And OpenClaw users feel it hard because their agents are designed to stay alive.

What actually fixes the problem

If I were debugging an expensive OpenClaw deployment this week, I would do these five things in order:

1. Reduce unnecessary background runs

If the agent does not need to wake up, do not wake it up.

2. Shrink repeated bootstrap injection

Move stable knowledge out of always-injected context and into retrieval or more selective memory.
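
As a sketch of what that can look like (names here are hypothetical, not an OpenClaw API): keep a slim always-injected prompt and pull the big documents in only when the current task actually needs them.

# Hypothetical retrieve-on-demand: a slim prompt every run,
# big reference docs only when the task calls for them.
from pathlib import Path

ALWAYS = Path("runtime_instructions.md")   # a few hundred tokens, sent every run
ON_DEMAND = {"agents": Path("AGENTS.md"), "soul": Path("SOUL.md"), "tools": Path("TOOLS.md")}

def build_prompt(task_text: str) -> str:
    parts = [ALWAYS.read_text(encoding="utf-8")]
    for keyword, path in ON_DEMAND.items():
        if keyword in task_text.lower():   # crude trigger; swap in real retrieval
            parts.append(path.read_text(encoding="utf-8"))
    return "\n\n".join(parts)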

3. Tier models aggressively

Maintenance goes to a cheap model.

Real reasoning goes to GPT-5.4, Claude Opus 4.6, or Grok 4.20 only when needed.

4. Verify caching with logs

Do not trust assumptions.

Measure actual reuse.

5. Stop treating pricing as a side detail

For always-on agents, pricing changes architecture decisions.

That is not theory. That is operations.

Why flat monthly pricing fits OpenClaw better

This is the part people avoid saying out loud:

per-token pricing is a bad fit for always-on OpenClaw setups.

Not because tokens are evil.

Because the billing model changes behavior.

If you are constantly worrying that heartbeats, retries, long sessions, or monitoring loops might explode your bill, you are not really building autonomous systems.

You are managing a meter.

That is why Standard Compute is interesting for this exact use case.

Standard Compute gives you access to models like GPT-5.4, Claude Opus 4.6, and Grok 4.20 through one OpenAI-compatible API, but with flat monthly pricing instead of per-token billing.

For OpenClaw users, that matters.

It means you can focus on:

  • runtime behavior
  • model routing
  • reliability
  • quality
  • letting agents run continuously

Instead of obsessing over whether one more loop, one more retry, or one more maintenance cycle is going to create a surprise bill.

If your OpenClaw agents run 24/7, predictable monthly pricing is just a better operational match than token-metered billing.

More here:
https://standardcompute.com

The short version

If your OpenClaw bill is ugly, do not start by trimming adjectives out of responses.

Start here:

  • audit wake-ups
  • measure bootstrap size
  • route maintenance away from frontier models
  • verify caching
  • stop using a pricing model that punishes persistent agents for staying alive

That is the real lesson from the r/openclaw thread.

The problem is not just that OpenClaw can burn tokens in the background.

It is that per-token pricing makes normal agent behavior feel dangerous.

And once you see that clearly, the fixes get a lot more obvious.
