Lars Winstand

Posted on • Originally published at standardcompute.com

Why does nobody talk about how expensive idle OpenClaw agents are?


I keep seeing the same OpenClaw cost mistake:

People try to save money by making the assistant write shorter replies.

Meanwhile the real bill is coming from the agent waking up all night, reloading bootstrap context, sending heartbeats, and calling expensive frontier models for work that barely deserves a model invocation.

One OpenClaw user in an r/openclaw thread basically described the nightmare version of this: go to sleep, wake up, and discover the agent spent the night doing housekeeping.

Not shipping code.
Not solving a hard planning problem.
Not even using tools in a meaningful way.

Just background churn.

That thread was here:
https://reddit.com/r/openclaw/comments/1taouqv/how_to_stop_burning_tokens/

And it lines up with what a lot of people running always-on agents eventually learn the hard way:

your OpenClaw bill usually gets ugly before the agent does anything useful.

The fastest way to stop burning tokens in OpenClaw is usually not shorter replies. It is fewer background runs, less repeated bootstrap injection, and better model routing.

The real problem: your agent is "idle" but not actually idle

From the outside, an OpenClaw agent can look quiet.

No visible output.
No major tool calls.
No obvious work.

But under the hood, it may still be doing expensive stuff repeatedly:

  • heartbeat loops
  • bootstrap prompt injection on every run
  • context reconstruction
  • routine status checks
  • maintenance turns on GPT-5.4, Claude Opus 4.6, or Grok 4.20

That last one is the killer.

Using frontier models for maintenance work is like using a senior staff engineer to check whether a cron job is still alive.

Technically possible. Financially dumb.

What the Reddit thread got right

The useful part of the r/openclaw discussion was not the vote count. It was that multiple users kept describing the same pattern from different angles.

Some were looking at prompt caching.
Some were focused on compaction.
Some were confused why "idle" sessions still had meaningful token usage.

But the same root cause kept showing up:

the agent was repeatedly rebuilding context and paying for maintenance cycles.

That means if your setup keeps reloading files like AGENTS.md, SOUL.md, TOOLS.md, and HEARTBEAT.md on every wake-up, your bill can grow even when the final response is tiny.

So if your first move is "make replies shorter," you are probably optimizing the wrong thing.

What usually burns tokens in OpenClaw

Here are the three biggest offenders.

1. Heartbeat loops

Each wake-up looks cheap.

But if the agent wakes up every few minutes, every wake-up can trigger another full prompt assembly.

That compounds fast.

A simple back-of-the-envelope example:

bootstrap context: 12,000 tokens
heartbeat interval: every 5 minutes
runs per day: 288

12,000 * 288 = 3,456,000 input tokens/day

That is before the agent does real work.
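
To see how fast that compounds over a month, here is the same arithmetic as a tiny Python sketch. The per-million-token price is a made-up placeholder, not a quote for any particular model or provider.

# Back-of-the-envelope heartbeat cost; the price is a placeholder, not a real quote.
BOOTSTRAP_TOKENS = 12_000          # static context re-sent on every wake-up
HEARTBEAT_MINUTES = 5
PRICE_PER_MILLION_INPUT = 5.00     # hypothetical $/1M input tokens

runs_per_day = 24 * 60 // HEARTBEAT_MINUTES              # 288
input_tokens_per_day = BOOTSTRAP_TOKENS * runs_per_day   # 3,456,000
cost_per_day = input_tokens_per_day / 1_000_000 * PRICE_PER_MILLION_INPUT
print(f"{input_tokens_per_day:,} input tokens/day, "
      f"~${cost_per_day:.2f}/day, ~${cost_per_day * 30:.0f}/month")

At that hypothetical rate, the heartbeat alone is hundreds of dollars a month before the agent produces anything.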

2. Repeated bootstrap injection

If you keep stuffing the same long instructions into context on every run, you are paying repeatedly for mostly static information.

Typical examples:

  • AGENTS.md
  • SOUL.md
  • TOOLS.md
  • policy and operating instructions
  • large tool schemas
  • repeated environment descriptions

If that content is stable, it should not be blindly re-injected every time.

3. Premium models doing janitorial work

This is the painful one.

If GPT-5.4, Claude Opus 4.6, or Grok 4.20 is handling heartbeat checks, no-change polling, or low-risk status verification, you are paying premium reasoning prices for cheap maintenance tasks.

That is bad routing.

The beginner fix that usually does not work

Most people start here:

  • set max_tokens lower
  • tell the model to be concise
  • compress replies
  • overuse compaction
  • remove useful instructions

Some of that can help at the margins.

But if the agent is still waking up too often and rebuilding a giant prompt every time, shaving 100 tokens off the final reply is basically irrelevant.

Here is the rough shape of the mistake:

# Feels efficient, usually isn't
model: gpt-5.4
max_tokens: 120
heartbeat_minutes: 5
inject_every_run:
  - AGENTS.md
  - SOUL.md
  - TOOLS.md
  - HEARTBEAT.md

That config can still burn money while producing very short outputs.

What to audit first

If your OpenClaw bill is higher than expected, start with an operational audit.

Check wake-up frequency

Look for scheduled runs, heartbeat loops, and polling behavior.

grep -R "heartbeat\|schedule\|poll" ./openclaw-config ./logs

Questions to ask:

  • How often does the agent wake up when nothing changed?
  • Does every wake-up trigger a full prompt rebuild?
  • Are retries too aggressive?
  • Are there multiple overlapping monitors for the same task?
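
If your runtime writes logs, a quick count of wake-ups per hour makes the pattern obvious. This is a hedged sketch: it assumes each run is logged on its own line with an ISO timestamp and the word "heartbeat", which may not match your setup.

# Count heartbeat wake-ups per hour from a plain-text log.
# Assumes one line per run starting with an ISO timestamp, e.g.
#   2026-02-11T03:15:02Z heartbeat run started
from collections import Counter

wakeups = Counter()
with open("logs/agent.log") as f:
    for line in f:
        if "heartbeat" in line.lower():
            wakeups[line[:13]] += 1   # "2026-02-11T03" -> bucket by hour

for hour, count in sorted(wakeups.items()):
    print(hour, count)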

Measure repeated bootstrap size

You want to know how much static context gets re-sent per run.

wc -c AGENTS.md SOUL.md TOOLS.md HEARTBEAT.md

Or if you have token tooling in your stack, estimate token count directly.

python estimate_tokens.py AGENTS.md SOUL.md TOOLS.md HEARTBEAT.md

If the same 8k to 20k tokens are being resent over and over, that is a major cost center.
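
I do not know what estimate_tokens.py looks like in your stack, but a minimal version is a few lines if you have a tokenizer library installed. This sketch assumes the tiktoken package and uses one of its general-purpose encodings as a rough proxy; your provider's tokenizer may count differently.

# estimate_tokens.py -- rough token counts for bootstrap files.
# Assumes tiktoken; counts are approximate for non-OpenAI models.
import sys
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
total = 0
for path in sys.argv[1:]:
    with open(path, encoding="utf-8") as f:
        n = len(enc.encode(f.read()))
    total += n
    print(f"{path}: {n:,} tokens")
print(f"total re-sent per run: {total:,} tokens")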

Verify caching instead of assuming it works

A lot of developers assume repeated prefixes are being cached.

Sometimes they are not.

Sometimes tiny prompt changes break reuse.

Sometimes the provider behavior is not what you thought.

If you do not have evidence from logs or billing traces, assume nothing.
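
One way to stop guessing is to log the usage block from every response and compare cached versus total input tokens over a day. The field names below follow the OpenAI-style usage.prompt_tokens_details.cached_tokens shape; if your provider reports caching differently, adjust accordingly.

# Summarize cache hit rate from logged API responses (one JSON object per line).
# Assumes each log line contains the provider's "usage" block.
import json

prompt_tokens = cached_tokens = 0
with open("logs/llm_responses.jsonl") as f:
    for line in f:
        usage = json.loads(line).get("usage", {})
        prompt_tokens += usage.get("prompt_tokens", 0)
        cached_tokens += usage.get("prompt_tokens_details", {}).get("cached_tokens", 0)

if prompt_tokens:
    print(f"cached: {cached_tokens:,} / {prompt_tokens:,} "
          f"({100 * cached_tokens / prompt_tokens:.1f}% of input tokens)")

If that percentage is near zero while you are re-sending the same bootstrap every run, caching is not doing what you assumed.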

Inspect model routing

List every place where your agent picks a model.

grep -R "gpt-5.4\|claude\|grok" ./openclaw-config ./src

Then ask the uncomfortable question:

Which of these calls actually need a frontier model?

What should use GPT-5.4 vs a cheaper maintenance model?

My rule is simple:

frontier models should do frontier work.

Use GPT-5.4, Claude Opus 4.6, or Grok 4.20 for:

  • difficult code generation
  • multi-step debugging
  • planning across messy constraints
  • ambiguous tool decisions
  • recovery after failures
  • high-stakes user-facing output

Do not use them for:

  • heartbeat checks
  • no-op polling
  • routine status verification
  • simple classification
  • low-risk tool orchestration
  • repeated maintenance turns with nearly identical context

Here is a practical split.

Task | Model tier
Heartbeat / keepalive check | Cheap maintenance model
"Has anything changed?" check | Cheap maintenance model
Simple tool routing | Cheap maintenance model
Summarize recent state | Cheap maintenance model
Multi-file refactor | GPT-5.4 / Claude Opus 4.6 / Grok 4.20
Failure recovery after broken tool chain | GPT-5.4 / Claude Opus 4.6 / Grok 4.20
Hard planning with ambiguous constraints | GPT-5.4 / Claude Opus 4.6 / Grok 4.20

If your stack supports model tiering, use it aggressively.
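
What that tiering looks like in code depends entirely on your stack. This is a hypothetical sketch, not an OpenClaw API; the task flags and model names are placeholders.

# Hypothetical model router: boring work goes to a cheap model,
# escalation happens only when the task is actually hard.
MAINTENANCE_MODEL = "small-fast-model"
REASONING_MODEL = "gpt-5.4"

def pick_model(task: dict) -> str:
    escalate = (
        task.get("is_ambiguous")
        or task.get("code_change_is_nontrivial")
        or task.get("tool_failure_needs_recovery")
        or task.get("user_visible_and_high_stakes")
    )
    return REASONING_MODEL if escalate else MAINTENANCE_MODEL

# Heartbeats and status checks never touch the frontier model.
print(pick_model({"kind": "heartbeat"}))                # small-fast-model
print(pick_model({"code_change_is_nontrivial": True}))  # gpt-5.4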

A better OpenClaw shape

Here is the kind of architecture that usually makes more sense.

maintenance_model: small-fast-model
reasoning_model: gpt-5.4
heartbeat_minutes: 30
inject_every_run:
  - minimal_runtime_instructions
retrieve_on_demand:
  - AGENTS.md
  - SOUL.md
  - TOOLS.md
escalate_to_reasoning_model_when:
  - task_is_ambiguous
  - code_change_is_nontrivial
  - tool_failure_needs_recovery
  - user_visible_output_is_high_stakes

The idea is straightforward:

  • wake up less often
  • inject less static context
  • route boring work to a cheaper model
  • escalate only when the task becomes hard

Compaction is not magic either

Compaction can help.

It can also backfire.

If you compact too aggressively, the model has to:

  1. summarize old context
  2. reread the summary later
  3. spend extra turns recovering missing detail

That can reduce context pressure while increasing total spend.

So yes, compaction is useful.

No, it is not the first lever I would pull if the real issue is repeated wake-ups plus giant bootstrap prompts.

Why OpenClaw users get hit harder than most

OpenClaw is built for persistent agent behavior.

That is the whole point.

Always-on agents need:

  • memory
  • instructions
  • tool definitions
  • retries
  • monitoring
  • background execution

None of that is inherently bad.

The problem is what happens when that architecture collides with per-token pricing.

Per-token billing punishes exactly the behaviors that make persistent agents useful:

  • long-running sessions
  • recursive loops
  • retries
  • monitoring
  • tool-rich prompts
  • continuous execution

So developers start optimizing for survival instead of quality.

They shorten prompts that should stay rich.
They avoid retries that would improve reliability.
They turn off useful monitoring.
They get nervous about one more turn of reasoning.

That is token anxiety.

And OpenClaw users feel it hard because their agents are designed to stay alive.

What actually fixes the problem

If I were debugging an expensive OpenClaw deployment this week, I would do these five things in order:

1. Reduce unnecessary background runs

If the agent does not need to wake up, do not wake it up.

2. Shrink repeated bootstrap injection

Move stable knowledge out of always-injected context and into retrieval or more selective memory.
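
As a sketch of what that can look like (names here are hypothetical, not an OpenClaw API): keep a slim always-injected prompt and pull the big documents in only when the current task actually needs them.

# Hypothetical retrieve-on-demand: a slim prompt every run,
# big reference docs only when the task calls for them.
from pathlib import Path

ALWAYS = Path("runtime_instructions.md")   # a few hundred tokens, sent every run
ON_DEMAND = {"agents": Path("AGENTS.md"), "soul": Path("SOUL.md"), "tools": Path("TOOLS.md")}

def build_prompt(task_text: str) -> str:
    parts = [ALWAYS.read_text(encoding="utf-8")]
    for keyword, path in ON_DEMAND.items():
        if keyword in task_text.lower():   # crude trigger; swap in real retrieval
            parts.append(path.read_text(encoding="utf-8"))
    return "\n\n".join(parts)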

3. Tier models aggressively

Maintenance goes to a cheap model.

Real reasoning goes to GPT-5.4, Claude Opus 4.6, or Grok 4.20 only when needed.

4. Verify caching with logs

Do not trust assumptions.

Measure actual reuse.

5. Stop treating pricing as a side detail

For always-on agents, pricing changes architecture decisions.

That is not theory. That is operations.

Why flat monthly pricing fits OpenClaw better

This is the part people avoid saying out loud:

per-token pricing is a bad fit for always-on OpenClaw setups.

Not because tokens are evil.

Because the billing model changes behavior.

If you are constantly worrying that heartbeats, retries, long sessions, or monitoring loops might explode your bill, you are not really building autonomous systems.

You are managing a meter.

That is why Standard Compute is interesting for this exact use case.

Standard Compute gives you access to models like GPT-5.4, Claude Opus 4.6, and Grok 4.20 through one OpenAI-compatible API, but with flat monthly pricing instead of per-token billing.

For OpenClaw users, that matters.

It means you can focus on:

  • runtime behavior
  • model routing
  • reliability
  • quality
  • letting agents run continuously

Instead of obsessing over whether one more loop, one more retry, or one more maintenance cycle is going to create a surprise bill.

If your OpenClaw agents run 24/7, predictable monthly pricing is just a better operational match than token-metered billing.

More here:
https://standardcompute.com

The short version

If your OpenClaw bill is ugly, do not start by trimming adjectives out of responses.

Start here:

  • audit wake-ups
  • measure bootstrap size
  • route maintenance away from frontier models
  • verify caching
  • stop using a pricing model that punishes persistent agents for staying alive

That is the real lesson from the r/openclaw thread.

The problem is not just that OpenClaw can burn tokens in the background.

It is that per-token pricing makes normal agent behavior feel dangerous.

And once you see that clearly, the fixes get a lot more obvious.
