Lars Winstand

Posted on • Originally published at standardcompute.com

My OpenClaw agent looked idle overnight and still burned through tokens

I found a small but very real r/openclaw thread recently: 14 upvotes, 29 comments, and a painfully familiar question.

Why did an OpenClaw agent that looked basically idle overnight still torch the budget?

The best answer from the thread was not exotic.

It was heartbeats.

More specifically: heartbeats that keep resending a fat conversation history back through the model.

That means your agent can look asleep while still paying for context replay over and over.

If you run OpenClaw with a long-lived thread, this is probably the first thing to inspect.

The actual failure mode

A lot of people assume token burn comes from obvious work:

  • generating lots of code
  • browser automation
  • long reasoning chains
  • multi-step planning

Sometimes yes.

But the thread kept converging on a more boring answer: long sessions plus frequent heartbeats.

A simplified loop looks like this:

heartbeat -> send current state + prior conversation -> model responds -> wait -> repeat

If your session is already large, every heartbeat is expensive even when nothing interesting happened.

That means cost tracks context size, not just visible activity.
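The compounding is easy to underestimate, so here is a back-of-the-envelope sketch. All the numbers are illustrative assumptions (session size, heartbeat interval, growth rate), not OpenClaw defaults:

```python
# Rough cost of resending a growing context on every heartbeat.
# Assumes a 20k-token session, a 5-minute heartbeat, and ~1,000
# tokens of new messages per hour. All numbers are made up.
context_tokens = 20_000             # current session size in tokens
growth_per_beat = 1_000 * 5 // 60   # ~83 new tokens between beats
beats_per_day = 24 * 60 // 5        # 288 heartbeats per day

total_input_tokens = 0
for _ in range(beats_per_day):
    total_input_tokens += context_tokens  # full history resent each beat
    context_tokens += growth_per_beat

print(f"input tokens/day: {total_input_tokens:,}")  # → input tokens/day: 9,190,224
```

Over nine million input tokens a day from an agent that, by any outside measure, did nothing.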

Why "idle" is not idle

OpenClaw can feel idle from the outside while still doing this internally:

import time

while True:
    payload = {
        "system": system_prompt,
        "messages": full_conversation_history,  # grows every beat, resent every beat
        "state": current_agent_state,
    }
    response = call_model(payload)  # billed for the entire payload each time
    maybe_run_tools(response)
    time.sleep(heartbeat_interval)

The problem is obvious once you write it down:

  • full_conversation_history keeps growing
  • each heartbeat resends it
  • cost compounds silently

That is how people wake up to a bill and think, "but the agent barely did anything."

The thread's best fix: stop keeping one immortal session alive

The most useful suggestion in the comments was not "summarize harder."

It was: use short sessions and write a handoff file.

That pattern looks more like this:

  1. run a bounded task
  2. write the minimum state needed for the next task
  3. end the session
  4. start fresh later with only the handoff

Example handoff file:

# handoff.md

Current task: finish deployment validation
Status: staging deploy succeeded
Next step: run smoke test against /health and /login
Known issue: flaky timeout on user-profile endpoint
Files touched:
- deploy.sh
- smoke_test.py

Then the next session can start with a much smaller prompt:

Read handoff.md and continue from there.
Do not reconstruct old discussion unless required.
Current goal: run smoke tests and report failures.

That is less magical than persistent memory.

It is also usually cheaper.
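The pattern is simple enough to sketch in a few lines. This is an illustrative outline, not OpenClaw's actual session API; `end_session` and `start_session` are hypothetical helper names:

```python
# Bounded-session sketch: persist a small handoff, then start fresh from it.
from pathlib import Path

HANDOFF = Path("handoff.md")

def end_session(task: str, status: str, next_step: str) -> None:
    """Persist only the state the next session actually needs."""
    HANDOFF.write_text(
        f"# handoff.md\n\n"
        f"Current task: {task}\n"
        f"Status: {status}\n"
        f"Next step: {next_step}\n"
    )

def start_session() -> str:
    """Build a small starting prompt from the handoff, not the old thread."""
    return "Read the handoff below and continue from there.\n\n" + HANDOFF.read_text()

end_session(
    "finish deployment validation",
    "staging deploy succeeded",
    "run smoke test against /health and /login",
)
print(start_session())
```

The next session's prompt is a few hundred characters instead of the entire prior conversation.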

Why aggressive compaction is not a silver bullet

The tempting answer is to summarize the thread every so often.

That can help, but the Reddit discussion was right to be skeptical about doing it constantly mid-task.

Because the real flow becomes:

  1. read the giant thread
  2. generate a summary
  3. read the summary on the next call
  4. discover something important got dropped
  5. spend more tokens recovering context

That is often just a second conversation about the first conversation.

For active coding or debugging work, losing one detail can be enough to trigger expensive repair loops.

Better architecture beats better prompting

The smartest part of the discussion was this: stop making the model do deterministic work.

If a job can be handled by Python, bash, cron, or n8n, do that.

Use the model for judgment, not polling.

Bad pattern

One always-on OpenClaw agent:

  • checks files
  • polls APIs
  • watches logs
  • decides if anything changed
  • keeps a giant context alive

Better pattern

Small background jobs do the boring work.

Only wake the agent when there is something worth reasoning about.

Example with cron:

*/15 * * * * /usr/bin/python3 /opt/check_build_status.py
0 * * * * /usr/bin/python3 /opt/check_error_budget.py

Example notifier script:

# check_build_status.py
import json
import subprocess
from pathlib import Path

state_file = Path("/tmp/build_state.json")
# Placeholder result; replace with a real CI status check.
current = {"build_failed": False, "new_error": True}
previous = json.loads(state_file.read_text()) if state_file.exists() else {}

# Only wake the agent when the state actually changed.
if current != previous and current.get("new_error"):
    subprocess.run([
        "python3",
        "wake_openclaw.py",
        "Build state changed. Investigate latest CI failure.",
    ])

state_file.write_text(json.dumps(current))

This is a much better use of an LLM:

  • scripts gather facts cheaply
  • cron handles repetition cheaply
  • OpenClaw gets called only when reasoning is needed

Model tiering is the boring answer that actually works

Another good point from the thread: stop paying premium model prices for maintenance tasks.

If your heartbeat, orchestration, or lightweight routing is using Claude Opus or GPT-5 every time, you are probably overspending.

A practical split looks like this:

Task                                   | Better default
---------------------------------------|------------------------------------------
Heartbeats / lightweight orchestration | Qwen 9B, DeepSeek, or another cheap model
Normal coding / planning               | Claude Sonnet
High-stakes reasoning                  | Claude Opus or GPT-5
Local compression / summarization      | Granite 3B, Llama, or another local model

Example config:

heartbeat_interval: 1h
orchestrator_model: qwen-9b
heavy_reasoning_model: claude-sonnet
mission_critical_model: gpt-5
local_compression_model: granite-3b
session_rule: "end task -> write handoff.md -> start fresh session"

That setup is not glamorous.

It is just sane.

Practical debugging checklist for OpenClaw token burn

If your bill feels wrong, I would check these in order:

1. Measure heartbeat frequency

If your agent is checking in every few minutes, stretch it out.

Start with 1 hour for anything non-urgent.

2. Inspect payload size

Log how much context is being sent per call.

Even rough logging helps.

def estimate_chars(messages):
    """Rough context-size proxy; ~4 characters per token is a common rule of thumb."""
    return sum(len(m.get("content", "")) for m in messages)

print("context_chars=", estimate_chars(full_conversation_history))

3. Kill long-lived threads

If a task is done, end the session.

Do not keep dragging old context forward just because it feels convenient.

4. Replace repetitive reasoning with scripts

If the model is repeatedly checking a condition, that is probably script territory.

5. Tier your models

Cheap model for maintenance.
Expensive model for hard calls.

6. Use local summarization if you need compression

If you really must compact context, a local model can be a good tradeoff.

A concrete OpenClaw workflow that is harder to bankrupt

Here is a pattern I would recommend for a lot of OpenClaw setups:

cron/script layer
  -> checks deterministic conditions
  -> writes structured state files
  -> wakes OpenClaw only on change

OpenClaw session
  -> reads latest state file + handoff.md
  -> performs bounded reasoning task
  -> writes result + next-step handoff
  -> exits

State file example:

{
  "service": "billing-api",
  "deploy_status": "failed",
  "failed_step": "database migration",
  "timestamp": "2026-05-12T02:10:00Z"
}

Prompt example:

Read state.json and handoff.md.
Determine the most likely cause of the failed migration.
Suggest the smallest safe fix.
Do not re-check unrelated systems.

This keeps the model focused and keeps your context small.
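The session half of that workflow can be put in code too. This is a rough sketch under the same assumptions as above, not OpenClaw's real interface; `call_model` stands in for whatever client your setup actually uses:

```python
# One bounded reasoning task: read state + handoff, make one call, exit.
# call_model is a placeholder for your actual model client.
import json
from pathlib import Path

def run_bounded_session(call_model) -> str:
    """Read the files, ask one focused question, write the next handoff."""
    state = json.loads(Path("state.json").read_text())
    handoff = Path("handoff.md").read_text()
    prompt = (
        "Read the state and handoff below.\n"
        "Determine the most likely cause of the failure.\n"
        "Suggest the smallest safe fix.\n\n"
        f"state: {json.dumps(state)}\n\n{handoff}"
    )
    answer = call_model(prompt)  # one focused call over a small context
    # Write the next handoff instead of keeping the conversation alive.
    Path("handoff.md").write_text(f"# handoff.md\n\nLast finding: {answer}\n")
    return answer
```

Nothing stays resident between runs; the next invocation starts from the files, not from a conversation history.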

The bigger issue is not just cost, it is behavior

Per-token billing changes how people build.

Once you get burned a couple of times, you start doing weird things:

  • cutting prompts too aggressively
  • avoiding retries
  • reducing monitoring
  • limiting experiments
  • shutting down useful background loops

That is not just a billing problem.

It changes the architecture of the system because you are designing around fear.

For OpenClaw users running always-on agents, that gets old fast.

This is exactly why Standard Compute is interesting here.

Instead of paying per token and constantly wondering whether a sleepy-looking agent is secretly expensive, you get flat monthly pricing through an OpenAI-compatible API with routing across models like GPT-5, Claude Opus, and Grok.

If you are building persistent OpenClaw automations, that changes the optimization target.

You can optimize for reliability and output quality instead of constantly asking, "is this heartbeat worth the money?"

That does not remove the need for good architecture.

You should still shorten sessions, use handoff files, and move deterministic work into scripts.

But it does remove a lot of the token anxiety that makes people underbuild useful agents.

My take

The main lesson from that Reddit thread was simple:

The problem usually is not that your OpenClaw agent is too smart.

It is that the session shape is wrong.

If you keep one giant thread alive, heartbeats turn context into a recurring tax.

If you switch to:

  • longer heartbeat intervals
  • bounded sessions
  • handoff files
  • scripts and cron for repetitive work
  • model tiering

then the bill usually gets more predictable very quickly.

And if you are tired of per-token pricing shaping every design decision, use infrastructure that is built for always-on agent workloads instead of fighting the meter all night.

That is the real fix: better architecture first, better pricing model second.

Both matter.
