Lars Winstand

Posted on • Originally published at standardcompute.com

My OpenClaw agent looked idle overnight and still burned through tokens

I found a small but very real r/openclaw thread recently: 14 upvotes, 29 comments, and a painfully familiar question.

Why did an OpenClaw agent that looked basically idle overnight still torch the budget?

The best answer from the thread was not exotic.

It was heartbeats.

More specifically: heartbeats that keep resending a fat conversation history back through the model.

That means your agent can look asleep while still paying for context replay over and over.

If you run OpenClaw with a long-lived thread, this is probably the first thing to inspect.

The actual failure mode

A lot of people assume token burn comes from obvious work:

  • generating lots of code
  • browser automation
  • long reasoning chains
  • multi-step planning

Sometimes yes.

But the thread kept converging on a more boring answer: long sessions plus frequent heartbeats.

A simplified loop looks like this:

heartbeat -> send current state + prior conversation -> model responds -> wait -> repeat

If your session is already large, every heartbeat is expensive even when nothing interesting happened.

That means cost tracks context size, not just visible activity.
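The compounding is easy to underestimate, so here is a back-of-the-envelope sketch. All the numbers are illustrative assumptions (session size, heartbeat interval, growth rate), not OpenClaw defaults:

```python
# Rough cost of resending a growing context on every heartbeat.
# Assumes a 20k-token session, a 5-minute heartbeat, and ~1,000
# tokens of new messages per hour. All numbers are made up.
context_tokens = 20_000             # current session size in tokens
growth_per_beat = 1_000 * 5 // 60   # ~83 new tokens between beats
beats_per_day = 24 * 60 // 5        # 288 heartbeats per day

total_input_tokens = 0
for _ in range(beats_per_day):
    total_input_tokens += context_tokens  # full history resent each beat
    context_tokens += growth_per_beat

print(f"input tokens/day: {total_input_tokens:,}")  # → input tokens/day: 9,190,224
```

Over nine million input tokens a day from an agent that, by any outside measure, did nothing.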

Why "idle" is not idle

OpenClaw can feel idle from the outside while still doing this internally:

import time

while True:
    payload = {
        "system": system_prompt,
        "messages": full_conversation_history,  # grows every beat, resent every beat
        "state": current_agent_state,
    }
    response = call_model(payload)  # billed for the entire payload each time
    maybe_run_tools(response)
    time.sleep(heartbeat_interval)

The problem is obvious once you write it down:

  • full_conversation_history keeps growing
  • each heartbeat resends it
  • cost compounds silently

That is how people wake up to a bill and think, "but the agent barely did anything."

The thread's best fix: stop keeping one immortal session alive

The most useful suggestion in the comments was not "summarize harder."

It was: use short sessions and write a handoff file.

That pattern looks more like this:

  1. run a bounded task
  2. write the minimum state needed for the next task
  3. end the session
  4. start fresh later with only the handoff

Example handoff file:

# handoff.md

Current task: finish deployment validation
Status: staging deploy succeeded
Next step: run smoke test against /health and /login
Known issue: flaky timeout on user-profile endpoint
Files touched:
- deploy.sh
- smoke_test.py

Then the next session can start with a much smaller prompt:

Read handoff.md and continue from there.
Do not reconstruct old discussion unless required.
Current goal: run smoke tests and report failures.

That is less magical than persistent memory.

It is also usually cheaper.
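The pattern is simple enough to sketch in a few lines. This is an illustrative outline, not OpenClaw's actual session API; `end_session` and `start_session` are hypothetical helper names:

```python
# Bounded-session sketch: persist a small handoff, then start fresh from it.
from pathlib import Path

HANDOFF = Path("handoff.md")

def end_session(task: str, status: str, next_step: str) -> None:
    """Persist only the state the next session actually needs."""
    HANDOFF.write_text(
        f"# handoff.md\n\n"
        f"Current task: {task}\n"
        f"Status: {status}\n"
        f"Next step: {next_step}\n"
    )

def start_session() -> str:
    """Build a small starting prompt from the handoff, not the old thread."""
    return "Read the handoff below and continue from there.\n\n" + HANDOFF.read_text()

end_session(
    "finish deployment validation",
    "staging deploy succeeded",
    "run smoke test against /health and /login",
)
print(start_session())
```

The next session's prompt is a few hundred characters instead of the entire prior conversation.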

Why aggressive compaction is not a silver bullet

The tempting answer is to summarize the thread every so often.

That can help, but the Reddit discussion was right to be skeptical about doing it constantly mid-task.

Because the real flow becomes:

  1. read the giant thread
  2. generate a summary
  3. read the summary on the next call
  4. discover something important got dropped
  5. spend more tokens recovering context

That is often just a second conversation about the first conversation.

For active coding or debugging work, losing one detail can be enough to trigger expensive repair loops.

Better architecture beats better prompting

The smartest part of the discussion was this: stop making the model do deterministic work.

If a job can be handled by Python, bash, cron, or n8n, do that.

Use the model for judgment, not polling.

Bad pattern

One always-on OpenClaw agent:

  • checks files
  • polls APIs
  • watches logs
  • decides if anything changed
  • keeps a giant context alive

Better pattern

Small background jobs do the boring work.

Only wake the agent when there is something worth reasoning about.

Example with cron:

*/15 * * * * /usr/bin/python3 /opt/check_build_status.py
0 * * * * /usr/bin/python3 /opt/check_error_budget.py

Example notifier script:

# check_build_status.py
import json
import subprocess
from pathlib import Path

state_file = Path("/tmp/build_state.json")
# Placeholder result; replace with a real CI status check.
current = {"build_failed": False, "new_error": True}
previous = json.loads(state_file.read_text()) if state_file.exists() else {}

# Only wake the agent when the state actually changed.
if current != previous and current.get("new_error"):
    subprocess.run([
        "python3",
        "wake_openclaw.py",
        "Build state changed. Investigate latest CI failure.",
    ])

state_file.write_text(json.dumps(current))

This is a much better use of an LLM:

  • scripts gather facts cheaply
  • cron handles repetition cheaply
  • OpenClaw gets called only when reasoning is needed

Model tiering is the boring answer that actually works

Another good point from the thread: stop paying premium model prices for maintenance tasks.

If your heartbeat, orchestration, or lightweight routing is using Claude Opus or GPT-5 every time, you are probably overspending.

A practical split looks like this:

Task                                   | Better default
---------------------------------------|------------------------------------------
Heartbeats / lightweight orchestration | Qwen 9B, DeepSeek, or another cheap model
Normal coding / planning               | Claude Sonnet
High-stakes reasoning                  | Claude Opus or GPT-5
Local compression / summarization      | Granite 3B, Llama, or another local model

Example config:

heartbeat_interval: 1h
orchestrator_model: qwen-9b
heavy_reasoning_model: claude-sonnet
mission_critical_model: gpt-5
local_compression_model: granite-3b
session_rule: "end task -> write handoff.md -> start fresh session"

That setup is not glamorous.

It is just sane.

Practical debugging checklist for OpenClaw token burn

If your bill feels wrong, I would check these in order:

1. Measure heartbeat frequency

If your agent is checking in every few minutes, stretch it out.

Start with 1 hour for anything non-urgent.

2. Inspect payload size

Log how much context is being sent per call.

Even rough logging helps.

def estimate_chars(messages):
    """Rough context-size proxy; ~4 characters per token is a common rule of thumb."""
    return sum(len(m.get("content", "")) for m in messages)

print("context_chars=", estimate_chars(full_conversation_history))

3. Kill long-lived threads

If a task is done, end the session.

Do not keep dragging old context forward just because it feels convenient.

4. Replace repetitive reasoning with scripts

If the model is repeatedly checking a condition, that is probably script territory.

5. Tier your models

Cheap model for maintenance.
Expensive model for hard calls.

6. Use local summarization if you need compression

If you really must compact context, a local model can be a good tradeoff.

A concrete OpenClaw workflow that is harder to bankrupt

Here is a pattern I would recommend for a lot of OpenClaw setups:

cron/script layer
  -> checks deterministic conditions
  -> writes structured state files
  -> wakes OpenClaw only on change

OpenClaw session
  -> reads latest state file + handoff.md
  -> performs bounded reasoning task
  -> writes result + next-step handoff
  -> exits

State file example:

{
  "service": "billing-api",
  "deploy_status": "failed",
  "failed_step": "database migration",
  "timestamp": "2026-05-12T02:10:00Z"
}

Prompt example:

Read state.json and handoff.md.
Determine the most likely cause of the failed migration.
Suggest the smallest safe fix.
Do not re-check unrelated systems.

This keeps the model focused and keeps your context small.
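The session half of that workflow can be put in code too. This is a rough sketch under the same assumptions as above, not OpenClaw's real interface; `call_model` stands in for whatever client your setup actually uses:

```python
# One bounded reasoning task: read state + handoff, make one call, exit.
# call_model is a placeholder for your actual model client.
import json
from pathlib import Path

def run_bounded_session(call_model) -> str:
    """Read the files, ask one focused question, write the next handoff."""
    state = json.loads(Path("state.json").read_text())
    handoff = Path("handoff.md").read_text()
    prompt = (
        "Read the state and handoff below.\n"
        "Determine the most likely cause of the failure.\n"
        "Suggest the smallest safe fix.\n\n"
        f"state: {json.dumps(state)}\n\n{handoff}"
    )
    answer = call_model(prompt)  # one focused call over a small context
    # Write the next handoff instead of keeping the conversation alive.
    Path("handoff.md").write_text(f"# handoff.md\n\nLast finding: {answer}\n")
    return answer
```

Nothing stays resident between runs; the next invocation starts from the files, not from a conversation history.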

The bigger issue is not just cost, it is behavior

Per-token billing changes how people build.

Once you get burned a couple of times, you start doing weird things:

  • cutting prompts too aggressively
  • avoiding retries
  • reducing monitoring
  • limiting experiments
  • shutting down useful background loops

That is not just a billing problem.

It changes the architecture of the system because you are designing around fear.

For OpenClaw users running always-on agents, that gets old fast.

This is exactly why Standard Compute is interesting here.

Instead of paying per token and constantly wondering whether a sleepy-looking agent is secretly expensive, you get flat monthly pricing through an OpenAI-compatible API with routing across models like GPT-5, Claude Opus, and Grok.

If you are building persistent OpenClaw automations, that changes the optimization target.

You can optimize for reliability and output quality instead of constantly asking, "is this heartbeat worth the money?"

That does not remove the need for good architecture.

You should still shorten sessions, use handoff files, and move deterministic work into scripts.

But it does remove a lot of the token anxiety that makes people underbuild useful agents.

My take

The main lesson from that Reddit thread was simple:

The problem usually is not that your OpenClaw agent is too smart.

It is that the session shape is wrong.

If you keep one giant thread alive, heartbeats turn context into a recurring tax.

If you switch to:

  • longer heartbeat intervals
  • bounded sessions
  • handoff files
  • scripts and cron for repetitive work
  • model tiering

then the bill usually gets more predictable very quickly.

And if you are tired of per-token pricing shaping every design decision, use infrastructure that is built for always-on agent workloads instead of fighting the meter all night.

That is the real fix: better architecture first, better pricing model second.

Both matter.
