It's not gaslighting — it's accounting

Providers bill tokens, not "how clever your last message was." Everything the model reads on a turn counts toward input: the system stack, prior messages, tool calls, tool results, attachments, compaction artifacts, and headers you never see in the thread.

OpenClaw documents this explicitly in its Token use & costs guide. It also offers a mental model: for most OpenAI-style models, English text averages roughly four characters per token. Real tokenizer counts vary by model, but it's a useful starting point.
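That rule of thumb fits in one line of code. A minimal sketch (a rough heuristic only; real tokenizers differ per model, and the function name here is ours, not an OpenClaw API):

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate using the ~4 chars/token rule of thumb.

    Treat this as a sanity check, not a billing calculation: actual
    counts depend on the model's tokenizer.
    """
    return max(1, round(len(text) / chars_per_token))


print(estimate_tokens("Everything the model reads on a turn counts toward input."))
```

Useful for quickly sizing a bootstrap file or a tool result before it lands in context.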

Bucket 1: System prompts (the iceberg)

OpenClaw assembles its own system prompt on every run. It's not a short static paragraph. The documented ingredients include the tool list and descriptions, skills metadata, self-update instructions, workspace bootstrap, time and reply behaviour, heartbeat guidance, runtime metadata, and safety sections. See Token use & costs and System Prompt.

Bootstrap files can dwarf your chat

Files like AGENTS.md, SOUL.md, TOOLS.md, IDENTITY.md, USER.md, HEARTBEAT.md, and optional MEMORY.md are injected into context with defaults of up to 20,000 characters per file and 150,000 characters total across all bootstrap content. That's potentially tens of thousands of tokens before the user types a single word.
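The arithmetic under the four-chars-per-token heuristic makes that scale concrete (the character caps are the documented defaults; the token conversion is the rough heuristic, not a billed figure):

```python
CHARS_PER_TOKEN = 4        # rough heuristic, not a tokenizer
PER_FILE_CHARS = 20_000    # default cap per bootstrap file
TOTAL_CHARS = 150_000      # default cap across all bootstrap content

print(TOTAL_CHARS // CHARS_PER_TOKEN)     # ~37,500 tokens before the user types a word
print(PER_FILE_CHARS // CHARS_PER_TOKEN)  # ~5,000 tokens for one maxed-out MEMORY.md
```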

OpenClaw specifically warns that MEMORY.md grows over time and drives higher context use and more frequent compaction. If your "token usage high" problem appeared after months of use, start by checking the size of your MEMORY.md.

Bucket 2: Tool calls and tool definitions

This is the bucket that shocks people who only watch chat bubbles.

OpenClaw states that tool definitions (JSON schemas) are billed separately from prose, and that for tool-heavy setups they often dominate input tokens. Past tool calls and tool results stay in the transcript — so one oversized HTML dump or log file taxes every subsequent turn.

If you're debugging "why is prompt_tokens huge on turn one," it's usually the combined weight of system + tools, not your opening message.
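To get a ballpark for your own tool stack, you can serialise the schemas and apply the same four-chars-per-token heuristic. This is a hypothetical helper, not an OpenClaw API, and actual billing depends on how your provider serialises tool definitions:

```python
import json


def schema_token_estimate(tool_defs: list[dict], chars_per_token: float = 4.0) -> int:
    """Estimate the token weight of tool definitions from their JSON size.

    Hypothetical helper: providers serialise tools differently, so treat
    the result as an order-of-magnitude check only.
    """
    return round(sum(len(json.dumps(t)) for t in tool_defs) / chars_per_token)


# An illustrative tool definition in OpenAI-style JSON-schema shape.
tools = [
    {
        "name": "web_fetch",
        "description": "Fetch a URL and return its contents",
        "parameters": {
            "type": "object",
            "properties": {"url": {"type": "string"}},
            "required": ["url"],
        },
    },
]
print(schema_token_estimate(tools))
```

Multiply a per-tool figure like this by a few dozen tools and it's easy to see how definitions come to dominate `prompt_tokens`.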

OpenClaw also ships optional tool-loop detection to protect against runaway spend when agents repeat tools without making progress.

Bucket 3: Memory injection

Injected memory: When MEMORY.md exists, it participates in bootstrap injection on every turn. Daily memory/*.md files are not auto-injected — they only cost tokens when the model actively uses memory_search or memory_get. Details in Memory and System Prompt.

Silent maintenance turns: Near auto-compaction, OpenClaw can run an automatic memory flush turn — often with no visible reply to the user — that is still a full model pass. Budget for that.

Embeddings: If you use vector memory with remote embedding providers, that adds API spend on top of your chat completions.

Bucket 4: Channel overhead (WhatsApp and others)

Channels don't multiply tokens by a fixed factor — but they add structure and history you might not notice.

WhatsApp: quotes, media placeholders, and group buffers

OpenClaw's WhatsApp channel docs explain several common "one short ping, giant context" situations:

  • Quoted replies wrap prior text in a [Replying to …] … [/Replying] block — habitual quoting inflates input significantly.
  • Media-only inbound messages normalise to placeholders like <media:image>; location and contacts become text.
  • Groups: Up to 50 buffered messages (default channels.whatsapp.historyLimit) can be injected when the bot is triggered, wrapped in [Chat messages since your last reply - for context] markers. Set it to 0 to disable injection in noisy groups.
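To see why habitual quoting hurts, compare a bare ping with the same ping carrying a quoted block. The wrapper text below approximates the documented markers (the exact serialisation may differ), and the token counts use the rough four-chars-per-token heuristic:

```python
CHARS_PER_TOKEN = 4  # rough heuristic

ping = "ok, send it"
quoted = "an earlier 500-character message " * 15  # ~500 chars of quoted history
with_quote = f"[Replying to ...] {quoted} [/Replying] {ping}"

print(len(ping) // CHARS_PER_TOKEN)        # a couple of tokens for the bare ping
print(len(with_quote) // CHARS_PER_TOKEN)  # two orders of magnitude more once the quote rides along
```

One quote is cheap; a habit of quoting in every message compounds across the whole transcript.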

OpenClaw also notes that only final replies are delivered to WhatsApp and Telegram, while streaming output may appear on internal UIs. That's useful context when reconciling what you see with what gets billed.

How to get real data from your gateway

Stop guessing. OpenClaw gives you the tools to measure this directly:

  • /status — shows session model, context usage, last response tokens, and estimated dollar cost when using API-key auth. (Token use & costs)
  • /usage off|tokens|full — per-response footers showing token counts per turn.
  • /context list and /context detail — see exactly how much system, tools, skills, and injected files are contributing. (Context)
  • Logging — model.usage metrics include tokens, cost, duration, provider, model, channel, and session identifiers. (Logging)

Run a controlled experiment: same prompt with tools on vs off, or a direct message vs a busy WhatsApp group. Compare /context detail and /usage full. That's the only "real data" that ends arguments.
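Once model.usage metrics are flowing into your logs, a few lines of aggregation turn raw records into a per-session spend leaderboard. The record shape below is hypothetical (the docs list the fields, but the exact format depends on your logging backend):

```python
from collections import defaultdict

# Hypothetical model.usage records; field names follow the documented
# metric fields, but your logging backend's format may differ.
records = [
    {"session": "wa-group-1", "tokens": 9200, "cost": 0.031, "channel": "whatsapp"},
    {"session": "dm-alice", "tokens": 1400, "cost": 0.005, "channel": "whatsapp"},
    {"session": "wa-group-1", "tokens": 8800, "cost": 0.029, "channel": "whatsapp"},
]

by_session = defaultdict(lambda: {"tokens": 0, "cost": 0.0})
for r in records:
    by_session[r["session"]]["tokens"] += r["tokens"]
    by_session[r["session"]]["cost"] += r["cost"]

# Most expensive sessions first.
for session, totals in sorted(by_session.items(), key=lambda kv: -kv[1]["cost"]):
    print(session, totals)
```

A busy group session floating to the top of that list is usually your first trim target.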

10 practical fixes that cut waste by ~40–60%

Operators who stack several of these commonly report roughly 40–60% less avoidable spend. OpenClaw doesn't certify a single number; your results depend on your setup.

  1. /compact long sessions so stale history stops riding into every turn. (Token use & costs)
  2. Truncate or summarise tool output before it lands back in the thread. One big HTML page in a tool result taxes every subsequent message.
  3. Lower imageMaxDimensionPx (default 1200) if screenshots are dominating your vision token costs.
  4. Shorten skill descriptions — the injected skills list has deterministic overhead per skill. (Skills)
  5. Use smaller models for exploratory work — keep the premium model for final passes, not every intermediate step.
  6. Tune heartbeats — the default 30-minute full turns add up. Set to 0m to disable if you don't need them. (Personal Assistant Setup)
  7. Use sub-agents for grunt work — each has its own context bill; offload cheap tasks to cheaper models. (Sub-Agents)
  8. Review WhatsApp group historyLimit — default is 50 buffered messages; set to 0 to disable injection in noisy groups. (WhatsApp)
  9. Prompt caching hygiene — align heartbeat intervals and cache TTL/pruning if you rely on provider-side caching. (Prompt Caching)
  10. Enable tool-loop detection — prevents agents from spinning on repeated tool calls and burning your budget. (Tool-loop detection)
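Fix 2 above can be as simple as a head-and-tail clip before the tool result re-enters the transcript. A sketch (`max_chars` is an illustrative budget, not an OpenClaw setting):

```python
def truncate_tool_output(text: str, max_chars: int = 4000) -> str:
    """Clip oversized tool results, keeping the start and end.

    Illustrative sketch: max_chars is a budget you pick, not an
    OpenClaw configuration key.
    """
    if len(text) <= max_chars:
        return text
    half = max_chars // 2
    head, tail = text[:half], text[-half:]
    return f"{head}\n…[{len(text) - max_chars} chars truncated]…\n{tail}"


# A 100 KB HTML dump shrinks to a 4 KB excerpt with an explicit marker.
print(len(truncate_tool_output("x" * 100_000)))
```

Keeping both the head and the tail preserves openings (titles, headers) and endings (totals, errors), which is usually where the signal lives.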

Closing

OpenClaw is powerful because it carries agency: tools, memory, channels, and automation. That power shows up on the meter. The fix isn't to use it less — it's to measure first (/context, /usage), then make surgical trims to the buckets above.

Or skip token management entirely — CloudyBot optimizes this for you. We built a hosted layer with predictable hard caps so teams can focus on outcomes, not tokenizer archaeology. Start free when you're ready.

