Elizabeth Fuentes L for AWS

Posted on • Originally published at builder.aws.com

Built-in Token Counting: Telemetry for Production AI Agents

Strands Agents provides native telemetry and cost tracking out of the box. Stop writing custom token counters.

Building AI agents is easy. Deploying them to production is where most teams hit a wall.

One of the first questions from finance: "How much will this cost per request?"

Most agent frameworks make you build your own token counter. Strands Agents gives you one.

The Problem with Custom Token Counting

Every AI application needs cost monitoring. But tracking tokens across:

  • Multiple model calls
  • Tool invocations
  • Prompt caching
  • Multi-agent workflows

...requires custom infrastructure most teams rebuild from scratch.

Native Telemetry in Strands Agents

Strands Agents includes production-grade telemetry by default:

from strands import Agent
from strands_tools import calculator

# Create an agent with tools
agent = Agent(tools=[calculator])

# Invoke the agent with a prompt and get an AgentResult
result = agent("What is the square root of 144?")

# Access metrics through the AgentResult
print(f"Total tokens: {result.metrics.accumulated_usage['totalTokens']}")
print(f"Execution time: {sum(result.metrics.cycle_durations):.2f} seconds")
print(f"Tools used: {list(result.metrics.tool_metrics.keys())}")

# Cache metrics (when available)
if 'cacheReadInputTokens' in result.metrics.accumulated_usage:
    print(f"Cache read tokens: {result.metrics.accumulated_usage['cacheReadInputTokens']}")
if 'cacheWriteInputTokens' in result.metrics.accumulated_usage:
    print(f"Cache write tokens: {result.metrics.accumulated_usage['cacheWriteInputTokens']}")

No configuration. No custom code. It just works.

What You Get

Every AgentResult includes:

| Metric | Description |
| --- | --- |
| inputTokens | Tokens sent to the model |
| outputTokens | Tokens generated by the model |
| totalTokens | Total tokens (input + output) |
| cacheReadInputTokens | Tokens read from cache (Bedrock prompt caching) |
| cacheWriteInputTokens | Tokens written to cache |
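Turning these counts into dollars is one multiplication per field. A minimal sketch, assuming placeholder per-1K-token prices (the numbers below are illustrative, not real provider rates; check your model's pricing page):

```python
# Hypothetical per-1K-token prices -- placeholders, not actual Bedrock rates
PRICES = {"input": 0.003, "output": 0.015, "cache_read": 0.0003}

def estimate_cost(usage: dict) -> float:
    """Estimate request cost in USD from a dict shaped like accumulated_usage."""
    cost = usage.get("inputTokens", 0) / 1000 * PRICES["input"]
    cost += usage.get("outputTokens", 0) / 1000 * PRICES["output"]
    cost += usage.get("cacheReadInputTokens", 0) / 1000 * PRICES["cache_read"]
    return cost

usage = {"inputTokens": 1200, "outputTokens": 300, "totalTokens": 1500}
print(f"Estimated cost: ${estimate_cost(usage):.4f}")  # -> $0.0081
```

In a real run you would pass `result.metrics.accumulated_usage` instead of the literal dict.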

Multi-Agent Token Tracking

For multi-agent systems (executor → validator → critic), aggregate metrics across all agents:

from strands.multiagent import Swarm

swarm = Swarm([executor, validator, critic])
result = swarm("Query")

total_tokens = 0
for node_result in result.results.values():
    usage = node_result.result.metrics.accumulated_usage
    total_tokens += usage['totalTokens']

print(f"Total cost across all agents: {total_tokens} tokens")
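The loop above sums only `totalTokens`; the same pattern generalizes to every usage field. A small helper, shown here against plain dicts of the same shape as `accumulated_usage`:

```python
from collections import Counter

def aggregate_usage(usages):
    """Sum token-usage dicts (accumulated_usage shape) field by field."""
    total = Counter()
    for usage in usages:
        total.update(usage)
    return dict(total)

# Example dicts shaped like what each agent in a swarm reports
executor_usage = {"inputTokens": 800, "outputTokens": 200, "totalTokens": 1000}
validator_usage = {"inputTokens": 400, "outputTokens": 100, "totalTokens": 500}
print(aggregate_usage([executor_usage, validator_usage]))

# In a real Swarm run, build the list with:
# [n.result.metrics.accumulated_usage for n in result.results.values()]
```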

Per-Cycle Tracking

For agents that run multiple reasoning cycles, track tokens per cycle:

from strands import Agent
from strands_tools import calculator

agent = Agent(tools=[calculator])

# First invocation
result1 = agent("What is 5 + 3?")

# Second invocation
result2 = agent("What is the square root of 144?")

# Access metrics for the latest invocation
latest_invocation = result2.metrics.latest_agent_invocation
cycles = latest_invocation.cycles
usage = latest_invocation.usage

# Or access all invocations
for invocation in result2.metrics.agent_invocations:
    print(f"Invocation usage: {invocation.usage}")
    for cycle in invocation.cycles:
        print(f"  Cycle {cycle.event_loop_cycle_id}: {cycle.usage}")

# Or print the summary (includes all invocations)
print(result2.metrics.get_summary())

For a complete list of attributes and their types, see the EventLoopMetrics API reference.
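Per-cycle data makes it easy to spot which reasoning step dominates a request's cost. A sketch with mock `(cycle_id, usage)` pairs; in real code you would build the list as `[(c.event_loop_cycle_id, c.usage) for c in invocation.cycles]`:

```python
# Mock per-cycle data shaped like (event_loop_cycle_id, usage) pairs
cycles = [
    ("cycle-1", {"totalTokens": 450}),
    ("cycle-2", {"totalTokens": 1800}),  # e.g. a large tool result fed back to the model
    ("cycle-3", {"totalTokens": 600}),
]

# Find the cycle that consumed the most tokens
worst_id, worst_usage = max(cycles, key=lambda c: c[1]["totalTokens"])
print(f"Most expensive cycle: {worst_id} ({worst_usage['totalTokens']} tokens)")
# -> Most expensive cycle: cycle-2 (1800 tokens)
```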

Why This Matters

Cost visibility is the difference between a prototype and production AI.

With Strands telemetry:

  • ✅ Budget AI workloads before deployment
  • ✅ Identify expensive queries in production
  • ✅ Optimize prompts with real token data
  • ✅ Track prompt caching savings

All without writing a single line of telemetry code.
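One way to act on that visibility is a per-request budget check. A minimal sketch, assuming a hypothetical 10,000-token threshold (pick a number that fits your workload):

```python
def check_budget(usage: dict, max_tokens: int = 10_000) -> bool:
    """Return True if a request stayed within the token budget (threshold is an example)."""
    total = usage.get("totalTokens", 0)
    if total > max_tokens:
        print(f"ALERT: request used {total} tokens (budget {max_tokens})")
        return False
    return True

check_budget({"totalTokens": 1500})    # within budget, returns True
check_budget({"totalTokens": 25_000})  # prints an alert, returns False
```

In production you would call this with `result.metrics.accumulated_usage` and route the alert to your logging or monitoring stack instead of `print`.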

Works with All Model Providers

Token tracking works regardless of your model provider:

  • Amazon Bedrock (Claude, Llama, Mistral)
  • OpenAI (GPT-4, GPT-3.5)
  • Anthropic API
  • Ollama (local models)

Same API, same metrics, zero config changes.

Try It

pip install strands-agents

Full documentation: strandsagents.com/docs/user-guide/concepts/agents/


Thanks!
