The Visibility Problem
Running an AI agent in production means dealing with a problem most developers hit quickly.
The agent makes 15–20 LLM calls per session: chained, conditional, sometimes parallel. Then something goes wrong. The output is bad, the cost spikes, or the agent loops. And there's no answer to any of these questions:
- Which specific call failed?
- What did the model actually receive?
- What did it return?
- How much did this session cost?
- Where in the run did it break?
Why Existing Tools Don't Solve It
LangSmith is built around LangChain. If your agent is custom code, you're left instrumenting it by hand.
Helicone proxies individual LLM API calls. Useful for per-request cost tracking, but it has no concept of agent structure — no parent/child spans, no session grouping, no multi-step trace.
Langfuse is the closest alternative, but it requires explicit code instrumentation before you get useful traces.
Datadog is built for enterprise infrastructure teams, not a developer running their first production agent.
The AgentLens Approach
AgentLens is an open-source observability platform built specifically for AI agent runs.
Option 1: Zero code changes (proxy)
# Before
OPENAI_BASE_URL=https://api.openai.com
# After — one change, full observability
OPENAI_BASE_URL=http://localhost:8090/v1/p/{projectId}/openai
Every LLM call flows through AgentLens. It forwards to OpenAI transparently and captures the full trace — tokens, cost, latency, model, full prompt and completion. Works with any language and any framework.
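To make the proxy path concrete, here is a minimal sketch using the official OpenAI Node SDK. The project ID and model name are placeholders, not values from the AgentLens docs:

import OpenAI from 'openai'

// Same client, same calls; only the base URL points at the AgentLens proxy.
const client = new OpenAI({
  baseURL: 'http://localhost:8090/v1/p/my-project/openai', // 'my-project' is a placeholder
})

// This request is forwarded to OpenAI and captured as a trace on the way through.
const res = await client.chat.completions.create({
  model: 'gpt-4o-mini', // placeholder model
  messages: [{ role: 'user', content: 'ping' }],
})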
Option 2: TypeScript SDK
import '@farzanhossans/agentlens-openai'
// auto-patches the OpenAI SDK — every call is traced
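In practice that presumably means loading the patcher before the client is constructed. A minimal sketch; the import-order assumption is mine, not from the AgentLens docs:

// Side-effect import first, so the OpenAI SDK is patched before any client exists.
import '@farzanhossans/agentlens-openai'
import OpenAI from 'openai'

const client = new OpenAI()

// Traced automatically; no wrappers or callbacks needed.
const res = await client.chat.completions.create({
  model: 'gpt-4o-mini', // placeholder model
  messages: [{ role: 'user', content: 'hello' }],
})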
Option 3: Python SDK
import agentlens.patchers.openai
# same — all calls auto-traced
Self-Host in 3 Minutes
git clone https://github.com/farzanhossan/agentlens
cd agentlens/infra
cp .env.prod.example .env
docker compose -f docker-compose.prod.yml up -d
Dashboard at localhost:4021. API at localhost:4020.
The Stack
- NestJS + BullMQ — async span processor
- Cloudflare Workers — edge ingest endpoint
- Elasticsearch — trace storage, full-text search, error clustering
- PostgreSQL — metadata, users, projects, alerts
- React dashboard — real-time updates via WebSocket
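For a rough picture of what lands in Elasticsearch per LLM call, a span record plausibly looks something like the interface below. Every field name here is an illustrative guess, not AgentLens's actual schema:

// Illustrative shape only; field names are assumptions, not the real schema.
interface TraceSpan {
  spanId: string
  parentSpanId?: string   // links a call to the step that triggered it
  sessionId: string       // groups every call in one agent run
  model: string
  promptTokens: number
  completionTokens: number
  costUsd: number
  latencyMs: number
  prompt: string          // full prompt, as captured by the proxy
  completion: string      // full model output
}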
What's Next
Phase 2 is the AI intelligence layer: using the Claude API to automatically analyze traces, explain why agent conversations fail, and surface prompt improvement suggestions. The shift is from "see what happened" to "understand why."
Try It
Landing: https://agentlens.techmatbd.com
GitHub: https://github.com/farzanhossan/agentlens
MIT licensed.
Top comments
This is a useful framing. I like the point that per-request LLM logging is not enough once the unit of failure becomes the whole agent run.
One thing I would add to the model is a compact run record next to the trace: run id, parent/child step ids, tool registry snapshot, approval state if any, artifacts produced, and final outcome. Traces answer “what was slow or failed?” The run record answers “what did this agent actually do, and what changed from the last good run?”
That distinction gets especially important when the agent starts calling tools or MCP servers, because the tool boundary is where cost, state, and side effects show up.
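Sketching the run record the comment describes, with fields taken from its list (all names illustrative, not a real schema):

// Field names follow the comment's list; all are illustrative.
interface RunRecord {
  runId: string
  steps: { id: string; parentId?: string }[] // parent/child step ids
  toolRegistry: string[]                     // snapshot of tools available to the run
  approvalState?: 'pending' | 'approved' | 'rejected'
  artifacts: string[]                        // ids of artifacts the run produced
  outcome: 'success' | 'failure'             // final outcome of the run
}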