The Visibility Problem
Running an AI agent in production means dealing with a problem most developers hit quickly.
The agent makes 15–20 LLM calls per session: chained, conditional, sometimes parallel. Then something goes wrong. The output is bad, the cost spikes, or the agent loops. And there's no answer to any of these questions:
- Which specific call failed?
- What did the model actually receive?
- What did it return?
- How much did this session cost?
- Where in the run did it break?
Why Existing Tools Don't Solve It
LangSmith is built around LangChain. If your agent is custom code, you're left instrumenting it by hand.
Helicone proxies individual LLM API calls. Useful for per-request cost tracking, but it has no concept of agent structure — no parent/child spans, no session grouping, no multi-step trace.
Langfuse is the closest alternative, but it requires explicit code instrumentation before you get useful traces.
Datadog is built for enterprise infrastructure teams, not a developer running their first production agent.
The AgentLens Approach
AgentLens is an open-source observability platform built specifically for AI agent runs.
Option 1: Zero code changes (proxy)
# Before
OPENAI_BASE_URL=https://api.openai.com
# After — one change, full observability
OPENAI_BASE_URL=http://localhost:8090/v1/p/{projectId}/openai
Every LLM call flows through AgentLens. It forwards to OpenAI transparently and captures the full trace — tokens, cost, latency, model, full prompt and completion. Works with any language and any framework.
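To make the proxy path concrete, here is a minimal sketch using the official OpenAI Node SDK. The project ID and model name are placeholders, not values from the AgentLens docs:

import OpenAI from 'openai'

// Same client, same calls; only the base URL points at the AgentLens proxy.
const client = new OpenAI({
  baseURL: 'http://localhost:8090/v1/p/my-project/openai', // 'my-project' is a placeholder
})

// This request is forwarded to OpenAI and captured as a trace on the way through.
const res = await client.chat.completions.create({
  model: 'gpt-4o-mini', // placeholder model
  messages: [{ role: 'user', content: 'ping' }],
})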
Option 2: TypeScript SDK
import '@farzanhossans/agentlens-openai'
// auto-patches the OpenAI SDK — every call is traced
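In practice that presumably means loading the patcher before the client is constructed. A minimal sketch; the import-order assumption is mine, not from the AgentLens docs:

// Side-effect import first, so the OpenAI SDK is patched before any client exists.
import '@farzanhossans/agentlens-openai'
import OpenAI from 'openai'

const client = new OpenAI()

// Traced automatically; no wrappers or callbacks needed.
const res = await client.chat.completions.create({
  model: 'gpt-4o-mini', // placeholder model
  messages: [{ role: 'user', content: 'hello' }],
})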
Option 3: Python SDK
import agentlens.patchers.openai
# same — all calls auto-traced
Self-Host in 3 Minutes
git clone https://github.com/farzanhossan/agentlens
cd agentlens/infra
cp .env.prod.example .env
docker compose -f docker-compose.prod.yml up -d
Dashboard at localhost:4021. API at localhost:4020.
The Stack
- NestJS + BullMQ — async span processor
- Cloudflare Workers — edge ingest endpoint
- Elasticsearch — trace storage, full-text search, error clustering
- PostgreSQL — metadata, users, projects, alerts
- React dashboard — real-time updates via WebSocket
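For a rough picture of what lands in Elasticsearch per LLM call, a span record plausibly looks something like the interface below. Every field name here is an illustrative guess, not AgentLens's actual schema:

// Illustrative shape only; field names are assumptions, not the real schema.
interface TraceSpan {
  spanId: string
  parentSpanId?: string   // links a call to the step that triggered it
  sessionId: string       // groups every call in one agent run
  model: string
  promptTokens: number
  completionTokens: number
  costUsd: number
  latencyMs: number
  prompt: string          // full prompt, as captured by the proxy
  completion: string      // full model output
}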
What's Next
Phase 2 is the AI intelligence layer: using the Claude API to automatically analyze traces, explain why agent conversations fail, and surface prompt improvement suggestions. The shift is from "see what happened" to "understand why."
Try It
Landing: https://agentlens.techmatbd.com
GitHub: https://github.com/farzanhossan/agentlens
MIT licensed.
Top comments
This is a useful framing. I like the point that per-request LLM logging is not enough once the unit of failure becomes the whole agent run.
One thing I would add to the model is a compact run record next to the trace: run id, parent/child step ids, tool registry snapshot, approval state if any, artifacts produced, and final outcome. Traces answer “what was slow or failed?” The run record answers “what did this agent actually do, and what changed from the last good run?”
That distinction gets especially important when the agent starts calling tools or MCP servers, because the tool boundary is where cost, state, and side effects show up.
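Sketching the run record the comment describes, with fields taken from its list (all names illustrative, not a real schema):

// Field names follow the comment's list; all are illustrative.
interface RunRecord {
  runId: string
  steps: { id: string; parentId?: string }[] // parent/child step ids
  toolRegistry: string[]                     // snapshot of tools available to the run
  approvalState?: 'pending' | 'approved' | 'rejected'
  artifacts: string[]                        // ids of artifacts the run produced
  outcome: 'success' | 'failure'             // final outcome of the run
}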