DEV Community

# agents

Posts

A Postmortem on Autonomous LLM-as-Judge: How My Eval Agent Got Two Verdicts Wrong Before I Found a Sandbox Bug

Comments
4 min read
Why I'm Building SaaS in 2026

SaaS as reliable plumbing for fragile agents

46
Comments 28
4 min read
AI Agents Need Permission Boundaries, Not Personalities

Comments
6 min read
LLM-as-Judge: using Claude to review a Gemini agent

Comments
7 min read
A Local-First Multi-Agent Dashboard for Codex CLI and Claude Code

Comments
3 min read
Your AI Doesn't Have a Brain. It Has a Filing Cabinet.

Comments
6 min read
The MCP Evaluation Framework Nobody Talks About (But Should)

Comments
6 min read
My AI agent guesses Design tokens repeatedly. MCP doesn't fix it either!

7
Comments
1 min read
Building a Voice-Controlled Local AI Agent on a 4GB GPU

Comments
3 min read
The 3 MCP Servers Every AI Agent Needs in Production

1
Comments
7 min read
Authenticated, Authorized, and Still Unsafe: The Missing Layer in Agent Security

Comments
5 min read
Building autonomous AI agents is fun. Securing their access in production is a nightmare.

Comments
3 min read
Amazon Bedrock AgentCore Harness runs your agent. ShapeV2 controls what it's allowed to do

Missing reasoning audit layer

34
Comments 5
5 min read
Everyone Building AI Research Tools Is Solving the Wrong Problem

4
Comments
7 min read
Building an autonomous travel agent: the journey begins

Comments
2 min read