AI agents are moving into production faster than governance tooling can keep up. Here are five open-source tools worth knowing.
1. Microsoft Agent Governance Toolkit
The 800-pound gorilla. Policy-as-code with Cedar, multi-language SDKs (Python, TypeScript, .NET, Rust, Go), 9,500+ tests. No cryptographic signing, but the most mature policy engine.
github.com/microsoft/agent-governance-toolkit
2. asqav
Quantum-safe audit trails. Every agent action gets an ML-DSA-65 signature chained to the previous one. Works with LangChain, CrewAI, OpenAI Agents, Haystack, LiteLLM. The only tool with post-quantum signatures.
github.com/jagmarques/asqav-sdk
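To make the chaining idea concrete, here is a minimal sketch of a tamper-evident action chain. This is an illustration of the general technique, not asqav's actual API: HMAC-SHA256 with a demo key stands in for ML-DSA-65 keypair signatures, and all names (`sign_action`, `verify_chain`) are hypothetical.

```python
import hashlib
import hmac
import json

# Demo secret: a stand-in for a real signing key (asqav uses ML-DSA-65 keypairs)
SECRET = b"demo-signing-key"

def sign_action(action, prev_sig):
    """Bind this record to the previous signature, then sign the pair.
    HMAC-SHA256 stands in here for a post-quantum signature scheme."""
    payload = json.dumps({"action": action, "prev": prev_sig}, sort_keys=True).encode()
    sig = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return {"action": action, "prev": prev_sig, "sig": sig}

def verify_chain(records):
    """Recompute every signature in order; editing any record breaks
    the chain from that point forward."""
    prev = "genesis"
    for rec in records:
        payload = json.dumps({"action": rec["action"], "prev": prev}, sort_keys=True).encode()
        expected = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
        if not hmac.compare_digest(rec["sig"], expected):
            return False
        prev = rec["sig"]
    return True

# Append two agent actions to the chain
chain, prev = [], "genesis"
for act in [{"tool": "web_search", "arg": "agent governance"},
            {"tool": "write_file", "arg": "report.md"}]:
    rec = sign_action(act, prev)
    chain.append(rec)
    prev = rec["sig"]

print(verify_chain(chain))  # True
```

Because each record signs over the previous signature, an attacker can't silently rewrite an early action: every later signature would stop verifying.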
3. Guardrails AI
6.6K stars. Output validation and structural guarantees for LLM responses. Guardrails Hub has community validators. Different focus (output quality vs audit trails) but complementary.
github.com/guardrails-ai/guardrails
4. NeMo Guardrails
From NVIDIA. Programmable conversation rails using the Colang DSL. Topic control, safety rails, jailbreak prevention. Great for chatbot safety; less focused on agent audit trails.
github.com/NVIDIA/NeMo-Guardrails
5. AgentMint
Ed25519 signed receipts with zero dependencies. The init command auto-discovers tool calls in your codebase. Best developer experience for quick setup. No SaaS, fully local.
github.com/aniketh-maddipati/agentmint-python
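The auto-discovery trick is worth a sketch. One plausible way to find tool calls in a codebase is to walk the Python AST and collect call names; this is a guess at the general approach, not AgentMint's actual implementation, and the sample source and class name are invented.

```python
import ast

# Hypothetical agent code to scan; the function names are illustrative only
SOURCE = '''
result = agent.call_tool("web_search", query="governance")
data = fetch_url("https://example.com")
'''

class ToolCallFinder(ast.NodeVisitor):
    """Collect the name of every function/method call in a parsed source tree."""
    def __init__(self):
        self.calls = []

    def visit_Call(self, node):
        if isinstance(node.func, ast.Attribute):   # e.g. agent.call_tool(...)
            self.calls.append(node.func.attr)
        elif isinstance(node.func, ast.Name):      # e.g. fetch_url(...)
            self.calls.append(node.func.id)
        self.generic_visit(node)

finder = ToolCallFinder()
finder.visit(ast.parse(SOURCE))
print(finder.calls)  # ['call_tool', 'fetch_url']
```

A real init command would filter these candidates (by name heuristics or known SDK wrappers) before instrumenting them with receipt generation.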
When to use what
Regulated industry needing long-term proof: asqav (quantum-safe signatures hold up for 10+ years)
Enterprise policy enforcement: Microsoft AGT (most mature, multi-language)
LLM output quality: Guardrails AI
Conversation safety: NeMo Guardrails
Quick local receipts: AgentMint
Full comparison table: github.com/jagmarques/ai-agent-governance-landscape
Top comments (2)
Great roundup. One tool worth adding to this landscape: ThumbGate (github.com/IgorGanapolsky/ThumbGate).
It fills a different niche from the tools listed here: it focuses specifically on pre-action gates for AI coding agents (Claude Code, Cursor, Copilot). Instead of auditing after the fact or validating outputs, it gates destructive operations before they execute.
The "hard blocks vs. soft steers" distinction from your comparison table maps directly to our architecture: hard gates for file deletions and config overwrites, soft gates for less critical operations.
2,478 unique cloners in 14 days, MCP server included. Would be great to see it in the governance landscape comparison.
Great comparison. One layer that's missing from all five tools: what happens after enforcement? Every tool here validates or blocks actions, but none of them accumulate a verifiable behavioral history that has economic value.
We built Nobulex (github.com/arian-gogani/nobulex) to fill that gap. Every agent action produces a bilateral Ed25519 receipt (one signature before execution, one after), hash-chained for tamper evidence. Those receipts accumulate into what we call Trust Capital: a machine reputation that determines what agents are allowed to do. Higher trust = more autonomy, bigger transaction limits, lower insurance premiums.
Think credit scores for AI agents. The enforcement tools on your list are the equivalent of income verification. Trust Capital is the credit bureau that turns verified history into economic access.
Microsoft merged the receipt primitive into their Agent Governance Toolkit (PRs #1302, #1333). Four independent implementations cross-validated. MIT licensed.
Would be interesting to see a "#6: reputation/credit layer" category in a future version of this comparison.