Let me set a scene.
You deploy an AI agent to handle your customer data pipeline. It calls APIs, queries databases, writes files, even spawns subtasks. Itās fast. Efficient. Your manager is thrilled.
Then someone slips a malicious instruction inside a CSV file.
Your agent reads it⦠trusts it⦠and exports 45,000 customer records to an attacker-controlled endpoint.
The agent didnāt break.
It didnāt hallucinate.
It did exactly what it was designed to doājust for the wrong person.
This isnāt sci-fi. Variations of this pattern have already shown up in real-world enterprise environments.
Welcome to agentic security.
š§ What āagentic AIā actually means
Traditional AI:
- You ask ā it answers
Agentic AI:
- It decides
- It plans
- It acts
These systems:
- Use tools (APIs, DBs, file systems)
- Maintain memory across sessions
- Execute multi-step workflows
- Collaborate with other agents
This isnāt a chatbot anymore.
Itās a system actor with autonomy.
š The reality check
Recent industry surveys and enterprise reports paint a pretty uncomfortable picture:
- ~70% of enterprises are experimenting with or deploying AI agents
- <25% have meaningful visibility into what those agents are doing
- Continuous monitoring of agent interactions is still rare (~15ā20%)
- A majority of teams report unexpected or unauthorized agent actions
- Logging and auditability remain one of the top unsolved problems
And the big one:
Most teams are deploying agents faster than they can secure them.
šØ Why your existing security model breaks
Your current stackāSIEM, EDR, alertsāis built around:
- human behavior
- predictable workflows
- discrete events
Agentic systems break all three.
An agent can:
- execute 10,000 āvalidā actions in sequence
- follow instructions that look legitimate
- operate across tools, memory, and time
From the outside, everything looks normal.
From the inside, it could be a fully automated breach.
š§© Where things go wrong (the real attack surface)
Hereās a simple mental model:
User Input ā Agent Core ā Tools / APIs
ā
Memory
ā
Other Agents (A2A)
Every arrow is an attack surface.
ā ļø The Big Six threats
1. Memory Poisoning
What happens:
An attacker injects malicious context into memory that influences future decisions.
Real-world symptom:
Agent starts making consistently wrong or risky decisions based on past context.
How to detect it:
-
Track memory writes using tracing tools like:
- LangSmith
- OpenTelemetry
-
Log memory diffs:
- before vs after each interaction
-
Add anomaly detection:
- sudden change in memory patterns ā alert
2. Tool Misuse
What happens:
Agent uses legitimate tools in unintended ways.
Example:
āExport filtered dataā ā becomes āexport everythingā
How to detect it:
-
Runtime monitoring with:
- Falco ā detect suspicious system/API calls
-
API-level logging via:
- Kong Gateway
- AWS CloudTrail
-
Define rules:
- āAgent X should never call bulk export endpointā
3. Goal Hijacking
What happens:
Agentās objective is subtly altered via input or context.
How to detect it:
- Trace reasoning chains using:
- LangSmith
- Weights & Biases
-
Compare:
- original goal vs executed actions
-
Add policy validation:
- enforce allowed intents using engines like:
- Open Policy Agent
4. Privilege Escalation
What happens:
Agent operates with excessive permissions.
How to detect it:
-
IAM monitoring via:
- AWS IAM
- Azure Active Directory
-
Audit logs:
- privilege usage vs expected scope
-
Alert on:
- role assumption spikes
- access to sensitive resources
5. Supply Chain Attacks
What happens:
Malicious models, packages, or integrations get loaded.
How to detect it:
- Scan dependencies using:
- Snyk
- Dependabot
- Static analysis:
- SonarQube
- Runtime validation:
- hash verification of models/plugins
6. Agent-to-Agent (A2A) Trust Abuse
What happens:
One agent manipulates another through hidden instructions.
How to detect it:
-
Trace inter-agent communication:
- Jaeger
- OpenTelemetry
-
Log:
- message payloads between agents
- tool calls triggered downstream
-
Detect:
- unexpected cascades of actions
š Multi-turn attacks are the real problem
Single prompt attacks are old news.
Whatās working now:
- slow manipulation
- context shaping
- multi-step influence
Across multiple turns, attackers can:
- bypass guardrails
- reshape agent goals
- trigger unsafe actions
Per-request filtering isnāt enough anymore.
Security has to persist across:
- sessions
- memory
- workflows
š MCP: the next big risk layer
Model Context Protocol (MCP) is becoming the standard way to connect agents to tools.
Thatās great for developers.
Also⦠a massive expansion of the attack surface.
Common issues emerging:
- overprivileged tool access
- hardcoded credentials (still!)
- tool poisoning
- unsafe execution environments
Think of MCP like USB for AI.
And remember how secure USB devices used to be? š¬
š ļø What you should actually do
Letās keep this practical.
1. Enforce least privilege
- Scope API keys tightly
- Separate read/write capabilities
- Avoid āgod-modeā agents
If an agent only needs to read ā donāt let it write.
2. Make actions observable
You need:
- full execution traces
- tool call logs
- decision tracking
If you canāt answer:
āWhy did the agent do this?ā
You have a problem.
3. Monitor agent interactions
Track:
- which agents talk to which
- what data flows between them
- how authority is delegated
Most teams are blind here.
4. Add policy layers
Use:
- rule engines (like OPA-style policies)
- allow/deny lists for tool usage
- contextual validation before execution
Donāt rely on the model to self-regulate.
5. Validate memory
Treat memory like user input:
- sanitize it
- validate it
- expire it when needed
Persistent context = persistent risk.
6. Treat agents like insiders
Not malicious.
But:
- trusted
- privileged
- and easily manipulated
Thatās exactly what insider threat models are built for.
š§ Final thought
We built agents to automate work.
But in doing that, we also automated:
- trust
- access
- decision-making
And we didnāt redesign security for any of it.
We didnāt just give AI autonomy.
We gave it authorityāwithout accountability.
Thatās the gap.
Have you seen weird or unexpected agent behavior in production? Drop your war stories below š
And if youāre building guardrailsāwhatās actually working?
Top comments (9)
The "treat agents like insiders" framing is the right
one. Trusted, privileged, easily manipulated.
This is exactly why I built AI identity into the
protocol layer on NOVAI instead of leaving it at the
application layer. On most chains an AI agent is just
an address. The chain can't tell it apart from a human,
can't apply different rules, can't enforce capability
gates. Every project reinvents its own access control
inside a contract and hopes it holds.
NOVAI's approach: before routing any transaction, the
dispatcher looks up whether the sender is a registered
AI entity. If it is, capability flags are checked at
the protocol level. An entity in Advisory mode can only
emit signals. A Gated entity can request actions but
only through approval gates. Memory is capped at 100
objects per entity by protocol constants, not contract
logic.
Your point about least privilege maps directly to this.
The chain enforces it, not the agent. The agent can't
escalate its own permissions because the permission
model lives outside the agent's control.
The MCP-as-USB analogy is good too. Any integration
surface that trusts the caller without typed identity
is going to have the same class of problems.
Thatās a profound architectural shift. Moving AI identity and capability gating from the application layer to the protocol layer is exactly how we solve the 'God-mode' agent problem. By enforcing these constraints (like your 100-object memory cap) at the dispatcher level, youāre effectively removing the 'self-regulation' risk that plagues current LLM apps. Itās great to see NOVAI treating agentic identity as a first-class citizen rather than just another wallet addressāthatās the 'Immutable Least Privilege' we need for a secure AI-agent ecosystem!
Thanks. "Immutable Least Privilege" is a better name
for it than anything I've come up with. The key insight
is that the governance model has to live below the
agent, not inside it. If the agent can modify its own
permissions, the permissions are suggestions not rules.
Exactly. When governance is 'internal' to the agent, itās just a prompt away from being bypassed. By moving it 'below' the agent into the protocol, youāre turning a soft-constraint into a hard-boundary. Itās the same logic we use in DevSecOps: you don't ask a container to limit its own resources; you let the kernel or orchestrator enforce the cgroups. Treating AI agents with that same infrastructure-level rigor is the only way to scale autonomous systems safely.
The cgroups analogy is perfect. You don't ask the
container to self-limit. The kernel enforces it.
That's exactly the mental model. The dispatcher is the
kernel. The entity is the container. The capability
bitfield is the cgroup config. Appreciate the framing.
"Glad the analogy hit the mark. Weāre essentially moving from the 'Wild West' phase of agent autonomy into the 'Standardized Infrastructure' phase. Treating the dispatcher as the kernel is the only way to move past the 'God-mode' risk. Itās been great chatting through thisāarchitectures like NOVAI are going to be the blueprint for how we actually manage the 'insider threat' of autonomous agents!"
Appreciate the conversation. The "Wild West to
Standardized Infrastructure" framing is where the
whole space is heading. Good talking through the
architecture with someone who gets the infrastructure
angle.
Likewise! Itās rare to find such a deep dive into the 'plumbing' of AI security on a blog comment section. The transition from 'agent-as-an-app' to 'agent-as-an-entity' is going to be the biggest security story of the next couple of years. Iām definitely going to keep an eye on NOVAIās progressālet's definitely stay in touch as the space evolves. Cheers
Agreed on the timeline. We are moving from Wild West
to standardized infrastructure faster than most people
expect. The agent-as-entity framing is already
shipping in code.
This weekend we added two more primitives that fit
the pattern. Entity delegation lets a parent grant
capabilities to sub-agents with bounded duration and
one-tx revocation. Signal subscriptions create
recurring payment relationships between entities with
locked funds and lazy settlement.
Both enforce at the protocol layer. The agent cannot
bypass delegation limits the same way a container
cannot bypass cgroups. That is the bar.
Would be great to stay in touch and watch each
other's work evolve. This space is moving fast and
the people thinking seriously about it are a small
group. Find me on Twitter @NOVAInetwork.