Rahul Joshi

Posted on May 1

🤖 Agentic Security: Your AI Got Autonomy. Did Your Security Catch Up?

#security #ai #webdev #devops

Let me set a scene.

You deploy an AI agent to handle your customer data pipeline. It calls APIs, queries databases, writes files, even spawns subtasks. It’s fast. Efficient. Your manager is thrilled.

Then someone slips a malicious instruction inside a CSV file.

Your agent reads it… trusts it… and exports 45,000 customer records to an attacker-controlled endpoint.

The agent didn’t break.
It didn’t hallucinate.
It did exactly what it was designed to do—just for the wrong person.

This isn’t sci-fi. Variations of this pattern have already shown up in real-world enterprise environments.

Welcome to agentic security.

🧠 What “agentic AI” actually means

Traditional AI:

You ask → it answers

Agentic AI:

It decides
It plans
It acts

These systems:

Use tools (APIs, DBs, file systems)
Maintain memory across sessions
Execute multi-step workflows
Collaborate with other agents

This isn’t a chatbot anymore.

It’s a system actor with autonomy.

📊 The reality check

Recent industry surveys and enterprise reports paint a pretty uncomfortable picture:

~70% of enterprises are experimenting with or deploying AI agents
<25% have meaningful visibility into what those agents are doing
Continuous monitoring of agent interactions is still rare (~15–20%)
A majority of teams report unexpected or unauthorized agent actions
Logging and auditability remain one of the top unsolved problems

And the big one:

Most teams are deploying agents faster than they can secure them.

🚨 Why your existing security model breaks

Your current stack—SIEM, EDR, alerts—is built around:

human behavior
predictable workflows
discrete events

Agentic systems break all three.

An agent can:

execute 10,000 “valid” actions in sequence
follow instructions that look legitimate
operate across tools, memory, and time

From the outside, everything looks normal.

From the inside, it could be a fully automated breach.

🧩 Where things go wrong (the real attack surface)

Here’s a simple mental model:

User Input → Agent Core → Tools / APIs
                   ↕
                Memory
                   ↕
            Other Agents (A2A)

Every arrow is an attack surface.

⚠️ The Big Six threats

1. Memory Poisoning

What happens:
An attacker injects malicious context into memory that influences future decisions.

Real-world symptom:
Agent starts making consistently wrong or risky decisions based on past context.

How to detect it:

Track memory writes using tracing tools like:
- LangSmith
- OpenTelemetry
Log memory diffs:
- before vs after each interaction
Add anomaly detection:
- sudden change in memory patterns → alert

2. Tool Misuse

What happens:
Agent uses legitimate tools in unintended ways.

Example:
“Export filtered data” → becomes “export everything”

How to detect it:

Runtime monitoring with:
- Falco → detect suspicious system/API calls
API-level logging via:
- Kong Gateway
- AWS CloudTrail
Define rules:
- “Agent X should never call bulk export endpoint”

3. Goal Hijacking

What happens:
Agent’s objective is subtly altered via input or context.

How to detect it:

Trace reasoning chains using:
- LangSmith
- Weights & Biases
Compare:
- original goal vs executed actions
Add policy validation:
- enforce allowed intents using engines like:
- Open Policy Agent

4. Privilege Escalation

What happens:
Agent operates with excessive permissions.

How to detect it:

IAM monitoring via:
- AWS IAM
- Azure Active Directory
Audit logs:
- privilege usage vs expected scope
Alert on:
- role assumption spikes
- access to sensitive resources

5. Supply Chain Attacks

What happens:
Malicious models, packages, or integrations get loaded.

How to detect it:

Scan dependencies using:
- Snyk
- Dependabot
Static analysis:
- SonarQube
Runtime validation:
- hash verification of models/plugins

6. Agent-to-Agent (A2A) Trust Abuse

What happens:
One agent manipulates another through hidden instructions.

How to detect it:

Trace inter-agent communication:
- Jaeger
- OpenTelemetry
Log:
- message payloads between agents
- tool calls triggered downstream
Detect:
- unexpected cascades of actions

🔁 Multi-turn attacks are the real problem

Single prompt attacks are old news.

What’s working now:

slow manipulation
context shaping
multi-step influence

Across multiple turns, attackers can:

bypass guardrails
reshape agent goals
trigger unsafe actions

Per-request filtering isn’t enough anymore.

Security has to persist across:

sessions
memory
workflows

🔌 MCP: the next big risk layer

Model Context Protocol (MCP) is becoming the standard way to connect agents to tools.

That’s great for developers.

Also… a massive expansion of the attack surface.

Common issues emerging:

overprivileged tool access
hardcoded credentials (still!)
tool poisoning
unsafe execution environments

Think of MCP like USB for AI.

And remember how secure USB devices used to be? 😬

🛠️ What you should actually do

Let’s keep this practical.

1. Enforce least privilege

Scope API keys tightly
Separate read/write capabilities
Avoid “god-mode” agents

If an agent only needs to read → don’t let it write.

2. Make actions observable

You need:

full execution traces
tool call logs
decision tracking

If you can’t answer:

“Why did the agent do this?”

You have a problem.

3. Monitor agent interactions

Track:

which agents talk to which
what data flows between them
how authority is delegated

Most teams are blind here.

4. Add policy layers

Use:

rule engines (like OPA-style policies)
allow/deny lists for tool usage
contextual validation before execution

Don’t rely on the model to self-regulate.

5. Validate memory

Treat memory like user input:

sanitize it
validate it
expire it when needed

Persistent context = persistent risk.

6. Treat agents like insiders

Not malicious.

But:

trusted
privileged
and easily manipulated

That’s exactly what insider threat models are built for.

🧠 Final thought

We built agents to automate work.

But in doing that, we also automated:

trust
access
decision-making

And we didn’t redesign security for any of it.

We didn’t just give AI autonomy.
We gave it authority—without accountability.

That’s the gap.

Have you seen weird or unexpected agent behavior in production? Drop your war stories below 👇

And if you’re building guardrails—what’s actually working?

Top comments (9)

NOVAInetwork • May 4

The "treat agents like insiders" framing is the right
one. Trusted, privileged, easily manipulated.

This is exactly why I built AI identity into the
protocol layer on NOVAI instead of leaving it at the
application layer. On most chains an AI agent is just
an address. The chain can't tell it apart from a human,
can't apply different rules, can't enforce capability
gates. Every project reinvents its own access control
inside a contract and hopes it holds.

NOVAI's approach: before routing any transaction, the
dispatcher looks up whether the sender is a registered
AI entity. If it is, capability flags are checked at
the protocol level. An entity in Advisory mode can only
emit signals. A Gated entity can request actions but
only through approval gates. Memory is capped at 100
objects per entity by protocol constants, not contract
logic.

Your point about least privilege maps directly to this.
The chain enforces it, not the agent. The agent can't
escalate its own permissions because the permission
model lives outside the agent's control.

The MCP-as-USB analogy is good too. Any integration
surface that trusts the caller without typed identity
is going to have the same class of problems.

Rahul Joshi • May 5

That’s a profound architectural shift. Moving AI identity and capability gating from the application layer to the protocol layer is exactly how we solve the 'God-mode' agent problem. By enforcing these constraints (like your 100-object memory cap) at the dispatcher level, you’re effectively removing the 'self-regulation' risk that plagues current LLM apps. It’s great to see NOVAI treating agentic identity as a first-class citizen rather than just another wallet address—that’s the 'Immutable Least Privilege' we need for a secure AI-agent ecosystem!

NOVAInetwork • May 5

Thanks. "Immutable Least Privilege" is a better name
for it than anything I've come up with. The key insight
is that the governance model has to live below the
agent, not inside it. If the agent can modify its own
permissions, the permissions are suggestions not rules.

Rahul Joshi • May 5

Exactly. When governance is 'internal' to the agent, it’s just a prompt away from being bypassed. By moving it 'below' the agent into the protocol, you’re turning a soft-constraint into a hard-boundary. It’s the same logic we use in DevSecOps: you don't ask a container to limit its own resources; you let the kernel or orchestrator enforce the cgroups. Treating AI agents with that same infrastructure-level rigor is the only way to scale autonomous systems safely.

NOVAInetwork • May 6

The cgroups analogy is perfect. You don't ask the
container to self-limit. The kernel enforces it.
That's exactly the mental model. The dispatcher is the
kernel. The entity is the container. The capability
bitfield is the cgroup config. Appreciate the framing.

Rahul Joshi • May 8

"Glad the analogy hit the mark. We’re essentially moving from the 'Wild West' phase of agent autonomy into the 'Standardized Infrastructure' phase. Treating the dispatcher as the kernel is the only way to move past the 'God-mode' risk. It’s been great chatting through this—architectures like NOVAI are going to be the blueprint for how we actually manage the 'insider threat' of autonomous agents!"

NOVAInetwork • May 8

Appreciate the conversation. The "Wild West to
Standardized Infrastructure" framing is where the
whole space is heading. Good talking through the
architecture with someone who gets the infrastructure
angle.

Rahul Joshi • May 11

Likewise! It’s rare to find such a deep dive into the 'plumbing' of AI security on a blog comment section. The transition from 'agent-as-an-app' to 'agent-as-an-entity' is going to be the biggest security story of the next couple of years. I’m definitely going to keep an eye on NOVAI’s progress—let's definitely stay in touch as the space evolves. Cheers

NOVAInetwork • May 11

Agreed on the timeline. We are moving from Wild West
to standardized infrastructure faster than most people
expect. The agent-as-entity framing is already
shipping in code.

This weekend we added two more primitives that fit
the pattern. Entity delegation lets a parent grant
capabilities to sub-agents with bounded duration and
one-tx revocation. Signal subscriptions create
recurring payment relationships between entities with
locked funds and lazy settlement.

Both enforce at the protocol layer. The agent cannot
bypass delegation limits the same way a container
cannot bypass cgroups. That is the bar.

Would be great to stay in touch and watch each
other's work evolve. This space is moving fast and
the people thinking seriously about it are a small
group. Find me on Twitter @NOVAInetwork.