We're all building AI agents into our products right now. And having an AI agent in your product is how you keep it alive, right? That's how the world is moving.
And while everyone is busy building AI agents (tweaking prompts, wiring up tool calls, obsessing over model choice and parameters), there's one critical area most developers skip.
Tool harness and security.
Not the prompt. Not the model. The harness around your tools — how you design them, constrain them, and control what the agent can actually do with them.
And skipping this will cost you a lot in terms of both security and reliability.
What Even Is a Tool Harness?
When you give an AI agent a tool, you're not just giving it a function. You're giving it a boundary. A set of rules about what it can touch, what it can't, and how it should behave when it acts.
Most of us don't think about it that way. We write the tool, attach it to the agent, and move on. The harness — the constraints, the access controls, the behavioral guardrails — gets left to the prompt.
That's the mistake.
Prompts can be overridden. Prompts can be manipulated. Prompts can be ignored. The harness needs to live at the code level, the execution level, the architecture level.
And here's how to build it properly. There are three layers.
Layer 1: Strip Identity Params — Inject Them Server-Side
The first layer is about access control. And it starts with your tool schema.
Let's say you're building a to-do app with an AI agent. You give it a list_tasks tool. Your schema looks like this:
{
  "name": "list_tasks",
  "parameters": {
    "user_id": "string",
    "filters": {
      "status": "string",
      "due_before": "string"
    }
  }
}
Looks fine, right?
It's not.
Because user_id is in the schema, the agent can pass any user ID it wants. A malicious prompt, a confused model, a prompt injection — any of these could have your agent fetching data it has absolutely no business touching. There's no authentication. There's no authorization.
The fix: strip all identity params from the schema. Things like user_id, account_id, workspace_id, knowledge_base_id — these define the scope of who sees what. The agent doesn't get to decide scope. You do.
{
  "name": "list_tasks",
  "parameters": {
    "filters": {
      "status": "string",
      "due_before": "string"
    }
  }
}
And when the tool executes, inject the identity yourself — from the authenticated session:
async function list_tasks(params: { filters: Filters }, session: Session) {
  const userId = session.userId; // you control this, not the agent
  return db.tasks.findMany({
    where: {
      userId,
      ...params.filters,
    },
  });
}
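To make the wiring concrete, here's a minimal sketch of how the two halves fit together: the schema the model sees, and the executor that runs with the session. The ToolDefinition shape and Session type are hypothetical stand-ins for whatever your agent framework gives you:

// Hypothetical shapes; your agent framework will have its own.
type Filters = { status?: string; due_before?: string };
type Session = { userId: string };

interface ToolDefinition {
  name: string;
  schema: object; // the only part the model ever sees
  execute: (params: any, session: Session) => Promise<unknown>;
}

const listTasksTool: ToolDefinition = {
  name: "list_tasks",
  // No user_id anywhere in here, so the model can't set scope.
  schema: {
    filters: { status: "string", due_before: "string" },
  },
  // The session comes from your authenticated request handler,
  // never from the model's output.
  execute: (params: { filters: Filters }, session: Session) =>
    list_tasks(params, session),
};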
The agent says what it needs. You decide whose data gets touched. That's the harness. 💡
Layer 2: Enforce Behavioral Constraints at the Code Level
The second layer is about how your tools behave — not just what they can access.
If you've used Claude Code, you've probably seen this error:
"A file cannot be written before it has been read."
That's not a prompt instruction. That's a hard constraint baked into the tool itself. The developers of Claude Code took a very human behavior (open the file, read it, understand it, then edit it) and enforced it at the execution level.
That's exactly what we need to do with our own tools.
For example, if you have an update_task tool, don't let the agent call it cold. Enforce a read-first constraint at the code level:
async function update_task(params: UpdateTaskParams, session: Session) {
  // The agent must have read this task recently before it's allowed to write.
  const lastRead = await cache.get(`task_read:${params.task_id}:${session.userId}`);
  if (!lastRead || Date.now() - lastRead > 60_000) {
    throw new Error(
      "Task must be read before it can be updated. Call get_task first."
    );
  }
  return db.tasks.update({
    where: { id: params.task_id },
    data: params.updates,
  });
}
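One detail that's easy to miss: for that check to ever pass, the read side has to record the read. Here's a sketch of the matching get_task, writing the same cache key the update check looks for (the cache is whatever store you already use, assumed to hold timestamps):

async function get_task(params: { task_id: string }, session: Session) {
  // Scope the read to the session's user, same as Layer 1.
  const task = await db.tasks.findFirst({
    where: { id: params.task_id, userId: session.userId },
  });
  // Record the read so update_task's read-first check passes
  // for the next 60 seconds.
  await cache.set(`task_read:${params.task_id}:${session.userId}`, Date.now());
  return task;
}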
You can mention this in the prompt too — but the check has to live in code. Not just in a system prompt the model might miss or ignore. The execution layer is where the harness lives. 🔒
Layer 3: Pre-flight Validation with a Reasoning Agent
This one is more advanced. I haven't shipped it in my own product yet, but I'm confident it will work.
The idea: before any tool call executes, require the agent to pass a reason — a short explanation of why it's calling that tool.
{
  "name": "list_tasks",
  "parameters": {
    "reason": "string",
    "filters": {
      "status": "string",
      "due_before": "string"
    }
  }
}
This forces the agent to think before it acts. It might actually realize the reason isn't valid and decide not to call the tool at all.
And you can take it further — spin up a lightweight validation agent running on a small, fast model that takes the tool name, the reason, and the conversation context, and decides whether the call is actually justified:
async function validateToolCall(toolName: string, reason: string, context: string) {
  const response = await llm.complete({
    model: "fast-small-model",
    prompt: `
Tool requested: ${toolName}
Reason given: ${reason}
Conversation context: ${context}

Is this tool call justified? Reply YES or NO with a brief explanation.
`,
  });
  // trim() so leading whitespace in the model's reply doesn't break the check
  return response.text.trim().startsWith("YES");
}
If the validation agent says no — the tool doesn't run.
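Wiring that in is a single wrapper around your tool executor. A rough sketch, where tools is a hypothetical registry mapping tool names to their execute functions:

async function runToolCall(
  name: string,
  params: { reason: string } & Record<string, unknown>,
  context: string,
  session: Session
) {
  // Pre-flight gate: every tool call goes through the validator first.
  const justified = await validateToolCall(name, params.reason, context);
  if (!justified) {
    // Surface the refusal to the agent so it can rethink,
    // instead of failing silently.
    return { error: `Tool call to ${name} was rejected as unjustified.` };
  }
  return tools[name].execute(params, session);
}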
This catches hallucinated tool calls, prompt injection attempts, and cases where the agent is just calling tools out of habit rather than necessity. 🛡️
Wrapping Up
We are designing so many agents today. And we're doing it fast. But the harness — the security, the constraints, the access controls — is getting left behind.
At the very least, we should be sure we're not giving an agent access to something it shouldn't have. That the tools we build have opinions about how they get used. That there are guardrails that exist at the architecture level, not just in a prompt.
Strip the identity params. Enforce behavioral constraints in code. Add a reasoning checkpoint before execution.
These three layers won't just make your agent more secure. They'll make it more reliable, more predictable, and way easier to debug when something goes wrong.
And trust me — something will go wrong. The question is whether your harness was ready for it.
Hope you liked the read. Follow me on my socials for more tech content. See you in the next blog 👋🏻
Top comments (1)
Layer 1 is the one I'd flag for every team starting out with agents: putting identity params in the tool schema is the agent-era equivalent of trusting client-side validation. The fact that it still shows up in production codebases is wild. The other variant I see is putting tenant_id in the schema "so the model can route correctly". Same problem, same fix: it comes from the session, never from the model.
Layer 2 deserves more attention than it gets. The read-before-write constraint is a great example, and once you start looking you find dozens of these latent state-machine assumptions hiding in your tools. We added a small precondition decorator that takes a list of "required prior tool calls within N seconds" and enforces them centrally; debugging agent traces got dramatically easier because the failure now points at the exact missing step rather than a downstream weirdness.
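Sketched out, the idea is something like this (names simplified and hypothetical, not our exact code):

// Each tool declares the prior calls it depends on; one central wrapper
// enforces them against a per-session call log.
type Precondition = { tool: string; withinMs: number };
type Executor = (params: any, session: Session) => Promise<unknown>;

declare const callLog: {
  // Timestamp of the last call to `tool` in this session, if any.
  lastCall(sessionId: string, tool: string): number | undefined;
};

function requirePrior(...preconditions: Precondition[]) {
  return (execute: Executor): Executor =>
    async (params, session) => {
      for (const pre of preconditions) {
        const last = callLog.lastCall(session.userId, pre.tool);
        if (!last || Date.now() - last > pre.withinMs) {
          // The failure names the exact missing step, which is what
          // makes agent traces debuggable.
          throw new Error(`Precondition failed: call ${pre.tool} first.`);
        }
      }
      return execute(params, session);
    };
}

// Usage: update_task may only run within 60s of a get_task.
const guardedUpdateTask = requirePrior({ tool: "get_task", withinMs: 60_000 })(update_task);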
On Layer 3 — the reason field is interesting but I'd be cautious. In practice the model will happily fabricate a plausible-sounding justification because it's been trained to be helpful, so the validator agent ends up validating prose, not intent. What's worked better for us is structural checks: "does the cited entity actually exist in the prior turns?", "does this tool call repeat a recent failed call with no new information?", "is the agent stuck in a 2-step loop?". Cheap, deterministic, and catches a surprising fraction of the same failure modes the LLM-validator targets, without the extra inference cost.
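To make "structural checks" concrete, here are a couple of them in sketch form (shapes hypothetical):

// Deterministic pre-flight checks: no extra LLM call needed.
type ToolCall = { name: string; params: Record<string, unknown>; failed?: boolean };

function structurallyValid(call: ToolCall, history: ToolCall[]): boolean {
  const sameAs = (c: ToolCall) =>
    c.name === call.name && JSON.stringify(c.params) === JSON.stringify(call.params);

  // Repeating a recent failed call with identical arguments and no new info?
  if (history.slice(-6).some((c) => c.failed && sameAs(c))) return false;

  // Stuck in a two-step loop (A, B, A, B, about to call A again)?
  const names = history.slice(-4).map((c) => c.name);
  if (names.length === 4 && names[0] === names[2] && names[1] === names[3] && call.name === names[0]) {
    return false;
  }

  // The "does the cited entity actually exist?" check would compare IDs in
  // call.params against IDs surfaced by earlier tool results.
  return true;
}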