Grumpy Sage

Posted on May 11 • Originally published at cybrium.ai

Why Every CISO Needs an AIBOM in 2026 — And What Vendors Miss

#security #ai #devsecops #governance

A friend of mine runs security at a mid-market fintech. Last month she got asked a question by her board that should have been trivial: "How many AI models are in production at our company, and where did they come from?"

She had a vendor-provided AIBOM. A real one. Generated by a well-known platform you've heard of. She pulled it up on the projector during the board meeting.

The AIBOM listed 14 models. She knew there were more.

After the meeting she spent two days with her platform team running their own inventory. The real number was 47. Some were embedded in SaaS tools her business teams had bought without telling her. Some were running locally on engineering workstations — llama.cpp instances developers had spun up to avoid the OpenAI rate limits. Two were fine-tuned variants of Llama 3 that a data science team had deployed inside a Kubernetes namespace nobody was scanning. One was a vLLM server somebody stood up on a GPU node six months ago and forgot about.

The vendor AIBOM had captured the API-based stuff. Anthropic. OpenAI. Bedrock. Easy targets. Everything that left a billing trail.

What it missed was the actual AI surface area. The part that sits inside her perimeter, runs on her hardware, processes her data, and has no rate limit or vendor SOC 2 to fall back on. The part that, if compromised, doesn't ring an alarm at a third party.

This is the AIBOM problem in 2026. The artifact exists. The compliance checkbox gets ticked. And the inventory is still wrong.

The thesis

An AIBOM is not an SBOM with a "model" row added. It's a fundamentally different artifact because AI systems have a fundamentally different supply chain — one that includes weights, prompts, embeddings, retrieval indexes, fine-tuning datasets, inference runtimes, and the agent scaffolding that ties them together. If your AIBOM doesn't capture all of those, what you have is a marketing document. And most of what's being shipped right now is exactly that.

What an AIBOM actually has to contain

Let me be specific, because the vendor space has gotten lazy about this.

A real AIBOM tracks the model itself — name, version, weights hash, license, provenance. That's the easy part. The part everyone gets right.

Then it has to track the inference runtime. This is where the wheels start coming off. Are you running Ollama? vLLM? TGI? LocalAI? Triton? LM Studio? llama.cpp? Each of those has its own CVEs, its own auth model, its own default configurations, and its own attack surface. A Llama 3 8B running on vLLM behind proper auth is a different risk than the same weights running on a default Ollama install with the API exposed on 0.0.0.0. The AIBOM has to know the difference.

Then the data lineage. What did the model get trained on? What does it get fine-tuned on? What sits in the retrieval index it's pulling from at inference time? An AIBOM that doesn't capture the RAG corpus is missing maybe 40% of the actual attack surface, because that's where prompt injection lives now. The model is fine. The PDFs your sales team uploaded last Tuesday are the threat.

Then the prompt layer. System prompts, tool definitions, agent loops, MCP server bindings. If your model has access to ten tools through an MCP server, those ten tools are part of the bill of materials. If one of them is a "send_email" tool with no human approval gate, that's a fact your AIBOM should be screaming about. Not buried in an appendix.

Finally, the runtime context. What network does this thing live on? What service account does it run under? What does it have IAM access to? You cannot reason about AI risk without that context, because the same model is a different risk profile depending on whether it can read your S3 buckets.

If you accept that list, you've already disqualified maybe 80% of the AIBOM tooling on the market. Most of it stops at "model name + version + license."

Where vendors go wrong, specifically

I want to name patterns, not vendors, because the patterns will outlive the vendors.

Pattern one: the SBOM-with-extra-columns approach. Some vendor took their existing software composition analysis tool and added a "model" detection rule. They find references to openai in your package.json and call that an AIBOM entry. This catches nothing self-hosted, nothing embedded in vendor SaaS, and nothing running outside the codebase you happen to be scanning. It's a checkbox.

Pattern two: the API-trail approach. Vendor watches your egress traffic or your cloud billing and infers AI usage. Better than nothing — catches shadow Anthropic accounts. But useless for anything inside the perimeter. A vLLM server on your internal GPU cluster generates zero egress traffic. It also generates zero AIBOM entries in this model.

Pattern three: the survey approach. Vendor sends a questionnaire to your dev teams. "List all AI systems in production." This is governance theater. The teams that fill it out conscientiously are not the teams you're worried about.

Pattern four: the model-registry approach. Vendor integrates with MLflow or SageMaker Model Registry and treats that as ground truth. Great if your entire organization uses one model registry. Nobody's entire organization uses one model registry. The shadow Ollama instance isn't in MLflow.

What all four of these share is that they're trying to generate an AIBOM from one perspective — the codebase, the network, the people, or the registry. AI systems live across all of those. You need detection that lives across all of those too.

The detection problem is a code problem first

Here's an opinionated take. The single highest-leverage place to build AI inventory is the codebase itself. Not because that's where everything lives, but because that's where most of the self-hosted, embedded, and shadow stuff originates. Somebody, somewhere, wrote an import statement.

This is what cyscan does in our platform. We've got 1,815 detection rules across 75+ languages, and a meaningful chunk of those are AI-specific patterns — runtime imports, model loading calls, agent framework usage, embedding library references, MCP client instantiations. If a developer imported vllm or instantiated an Ollama client or wired up a LangChain agent with a tool list, we want to know.

cyscan ai-inventory --repo ./monorepo --output aibom.json

The output isn't a list of models. It's a graph. Here's a service that loads Llama-3-8B-Instruct, runs it on vLLM, exposes it on port 8000, and is called by these three other services, one of which has an MCP server attached with these four tools. That's an AIBOM entry that you can actually reason about.

But code scanning alone isn't enough — that's the lesson I keep watching CISOs learn the expensive way. Code tells you what should exist. It doesn't tell you what's actually running on the GPU node nobody documented.

The runtime side: scanning what's actually live

This is where cyradar comes in, and where the architectural choice we made matters. cyradar specifically targets the self-hosted inference layer — Ollama, vLLM, TGI, LocalAI, Triton, LM Studio, llama.cpp. We picked those seven because they cover almost everything self-hosted in 2026. If you've got a GPU running an LLM, it's almost certainly one of those.

The point isn't just to find them. The point is to fingerprint them. What model is loaded? What version of the runtime? Is the auth configured? Is the API exposed on the management network or the data network? What's the context window, the max tokens, the system prompt baked in at startup?

cyradar discover --cidr 10.0.0.0/8 --runtimes all
cyradar fingerprint --target 10.4.12.88:11434

That second command tells you not just "there's an Ollama at this IP" but "there's an Ollama 0.5.7 with llama3:70b and nomic-embed-text loaded, the API is open, no auth, last queried 14 minutes ago." That's an AIBOM entry the code scanner can't produce because the code that spun this up may not exist in any repo you scan. Someone ran ollama pull on a server.

Combine the code-side inventory with the runtime-side inventory, reconcile them, and now you have something that looks like a real AIBOM. The reconciliation is the hard part. The code says service X should be talking to Ollama. The runtime says Ollama is running on host Y. Are those the same instance? You need topology.

The agent and tool layer

I said earlier that tools are part of the bill of materials. I want to push on that, because it's where I see the most magical thinking in current AIBOM standards.

In 2024 you had models. In 2025 you had models with tools. In 2026 you have agents with toolchains that span MCP servers, traditional APIs, and other agents. The "thing" you're inventorying isn't really a model anymore. It's a capability graph.

Our own MCP server exposes 10 tools. Each one represents a capability — scan a repo, fingerprint a runtime, pull a fuzz template, query the rule database. Any agent that connects to our MCP server inherits those 10 capabilities. If your AIBOM lists "Claude" as one entry, you've underspecified the system by an order of magnitude. The relevant entry is "Claude + these MCP servers + these tool permissions + this system prompt + this RAG corpus."

That's a mouthful. It's also reality. Any AIBOM standard that can't express that — and most of the current ones can't, cleanly — is going to be obsolete within a year.

Web-facing AI surface, which everyone forgets

The other gap I see constantly: AI in the web tier. Chatbots embedded in marketing sites. AI search bars. Internal admin tools with an LLM assistant bolted on. Customer support widgets backed by some RAG pipeline somebody set up in a hurry.

These rarely show up in model registries. They rarely show up in code scans of the main monorepo because they live in their own little frontend repo. They almost never show up in network discovery because they call out to a vendor, not in.

cyweb's 22 fuzz categories include LLM-specific ones — prompt injection across the wire, jailbreak attempts via input fields, system prompt extraction, tool invocation abuse. When we scan a web property, we're not just looking for SQLi anymore. We're testing whether the friendly chatbot in the bottom corner can be talked into revealing the system prompt or executing tool calls it shouldn't. If it can, that goes into the AIBOM as a finding, attached to the model and runtime entry for that chatbot.

Our 95% template conversion rate vs upstream community templates matters here because the upstream community is fast — new prompt injection payloads land daily, and the gap between "known technique" and "we can test for it" needs to be small. An AIBOM that catalogs your AI systems but can't test them is a museum exhibit.

Why one platform

I keep getting asked why we built all of this — cyscan, cyradar, cyweb, the MCP server — instead of just picking one and going deep. The answer is exactly the AIBOM problem we've been talking about.

You cannot generate a real AI bill of materials from one vantage point. Code-only misses runtime. Runtime-only misses provenance. Network-only misses the SaaS-embedded stuff. Survey-only misses everything anyone forgot. To get an inventory that's actually correct, you have to triangulate from at least three of those.

If those three tools are bought from three vendors with three data models, the reconciliation happens in a spreadsheet maintained by an exhausted security engineer. I've watched this fail in real organizations. The spreadsheet drifts. The board gets the wrong number.

When the inventory comes from one platform with one data model, reconciliation is a join, not a meeting. That's the architectural choice. It's not about wanting to sell more SKUs. It's that the AIBOM problem is fundamentally a correlation problem, and correlation across vendor boundaries doesn't work.

The recomposition

Here's what I think is actually happening, beyond the AIBOM specifically.

The security industry spent twenty years building tools for a world where software was deployed by humans, ran in known places, and changed on quarterly release cycles. Every tool category — SAST, DAST, SCA, EDR, CSPM — assumes that model.

AI broke the model. Software is now partly deployed by agents, runs in places nobody documented, and changes when a developer types ollama pull. The asset isn't a server or a service anymore. It's a capability graph that includes weights, prompts, tools, data, and runtime. The discovery problem isn't "what hosts do I have" but "what can my systems do, and who taught them to do it."

The AIBOM is the first artifact that tries to express this. The current versions of it are bad because the standards bodies are still thinking in SBOM terms. The good versions, the ones that will actually matter when regulators start asking for them — and they will, by end of 2026 in at least three jurisdictions I'm tracking — those are going to look like capability graphs, not parts lists.

The vendors who get this right are the ones rebuilding their data model around the AI supply chain rather than retrofitting their old one. Everyone else is going to spend 2027 explaining to their customers why the AIBOM they shipped missed half the surface area.

What to do Monday

If you're a CISO reading this and your current AIBOM came from a vendor demo, do one experiment. Run your own inventory — survey the engineering teams, scan the internal network for the seven self-hosted runtimes, grep the monorepo for AI imports. Compare your number to the vendor's number.

If they match, congratulations, you picked well. If they don't, you have a problem that no compliance report will surface until something goes wrong.

We can help with the inventory side. cyscan handles the code, cyradar handles the runtime, cyweb handles the web-facing surface, and the MCP server lets your own agents query the AIBOM directly — which is, in a meta way, how I think AIBOMs will mostly get consumed by 2027 anyway. By other agents.

If you want to talk through yours, find me at anand@cybrium.ai.

DEV Community