Praveen Govindaraj
Why 89% of AI agents never reach production — and what’s quietly fixing that

Process Automation in the Agentic Era

There’s a small, mostly unnoticed moment that happens millions of times a day in 2026.
You’re typing a message. Halfway through a sentence, the machine finishes your thought for you. Sometimes you accept the suggestion. Sometimes you don’t. Either way, when you press send, the message goes out under your name. Nobody asks who wrote it. You did. The machine helped. That’s how we’ve collectively decided to think about it.
It’s a small philosophical accommodation, this one. Quiet, almost invisible. We’ve agreed that intelligence can flow through a person — that a thought can come from somewhere else and still belong to us, as long as we read it before pressing send. As long as the human stays in the loop.
Now imagine the machine isn’t finishing your sentence. It’s running your loan underwriting. It’s processing your insurance claim. It’s triaging your medical complaint at three in the morning when the on-call doctor is asleep. The intelligence isn’t flowing through a human anymore. It’s flowing past one. And the question we papered over in the autocomplete era — who, exactly, just did that? — suddenly matters very much.
This is the question the AI industry has been quietly grappling with for the last twelve months. Not whether machines can think — we figured that out a while ago. The harder question: when a machine acts, who is acting? And what does the world need in place to make that question answerable?
There’s a number that captures the shape of this problem.
71% of organizations now use AI agents. Only 11% of agentic use cases reach production.
That gap — between trying an agent and trusting one — is where most of the interesting work in software is happening right now. It isn’t getting much airtime. There’s no flashy new model dropping every Tuesday. No viral demo of an agent booking a flight. Just a slow, methodical rebuilding of an entire layer of software that, until recently, no one was sure we needed.
It turns out we did.
The promise that keeps almost-arriving
For about three years now, the pitch for AI agents has been irresistible. You describe what you want in plain English. The agent figures out how to get it done — calls APIs, reads documents, drafts emails, makes decisions, escalates to humans when stuck. Software that thinks. Knowledge work that runs itself.
In demos, it’s magic. In production, it’s mostly a graveyard.
Klarna built one that genuinely works — handles two-thirds of customer support tickets. Ramp shipped a buyer agent that processes purchases. A handful of others have made it across. But for every Klarna, there are a hundred enterprises with a closet full of half-finished pilots. Agents that work brilliantly on Tuesday and inexplicably destroy a database on Thursday. Agents that pass the demo but fail the audit. Agents that the legal team won’t sign off on, that the finance team can’t budget for, that the operations team can’t monitor when something goes wrong.
The interesting question isn’t why this is happening. We mostly know why. The interesting question is what the industry is now doing about it — and the answer, surprisingly, is that we’re rediscovering a 25-year-old idea.
The standard nobody wanted that everyone now needs
If you worked in enterprise software in the early 2000s, you probably encountered something called BPMN — Business Process Model and Notation. It looked like a flowchart. It had circles for events, rectangles for tasks, diamonds for decisions, and lanes for who-does-what. Banks loved it. Insurance companies loved it. Hospitals loved it. Software people, mostly, did not.
For two decades, BPMN sat in the “boring enterprise” corner of software, alongside things like middleware and document management. The cool kids — and the AI startups especially — built workflow tools that ignored it entirely. Zapier connected apps. Make.com chained operations. n8n let you write JavaScript between nodes. Each had its own visual language. None of them were BPMN, and that felt fine, because BPMN was for compliance officers in suits.
Then something interesting happened in 2025. Camunda — the company that has spent twenty years quietly making BPMN tools — published a report on the state of agentic orchestration. It contained the 71%/11% number. It also contained an argument that the AI industry didn’t quite want to hear: the problem wasn’t that the agents weren’t smart enough. The problem was that there was no shared language between the people building agents and the people who had to live with them.
The compliance officer can’t read Python. The engineer doesn’t want to write a 40-page process document. The auditor needs to see the workflow before sign-off, but the workflow only exists in a giant prompt that was edited at 2 AM by a contractor. The legal team needs an artifact they can review. The operations team needs a diagram they can monitor. The finance team needs a budget they can attach to a process step.
BPMN, it turns out, was already designed for exactly this. Standard since 2011. Read by every BPM tool ever built. Approved by regulators in every major jurisdiction. The thing the AI agent industry was missing was the thing the BPMN industry had been holding for two decades.
So a quiet pivot started. Camunda began shipping AI agent capabilities directly into BPMN diagrams. Academic papers started appearing — “BPMN Assistant,” “H2A-BPMN,” “Mestro” — all asking the same question: can we use LLMs to generate BPMN diagrams, and have those diagrams orchestrate the LLMs back? The answer, it’s turning out, is yes.
What the new layer actually looks like
Eric Broda, who has been writing about this for a while, calls it “Agentic Process Automation” — APA, distinct from RPA (the robot-script tools of the 2010s) and from BPM (the heavyweight workflow suites that came before).
APA is not a product. It’s a runtime architecture. The pieces it requires, roughly:
A process manager that runs the workflow — knows which task is current, what state the data is in, when to escalate, when to retry, when to fail. Think of it as the conductor.
A process registry where workflows live as versioned, signed artifacts. Like a package registry, but for business processes. You can publish, you can subscribe, you can roll back.
A communications fabric — a normalized event stream so that when a task completes, every other system that cares (monitoring, billing, audit, notifications) hears about it in the same format. Without this, every agent integration becomes a custom mess.
An event normalizer that translates the dozen different event vocabularies (Anthropic's tool_use and content_block_delta, OpenAI's function_call, vendor X's whatever) into one shared schema. Otherwise the auditor sees ten different log formats and can't reason about any of them.
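To make the normalizer idea concrete, here is a minimal sketch. The provider event shapes and the target schema are assumptions for illustration, not any vendor's documented wire format:

```python
# Illustrative sketch of an event normalizer. The provider event shapes and
# the shared schema below are assumptions, not any vendor's actual API.
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class NormalizedEvent:
    source: str   # which provider emitted the event
    kind: str     # shared vocabulary: "tool_call", "token_delta", "unknown"
    payload: dict # original payload, carried along for the audit trail
    ts: str       # ISO-8601 timestamp

# Map (provider, provider-specific event type) -> shared event kind.
EVENT_KIND = {
    ("anthropic", "tool_use"): "tool_call",
    ("anthropic", "content_block_delta"): "token_delta",
    ("openai", "function_call"): "tool_call",
}

def normalize(source: str, event: dict) -> NormalizedEvent:
    kind = EVENT_KIND.get((source, event.get("type")), "unknown")
    return NormalizedEvent(
        source=source,
        kind=kind,
        payload=event,
        ts=datetime.now(timezone.utc).isoformat(),
    )

e = normalize("openai", {"type": "function_call", "name": "lookup_invoice"})
print(e.kind)  # tool_call
```

The point is the lookup table: ten provider vocabularies collapse into one small shared one, and the raw payload rides along so nothing is lost for the auditor.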
And — this is the one that matters most — formal human-in-the-loop checkpoints. Not “the agent will ask if it’s unsure,” which is probabilistic and unreliable. Actual gates. The agent cannot execute past this point until a named human, with documented authority, approves with a justification, captured in an audit trail.
If that sounds like overkill for your weekend chatbot, that’s because it is. APA isn’t for the chatbot. It’s for the loan underwriting workflow. The claims process. The sanctions screening. The places where an agent making a wrong call costs money, breaks regulations, or hurts a person.
The convergence nobody is talking about
Here’s what’s strange. If you survey the agentic no-code tools shipping right now — and there are a lot of them — you find them all converging on the same set of patterns. Independently. From different starting points.
n8n started as a workflow automation tool for developers. In January 2026, it shipped tool-level human-in-the-loop gating — you can require explicit human approval before an AI agent invokes a specific tool, with the approval routed through Slack, email, or a web form. In March 2026, it shipped visual diff between workflow versions: side-by-side canvas rendering with changed nodes highlighted.
Dify started as an LLMOps platform. It now ships a “Knowledge Pipeline” — a separate visual canvas just for the data engineering side of RAG (parse → chunk → embed → index → retrieve), letting non-engineers configure how documents become context.
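Those five stages are just a pipeline, and a toy version fits in a few functions. The hash-free bag-of-words "embedding" below is a stand-in so the example runs without a model; a real pipeline would call an embedding API at that step:

```python
# Toy sketch of the parse -> chunk -> embed -> index -> retrieve pipeline.
# The bag-of-words "embedding" is a stand-in for a real embedding model.
import math
from collections import Counter

def chunk(text: str, size: int = 40) -> list[str]:
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(text: str) -> Counter:
    # Bag-of-words vector: enough to demonstrate similarity-based retrieval.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def index(docs: list[str]) -> list[tuple[str, Counter]]:
    return [(c, embed(c)) for d in docs for c in chunk(d)]

def retrieve(query: str, idx: list[tuple[str, Counter]], k: int = 1) -> list[str]:
    q = embed(query)
    ranked = sorted(idx, key=lambda item: cosine(q, item[1]), reverse=True)
    return [c for c, _ in ranked[:k]]

idx = index(["Claims over 10000 euros require manual review.",
             "Password resets are self-service via the portal."])
print(retrieve("how do I reset my password", idx))
```

What a visual Knowledge Pipeline exposes is exactly the knobs in these functions: chunk size, embedding model, index shape, top-k. That is why non-engineers can configure it.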
OpenAI launched Agent Builder in October 2025 — drag-and-drop canvas, inline evaluations on each node, version history, preview runs, exportable to SDK code. Sam Altman called it “Canva for building agents.”
Microsoft’s Copilot Studio has gone the furthest on governance. Wave 3 (March 2026) ships agent inventory queryable from Azure Resource Graph, an Activity Map showing what agents are accessing in real time, MCP allowlist policies that admins can enforce, and HITL via Outlook forms. Microsoft Defender and Purview wrap the whole thing.
Vellum focuses on what they call “cost SLO gates” — you set a rule that says “don’t promote this version to production if cost-per-resolved-ticket exceeds $0.08,” and the platform enforces it.
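The rule itself is trivial to express, which is rather the point; what the platform adds is enforcement at promotion time. A sketch, with illustrative names and numbers rather than Vellum's actual API:

```python
# Illustrative sketch of an economic SLO gate: block a workflow version
# from promotion if cost-per-resolved-ticket exceeds the budget.
def cost_per_resolved(total_cost_usd: float, resolved_tickets: int) -> float:
    if resolved_tickets == 0:
        return float("inf")  # no resolutions: never cheap enough to promote
    return total_cost_usd / resolved_tickets

def may_promote(total_cost_usd: float, resolved_tickets: int,
                slo_usd: float = 0.08) -> bool:
    return cost_per_resolved(total_cost_usd, resolved_tickets) <= slo_usd

print(may_promote(total_cost_usd=62.0, resolved_tickets=1000))  # True  (0.062)
print(may_promote(total_cost_usd=95.0, resolved_tickets=1000))  # False (0.095)
```

Swap the model, rerun the evals, and the same gate tells you whether the new version is still affordable before it ever touches production.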
Pipedream lets you embed real code in any node, exposes 10,000+ API tools through a hosted MCP server, and syncs workflows to GitHub.
Tines, born in security operations, calls its visual builder “Storyboard” and treats every action as an HTTP block — schemas change all the time, so the abstraction is “make a request,” not “use the Salesforce connector.”
Google’s Opal lets you describe what you want in natural language and generates a working workflow visually.
These tools come from radically different starting points. Different companies, different funding sources, different target users, different licenses. And yet, if you list what each one shipped in the last twelve months, the lists overlap so heavily it’s startling.
Sticky notes for annotation. Time-travel debuggers. AI copilots that generate workflows from descriptions. Versioning baked into the canvas. Cost dashboards. Tool-level HITL gating. Visual diff. Multi-environment promotion. Inline evals.
When ten independent teams arrive at the same set of features, that’s not coincidence. That’s the field discovering its shape.
The three people in every room
The shape, if you look closely, is defined by three people who have to coexist in front of a screen:
The business analyst. Probably draws BPMN diagrams in Visio today. Doesn’t code. Owns the requirements document. Has to walk into a meeting with compliance and the legal team and explain how a process works. Their job is on the line if a diagram gets approved that contains a regulatory violation.
The AI engineer. Comfortable in Python, lives in their IDE, debugs with print statements and trace replays. Wants to iterate on a prompt and see the difference. Doesn’t care what color the start event is. Cares deeply that they can roll back a bad deployment in five seconds.
The operations manager. Doesn’t author. Doesn’t code. Their world is dashboards. They get paged when something fails. They need to know which instance is stuck, why it’s stuck, who can unstick it, and how much it’s costing while it’s stuck. They sign the SLA.
Every successful platform serves all three. Every failed platform serves one and assumes the others will accommodate. The reason most AI agent tools have a low ceiling — and the reason BPM tools historically had a low floor for engineers — is that each was optimized for one persona at the expense of the others.
The new generation of tools is figuring out how to make a single product feel native to all three. Different default views, different terminology in tooltips, different keyboard shortcuts, different defaults — but the same underlying artifact. The analyst sees a flowchart. The engineer sees code. The ops manager sees a live monitor. All three are looking at the same process. None of them can corrupt what the other sees.
That sounds simple. It is technically the hardest problem in this category.
The parts that are hard, the parts that are interesting
If you peel back the visual polish on any of these tools, the hard part is always the same: keeping the diagram and the code in sync without losing information.
A BPMN diagram doesn’t capture everything an engineer cares about — token budgets, retry semantics, type signatures, OWASP compliance bindings. A piece of code doesn’t render itself as a flowchart that an analyst can read. So you need a third representation that both can be projected from. Most teams call this an “AST” — abstract syntax tree, the same kind of structure compilers use internally.
The interesting platforms (Inkeep is the explicit pioneer, but several others are converging on this) use the AST as canonical truth. The diagram is generated from it. The code is generated from it. An LLM that wants to add a step generates a patch against the AST, not against either projection. Round-trips work because no projection is authoritative — the structure underneath is.
This is the same architectural insight that made compilers good in the 1970s. It just took us five decades to apply it to business processes.
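The projection idea can be shown with a toy example. The node shapes below are invented for illustration; real platforms use far richer structures. The essential move: edits (human or LLM) patch the AST, and both views are regenerated from it.

```python
# Toy illustration of one canonical process AST with two projections.
# Node shapes are invented; the point is that neither view is authoritative.
STEPS = [
    {"id": "parse", "kind": "task", "label": "Parse claim"},
    {"id": "check", "kind": "gate", "label": "Human review"},
    {"id": "pay",   "kind": "task", "label": "Issue payment"},
]

def to_diagram(ast: list[dict]) -> str:
    # Analyst projection: flowchart-ish text, tasks in [], gates in <>.
    shapes = {"task": "[{label}]", "gate": "<{label}>"}
    return " -> ".join(shapes[s["kind"]].format(label=s["label"]) for s in ast)

def to_code(ast: list[dict]) -> str:
    # Engineer projection: a call sequence.
    return "\n".join(f"run_{s['id']}()" for s in ast)

def apply_patch(ast: list[dict], insert_after: str, node: dict) -> list[dict]:
    # An edit targets the AST, never a projection, so round-trips are lossless.
    i = next(i for i, s in enumerate(ast) if s["id"] == insert_after)
    return ast[:i + 1] + [node] + ast[i + 1:]

ast = apply_patch(STEPS, "parse",
                  {"id": "fraud", "kind": "task", "label": "Fraud screen"})
print(to_diagram(ast))
print(to_code(ast))
```

After the patch, the new "Fraud screen" step appears in the analyst's flowchart and in the engineer's call sequence simultaneously, because both are derived from the same structure.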
The other hard parts are mostly about cost and trust. AI agents are expensive to run, in ways that traditional automation isn’t. A workflow that costs $0.04 per execution today might cost $0.40 next month if you change the model. Without economic governance — budgets, SLOs, gates — you can’t deploy safely. Without observability — every tool call traced, every prompt logged, every decision attributable — you can’t audit. Without human-in-the-loop checkpoints with real authority semantics, you can’t pass compliance review.
These aren’t AI problems. They’re plumbing problems. They’re the kind of unglamorous work that makes the difference between a demo and a product.
Where this lands
The gap between 71% adoption and 11% production isn’t going to close because models get smarter. The models are already smart enough for most of what enterprises want to do. The gap will close because the layer between the model and the business — the layer that’s been missing for the last three years — is finally being built.
It will look like BPMN, because BPMN already solved the “talk to compliance” problem twenty years ago. It will look like a modern visual editor, because business analysts won’t read code. It will have an AI copilot, because nobody wants to drag rectangles for an hour. It will have versioning and cost gates and tool-level HITL, because you can’t deploy a $40K-per-month agent without economic governance.
It will, in other words, look like the thing that thirteen different teams are independently building right now.
When the dust settles, and the tools mature, and one or two of them become standards, we won’t talk about “agentic process automation” the way we talk about it now. It will just be how processes are automated. The same way we don’t really talk about “internet-enabled email” anymore. The infrastructure becomes invisible when it works.
That’s the boring, beautiful work happening underneath all the agent-demo theater. It’s not as fun to watch as a chatbot booking a flight. But it’s the reason your bank, your insurer, and your hospital might actually be running an AI agent five years from now without anyone noticing.
The 11% becomes 60% not because of a breakthrough. Because of plumbing.
Sources for this article include Camunda’s 2026 State of Agentic Orchestration report, Eric Broda’s writing on Agentic Mesh, the n8n release notes for January and March 2026, OpenAI’s AgentKit announcement (October 2025), Microsoft Copilot Studio Wave 3 (March 2026), Dify’s Knowledge Pipeline release, Vellum’s enterprise guide, and academic papers including BPMN Assistant (arXiv:2509.24592), H2A-BPMN (Springer LNCS 2026), and the Agentic BPM Manifesto (arXiv:2603.18916). Production case studies referenced: Klarna’s customer support agent, Ramp’s buyer agent, LY Corporation’s work assistant, Carlyle’s deployment metrics.
