Max Quimby

Posted on • Originally published at agentconn.com

CI/CD Broke Under Agents: The Continuous Compute Stack

Editorial illustration — a CI/CD pipeline diagram cracking apart under the load of thousands of cartoon agents pushing PRs simultaneously, with a new horizontal layer labeled CONTINUOUS COMPUTE forming underneath, May 2026

📖 Read the full version with charts and embedded sources on AgentConn →

At AI Engineer Europe last week, Hugo Santos (CEO, Namespace) and Madison Faulkner (NEA) stood in front of a room of platform engineers and said the quiet thing out loud: CI/CD is dead for agent-based systems. Traditional CI was built for humans pushing one or two diffs a week. When you scale to thousands of autonomous agents opening PRs continuously, the abstractions break — runner saturation, cold Docker builds on every branch, cost explosion, feedback latency that lets context decay before the agent sees the test result.

They coined a new vocabulary for what replaces it: continuous compute and continuous computers, not continuous integration. The framing is sharp because the structural shift it points to is already happening — and the operational layer it implies is what every ops team running Claude Code Max, Cursor, or a private agent fleet is going to be invoiced for over the next two quarters.

This piece does three things. First, it names the four ways traditional CI structurally breaks under agent-volume load. Second, it maps the production stack that is visibly forming this week across ElevenLabs, Vercel, Anthropic, and the GitHub trending charts. Third, it gives ops teams a buyer's-guide checklist for the moment the CI bill triples after agent workflows go live for the eng org.

1. Where traditional CI/CD actually breaks

Three numbers anchor the structural shift:

  • Human PR volume: a handful of PRs per developer per week on a typical team. With reviews and merges, that works out to ~50–100 CI runs per repo per day on a mid-size codebase.
  • Agent PR volume: Cowork 1-shotted booking 8 flights and 5 hotels with Opus 4.7 this week — multi-step agent workflows are now multi-PR by default. Operators running fleets see 100–1000+ PRs per day from the agent layer alone.
  • Per-PR CI cost: Docker builds, dependency installs, full test suites. On a typical SaaS repo with a 12-min CI run, that's ~$0.20–$0.40 per run on hosted runners. Multiply by 1000+/day per repo.
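The multiplication is worth doing explicitly. A rough model, using the illustrative figures quoted above rather than any measured runner pricing:

```python
# Back-of-envelope CI spend at agent volume. Rates are the illustrative
# figures from the bullets above (hypothetical, not measured pricing).
def daily_ci_cost(runs_per_day: int, cost_per_run: float) -> float:
    """Daily hosted-runner spend for one repo at a given run rate."""
    return runs_per_day * cost_per_run

human = daily_ci_cost(100, 0.30)   # ~$30/day at human PR rates
agent = daily_ci_cost(1000, 0.30)  # ~$300/day once agents are on
print(f"${human:.0f} -> ${agent:.0f}, a {agent / human:.0f}x jump per repo")
```

Per repo, per day. Multiply by the repo count and the jump is what lands on the invoice.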

Four things break when the rate jumps two orders of magnitude:

Docker build cache invalidation patterns. Build caches assume human-paced commit cadence — most pushes hit a shared base layer. Agents working on parallel branches in parallel sandboxes blow through caches because they don't share branch ancestry the way human teams do. Cold builds on every agent branch turn a five-minute CI run into a fifteen-minute one and double the runner spend.
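A toy model makes the cache effect concrete. The hit rates and build times below are illustrative assumptions, not benchmarks:

```python
# Toy model of the cache effect described above: human branches mostly
# share a warm base layer; parallel agent sandboxes mostly don't.
# All figures are illustrative assumptions, not measured data.
def ci_minutes(runs: int, hit_rate: float,
               warm: float = 5.0, cold: float = 15.0) -> float:
    """Total runner-minutes for a day of CI at a given cache hit rate."""
    hits = runs * hit_rate
    misses = runs - hits
    return hits * warm + misses * cold

human_day = ci_minutes(100, hit_rate=0.9)   # mostly warm builds: 600 min
agent_day = ci_minutes(1000, hit_rate=0.2)  # mostly cold builds: 13,000 min
```

Ten times the runs, but more than twenty times the runner-minutes, because the cold-build penalty compounds with the volume.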

Runner pool sizing. Pool capacity is planned against human PR rate. Once you turn on autonomous agents, the rate is bounded by the agent's token-per-second budget, not by a developer drinking coffee between commits. You will saturate the pool. You will get queueing. The queue will burn agent context faster than the CI tells the agent whether the test passed.
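The saturation point is a one-line calculation: a pool of c runners saturates once arrival rate times run duration exceeds c. A sketch with illustrative numbers:

```python
# Quick saturation check for a runner pool. Utilization above 1.0 means
# the queue grows without bound. PR rates and the 12-minute run are
# illustrative; plug in your own.
def pool_utilization(prs_per_hour: float, run_minutes: float,
                     runners: int) -> float:
    """Fraction of pool capacity consumed by the incoming PR stream."""
    busy_runner_hours = prs_per_hour * (run_minutes / 60)
    return busy_runner_hours / runners

print(pool_utilization(10, 12, 4))   # humans: 0.5 -> headroom
print(pool_utilization(100, 12, 4))  # agents: 5.0 -> unbounded queue
```

At 5.0 utilization there is no pool size tweak that saves you a queue; you need either admission control on agent PRs or a fundamentally different substrate.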

Test-feedback latency. When a human waits for CI, twelve minutes is annoying. When an agent waits for CI, twelve minutes is context decay. The agent that submitted the PR is no longer the agent that sees the result — its working memory has been recycled. The result becomes a stale message in a queue, and the agent has to re-derive context from the PR diff to act on it.

Branch hygiene. Agent branches are cheap to create and expensive to delete. Operators are finding their repos accumulating thousands of stale agent branches, each with a build artifact, each with a cache, each with metadata GitHub charges to store. The garbage collection problem isn't sexy. It is the largest single source of unexpected platform spend operators are reporting in 2026.

That's the demolition. Now the construction.

2. The Continuous Compute stack that's visibly forming

The shape of what replaces CI is decomposing across five distinct layers — and each layer had its launch moment this week. That coincidence is part of why the convergence is real. Nobody is hyping a single platform; multiple players in adjacent niches are independently confirming the architecture.

Layer 1: The routing layer — explicit workflow graphs replace the mega-prompt

ElevenLabs shipped Agent Workflows with a visual graph editor as the headline interface. The pitch is dry — "edges support sophisticated routing logic that enables dynamic, context-aware conversation paths" — but the structural change underneath is the news: single-prompt agents are giving way to explicit routing graphs with conditional branching, sub-agent dispatch, and per-node tool/knowledge-base overrides.

This is the same story as LangGraph and CrewAI two years ago, but with the production tax actually paid. May 2026 release notes mention conditional_operator AST nodes for branching expressions and ASTNullNode types for null-comparison branches in workflow logic. That's not marketing — that's a team building a graph-execution engine for production agents. The mega-prompt era is over for production traffic.
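What an edge-condition evaluator of this shape might look like is easy to sketch. Everything below — the Edge type, node names, state fields — is invented for illustration, not ElevenLabs' actual API:

```python
# Hypothetical sketch of graph-edge routing of the kind the release
# notes describe: each edge carries a condition evaluated against
# conversation state, including an explicit null-comparison branch.
# All names here are invented for illustration.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Edge:
    target: str
    condition: Callable[[dict], bool]

def route(state: dict, edges: list[Edge], default: str) -> str:
    # First edge whose condition holds wins; fall through otherwise.
    for edge in edges:
        if edge.condition(state):
            return edge.target
    return default

edges = [
    Edge("escalate", lambda s: s.get("sentiment") is None),  # null branch
    Edge("billing", lambda s: s.get("intent") == "refund"),
]
print(route({"sentiment": "angry", "intent": "refund"}, edges, "fallback"))  # billing
```

The point of the structure is reviewability: a teammate can read the edge list without reading a 4000-token prompt.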

ElevenLabs documentation page — Agent Workflows visual editor with branching conversation graph nodes for routing, sub-agent dispatch, and conditional logic, May 2026

ElevenLabs Agent Workflows documentation →

Layer 2: The substrate — filesystems, not storage

Vercel's Nico Albanese went viral this week with the talk "Give Your Agent a Computer". The thesis: giving an agent a filesystem (not just storage) changed how the agent behaved. Agents with persistent FS-shaped substrate stopped re-deriving context on every call and started following through on multi-step tasks — they used files the way humans use scratchpads.

This is structurally important for the CI question because it splits the data-locality concern from the execution concern. Continuous compute doesn't mean "more runners." It means the agent's compute environment persists between PRs. The agent doesn't restart cold; its filesystem state carries forward. That's the inversion of how CI was designed — CI was specifically ephemeral, because human PRs don't need persistent disk state. Agent PRs do.
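The pattern itself is simple to sketch: read prior notes before acting, append findings after, so the next run starts warm. Paths and note format here are assumptions for illustration, not Vercel's implementation:

```python
# Minimal sketch of the filesystem-as-scratchpad pattern: the agent
# reads its prior notes before acting and appends findings after, so
# a later run starts from carried-forward state instead of cold.
# The NOTES.md path and format are illustrative assumptions.
from pathlib import Path

def load_scratchpad(workdir: Path) -> str:
    """Return prior notes, or empty string on a first (cold) run."""
    notes = workdir / "NOTES.md"
    return notes.read_text() if notes.exists() else ""

def append_note(workdir: Path, note: str) -> None:
    """Persist a finding so the next run inherits it."""
    notes = workdir / "NOTES.md"
    with notes.open("a") as f:
        f.write(note + "\n")
```

The hard requirement is only that `workdir` survives between runs — a persistent volume, not the ephemeral disk CI hands out.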

Layer 3: The control plane — Agent View

Anthropic shipped Agent View on May 11 — a research preview in Claude Code that lists, starts, and supervises multiple agent sessions from one screen. Boris Cherny's announcement hit 486k views; the companion announcement on Cowork's 1-shot booking flow hit 424k more. The signal is clear: the dominant UI pattern for the next phase is human-as-orchestrator-of-agent-fleets, not human-as-author.

The implication for continuous compute is that you need a control surface — not just observability, not just dashboards, but a place to dispatch new sessions, see what's blocked, and reroute work. Each row in Agent View shows the session, whether it needs input, the last response, and recency. That's the user-facing shape of continuous compute: the CI dashboard's direct descendant.
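A minimal guess at the data shape behind such a control plane, using the four row fields described above. Field names are assumptions, not Anthropic's schema:

```python
# Hypothetical data shape for a fleet control-plane row, mirroring the
# four fields described above (session, needs-input flag, last
# response, recency). Names are assumptions, not Anthropic's schema.
from dataclasses import dataclass
from datetime import datetime

@dataclass
class FleetRow:
    session_id: str
    needs_input: bool
    last_response: str
    last_active: datetime

def blocked(rows: list[FleetRow]) -> list[FleetRow]:
    # The control-plane question: which sessions are waiting on a human?
    return [r for r in rows if r.needs_input]
```

Fleet-wide state is the `blocked()` query; per-agent observability alone never answers it.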

Anthropic blog announcement of Agent View in Claude Code — research preview for managing multiple agent sessions from one screen, May 2026

Read the Agent View announcement on Claude.com →

Layer 4: The capability bundles — skills as portable units

The GitHub trending chart this week is dominated by skill-bundles-as-product. mattpocock/skills is #1 with +3,372 stars in a day ("Skills for Real Engineers. Straight from my .claude directory."). obra/superpowers is #4 with +1,506 ("Agentic skills framework & software development methodology that works"). anthropics/skills is #9 with +645. Three skill repos in the top ten on the same day is a category, not a coincidence.

The structural point: skills are the externalization format for the agent's capabilities. They make the routing graph (Layer 1) and the agent's filesystem (Layer 2) portable. You ship a skill bundle, the agent loads it like a library, and the routing graph references it as a callable node. This is the package manager layer of the continuous compute stack.
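A hypothetical loader for a directory-per-skill convention — one manifest file per folder, roughly the shape the Anthropic-style repos use. Exact layouts vary by repo; treat this as a sketch of the pattern, not any repo's API:

```python
# Hypothetical skill discovery for a directory-per-skill layout:
# each skill lives in its own folder with a SKILL.md manifest.
# The layout is an assumption; real repos differ in detail.
from pathlib import Path

def discover_skills(root: Path) -> dict[str, str]:
    """Map skill name -> manifest text for every bundle under root."""
    skills = {}
    for manifest in root.glob("*/SKILL.md"):
        skills[manifest.parent.name] = manifest.read_text()
    return skills
```

The routing graph then references discovered skill names as callable nodes, which is what makes the bundles portable between agents.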

GitHub page for mattpocock/skills — Skills for Real Engineers, straight from my .claude directory, #1 trending repo with 3372 stars today, May 2026

mattpocock/skills on GitHub →

Layer 5: The memory layer — persistent state across runs

The piece that turns continuous compute from a slogan into an actual product is memory. rohitg00/agentmemory hit the GitHub trending chart this week at #5 with +1,335 — "#1 Persistent memory for AI coding agents based on real-world benchmarks." farion1231/cc-switch (#6, +1,186) is the meta-tool for switching between agent CLIs while preserving memory.

For ops teams, the memory layer is the budget question: it determines whether your agents amortize learning across runs or pay the re-derivation cost every PR. The numbers on amortization are stark — internal benchmarks operators are quoting put context-retrieval savings at 30–60% of total agent token spend when memory is wired correctly.
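The amortization lever is worth putting in numbers. A back-of-envelope where every figure is an assumption chosen inside the operator-quoted range, not a benchmark:

```python
# Back-of-envelope memory amortization. All inputs are assumptions
# in the operator-quoted range above, not benchmark results.
def monthly_token_savings(prs: int, tokens_per_pr: int,
                          rederive_share: float, recovered: float) -> float:
    """Tokens saved per month when memory recovers part of the
    context re-derivation cost paid on every PR."""
    return prs * tokens_per_pr * rederive_share * recovered

# 30k agent PRs/month, 200k tokens each, 40% of spend is re-derivation,
# and a wired-up memory layer recovers three quarters of that.
saved = monthly_token_savings(prs=30_000, tokens_per_pr=200_000,
                              rederive_share=0.4, recovered=0.75)
print(f"{saved:,.0f} tokens/month")  # 1,800,000,000 tokens/month
```

The savings scale linearly with PR volume, which is exactly why the memory layer matters more at agent rates than it ever did at human rates.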

GitHub page for rohitg00/agentmemory — #1 persistent memory for AI coding agents, trending #5 with 1335 stars today, May 2026

rohitg00/agentmemory on GitHub →

3. The Cowork inflection: multi-step really works now

If you want a single signal for why the stack is decomposing this fast, it's Anthropic's Cowork. One agent. One shot. Eight flights booked, five hotels reserved. Multi-step planning, tool use across booking APIs, recovery from intermediate failures — all in a single session. 424k views on the announcement tweet because operators understood what they were looking at: the practical floor for multi-step agent reliability just moved.

When the floor moves, the operational stack underneath has to catch up. Multi-step reliability is what made every CI assumption invalid in the first place. A single human PR doesn't book 13 things in sequence with state preserved between steps. An agent PR can — and once that becomes the expected workload, the CI substrate has to be redesigned for it.

4. The buyer's checklist for ops teams

If you're about to see your CI bill triple because the eng org turned on Claude Code Max, here's what to actually buy or build:

1. A routing/workflow editor. Pick ElevenLabs Agent Workflows if you live in conversational AI. Pick LangGraph or Vercel AI SDK Workflows if you're TypeScript-first. The point is not to write a single mega-prompt as your production pipeline. Anything custom you put in production should be in a visualizable graph that a teammate can review without reading 4000-token prompts.

2. A persistent filesystem layer for agents. Not S3, not a database — actual filesystem semantics that survive between agent runs. Vercel's pattern is one approach; running Docker volumes that persist beyond CI builds is another. The hard requirement is that the agent doesn't start cold on every PR.

3. A control plane for fleet-of-agents. Claude Code Agent View is the canonical reference now. Build or buy something where a human can see fleet-wide state at a glance and dispatch/redirect. Without this, you have observability over individual agents, not over the system.

4. A skill-bundle convention. Adopt either the Anthropic claude/skills directory format or one of the popular trending alternatives (mattpocock/skills, obra/superpowers). The point is not to invent your own. Skills are how knowledge becomes portable between agents.

5. A persistent memory layer. agentmemory or the equivalent. Without amortized memory, your agent spends 40%+ of every PR re-deriving context from the codebase. That's the largest cost-saving lever in the stack.

6. Branch hygiene automation. Build the deletion job. Schedule it. Tag agent-authored branches in commit metadata so you can prune by author class without affecting humans.

The Hugo Santos / Madison Faulkner framing — continuous compute, not continuous integration — captures the shape correctly. The substrate is computers that persist. The deliverable is not "an integrated build artifact" but "an agent that has consistent state to act from." Same problem the CI/CD generation solved for human-paced teams, redesigned for the agent-paced reality.

Operators have one quarter to get this stack stood up before the second tier of platforms starts charging premium rates for the routing-and-memory layer they should have built themselves. The vocabulary is new. The architecture is concrete. The bill is coming.

For more on what's running on the agent runtime side, see our coverage of agent harness fragmentation and the skill marketplace race.

