## The risk in one paragraph
Every time your team deploys software to AWS, a pipeline authenticates with credentials that can modify production infrastructure. In most organizations, these credentials have far more access than needed, are shared across environments, and are never reviewed. If an attacker compromises one pipeline, they own the account.
This is not theoretical. In March 2026, attackers compromised the Trivy security scanner's GitHub Action by force-pushing malicious code to 75 version tags. Every organization running Trivy in their pipeline had secrets stolen. The attack cascaded into further compromises across PyPI and downstream projects. In April 2026, an AI-powered campaign opened 475 malicious pull requests in 26 hours, exfiltrating credentials from hundreds of organizations over six weeks before detection.
## Why this keeps happening
Three structural problems:
1. Long-lived credentials. Most pipelines authenticate with static access keys stored as CI/CD variables. These keys don't expire, aren't scoped to specific actions, and persist even after employees leave. One leaked key gives an attacker persistent access.
2. Shared permissions. The same AWS Identity and Access Management (IAM) role deploys to dev, staging, and production. A compromised feature branch can reach production data because nothing in the permission model distinguishes environments.
3. No visibility into what pipelines actually need. Teams request broad permissions because scoping them is slow. Over time, roles accumulate access nobody remembers granting. Nobody audits what a pipeline actually uses versus what it could use.
## The pattern that solves this
AWS publishes a reference architecture for least-privilege CI/CD. The core ideas:
Eliminate long-lived credentials entirely. Both GitHub Actions and GitLab CI/CD support federated authentication with AWS via OpenID Connect (OIDC). Pipelines receive short-lived tokens (typically one hour) and store no secrets. If a pipeline is compromised, the token expires before an attacker can establish persistence.
One role per environment, per pipeline. The production deployment role only accepts requests from the main branch of a specific repository. A developer on a feature branch cannot assume production credentials, even if they modify the pipeline configuration. The security boundary lives in IAM, not in the pipeline file.
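As a sketch in Terraform (the repository `example-org/example-app` and the role name are placeholders; the thumbprint is GitHub's published OIDC value, which AWS now largely ignores in favor of its own trust of the issuer):

```hcl
# Register GitHub's OIDC identity provider once per AWS account.
resource "aws_iam_openid_connect_provider" "github" {
  url             = "https://token.actions.githubusercontent.com"
  client_id_list  = ["sts.amazonaws.com"]
  thumbprint_list = ["6938fd4d98bab03faadb97b34396831e3780aea1"]
}

# Production deploy role: only the main branch of one repo may assume it.
data "aws_iam_policy_document" "prod_trust" {
  statement {
    actions = ["sts:AssumeRoleWithWebIdentity"]

    principals {
      type        = "Federated"
      identifiers = [aws_iam_openid_connect_provider.github.arn]
    }

    condition {
      test     = "StringEquals"
      variable = "token.actions.githubusercontent.com:aud"
      values   = ["sts.amazonaws.com"]
    }

    # The token's subject claim encodes repository and ref; pin both.
    condition {
      test     = "StringEquals"
      variable = "token.actions.githubusercontent.com:sub"
      values   = ["repo:example-org/example-app:ref:refs/heads/main"]
    }
  }
}

resource "aws_iam_role" "prod_deploy" {
  name               = "example-app-prod-deploy"
  assume_role_policy = data.aws_iam_policy_document.prod_trust.json
}
```

A feature branch produces a different `sub` claim (`…:ref:refs/heads/my-feature`), so the `AssumeRoleWithWebIdentity` call is rejected by IAM before any pipeline code runs.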
Four layers of defense. No single control is sufficient. The pattern stacks:
- Organization-wide guardrails (service control policies) that prevent any role from disabling audit logging or leaving approved regions
- Permission boundaries on every pipeline role that prevent privilege escalation
- Specific grants for only the actions each pipeline needs
- Resource-level policies for cross-account access
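A minimal sketch of the first layer as a service control policy (the approved-region list, exempted global services, and policy name are all placeholders to adapt per organization):

```hcl
resource "aws_organizations_policy" "pipeline_guardrails" {
  name = "pipeline-guardrails"
  type = "SERVICE_CONTROL_POLICY"

  content = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        # No principal in the account, pipeline or human, can silence audit logs.
        Sid      = "ProtectAuditLogging"
        Effect   = "Deny"
        Action   = ["cloudtrail:StopLogging", "cloudtrail:DeleteTrail", "cloudtrail:UpdateTrail"]
        Resource = "*"
      },
      {
        # Deny regional actions outside approved regions; exempt global services.
        Sid       = "RestrictRegions"
        Effect    = "Deny"
        NotAction = ["iam:*", "sts:*", "organizations:*", "cloudfront:*", "route53:*"]
        Resource  = "*"
        Condition = {
          StringNotEquals = { "aws:RequestedRegion" = ["eu-west-1"] }
        }
      }
    ]
  })
}
```

Because an SCP applies to every role in the attached accounts, even a pipeline role with `cloudtrail:*` in its own policy cannot stop logging.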
Separate who creates permissions from who uses them. This is the architectural decision most organizations miss. Two distinct pipelines with different trust levels:
- The platform pipeline creates and manages IAM roles. It runs from a dedicated infrastructure repo, requires two human approvals, and is managed by the platform/security team. It can modify permissions but cannot deploy applications.
- The service pipelines deploy application code. They assume pre-created roles with fixed, scoped permissions. They can deploy their service but cannot modify their own permissions or anyone else's.
A compromised service pipeline cannot grant itself more access because the tools to do so aren't available to it. The role it assumes was created by a different pipeline, in a different repo, approved by different people. This separation turns a potential account-level breach into a single-service incident.
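A sketch of how the platform pipeline enforces this in Terraform (service name, account ID, and the allow-list are illustrative; a real boundary would be tighter):

```hcl
# Boundary policy owned by the platform pipeline; service roles cannot exceed it.
resource "aws_iam_policy" "pipeline_boundary" {
  name = "pipeline-permission-boundary"

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        # Ceiling on what any service pipeline may ever do.
        Sid      = "DeployCeiling"
        Effect   = "Allow"
        Action   = ["ecs:*", "ecr:*", "s3:*", "logs:*"]
        Resource = "*"
      },
      {
        # Even inside the ceiling, block the privilege-escalation paths.
        Sid    = "DenyPermissionMutation"
        Effect = "Deny"
        Action = [
          "iam:CreatePolicyVersion",
          "iam:PutRolePolicy",
          "iam:AttachRolePolicy",
          "iam:PutRolePermissionsBoundary",
          "iam:DeleteRolePermissionsBoundary"
        ]
        Resource = "*"
      }
    ]
  })
}

# Every role the platform pipeline vends carries the boundary.
resource "aws_iam_role" "service_deploy" {
  name                 = "payments-service-deploy"
  permissions_boundary = aws_iam_policy.pipeline_boundary.arn

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Action    = "sts:AssumeRoleWithWebIdentity"
      Principal = { Federated = "arn:aws:iam::111122223333:oidc-provider/token.actions.githubusercontent.com" }
      Condition = {
        StringEquals = {
          "token.actions.githubusercontent.com:sub" = "repo:example-org/payments-service:ref:refs/heads/main"
        }
      }
    }]
  })
}
```

The effective permissions of a role are the intersection of its attached policies and its boundary, so a service pipeline that somehow gains an over-broad inline policy still cannot act outside the ceiling.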
Automated policy refinement. Instead of guessing what permissions a pipeline needs, run it with broad (but bounded) access in a dev environment for 90 days. AWS CloudTrail records every API call. IAM Access Analyzer generates a least-privilege policy from actual usage. That policy ships to production through the same code review process as application code.
## What this means for your organization
Risk reduction. A compromised pipeline can only do what its scoped role allows. With proper boundaries, that means "update one specific service" rather than "administer the entire account."
Compliance alignment. SOC 2, ISO 27001, and FedRAMP all require least-privilege access controls. This pattern provides auditable, version-controlled evidence of permission grants and reviews.
Operational cost. Initial setup takes 2-4 weeks for a platform team. After that, onboarding a new pipeline takes ~10 lines of Terraform. The role-vending module enforces all security controls automatically.
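Onboarding through a hypothetical role-vending module might look like this (the module path and input names are illustrative, not a published interface):

```hcl
# One call per pipeline; the module applies OIDC trust, the permission
# boundary, and naming/tagging conventions automatically.
module "payments_deploy_role" {
  source = "./modules/pipeline-role" # platform team's role-vending module

  repository      = "example-org/payments-service"
  environment     = "production"
  allowed_branch  = "main"
  allowed_actions = ["ecs:UpdateService", "ecs:DescribeServices"]
}
```

The service team only declares what the pipeline deploys; every security control is baked into the module they cannot edit.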
Ongoing maintenance. A weekly automated job generates policy refinement proposals. Engineers review diffs, not raw IAM JSON. The system converges on minimal permissions without manual auditing.
## Scaling the investment to the problem
The full pattern is designed for organizations running 50+ pipelines across multiple teams. But the investment scales with the problem:
| Your situation | What to adopt now | Investment |
|---|---|---|
| 1-5 pipelines, one team | OIDC + hand-written policies + boundaries | 1-2 days of platform work |
| 5-15 pipelines, 2-3 teams | Add the role-vending Terraform module | 1 week to build, then self-service |
| 15-50 pipelines, 3-10 teams | Add automated policy refinement | 2 weeks to build the automation |
| 50+ pipelines, 10+ teams | Full pattern with split pipelines and self-service portal | 90-day rollout |
The first step (OIDC + boundaries) eliminates the most dangerous risk (long-lived credentials with unlimited scope) in a single afternoon per pipeline. Everything after that is incremental hardening.
## Time to value
The first pipeline is keyless in one afternoon. The full pattern takes 90 days to mature, but value accrues from day one:
| Milestone | Timeline | What you get |
|---|---|---|
| First keyless deploy | Day 1 | One pipeline on OIDC. No stored credentials. Immediate risk reduction. |
| Environment isolation | Week 1 | Prod role only accepts main branch. Feature branches can't touch production. |
| Permission boundaries | Week 2 | Pipeline roles can't escalate privileges, even if compromised. |
| Policy from real usage | Day 30+ | Access Analyzer generates tight policy from observed behavior. Ship to prod. |
| Self-service for teams | Week 6+ | Role-vending module: teams onboard in 10 lines, security enforced by default. |
You don't wait 90 days for the first result. You wait one afternoon. The 90 days is how long it takes for Access Analyzer to observe enough usage to generate a production-ready policy. Everything else ships incrementally.
## The emerging risk: AI agents in the pipeline
A growing number of teams use AI coding assistants (GitHub Copilot, Amazon Q Developer, Claude Code) that propose infrastructure changes, including IAM policies. Some organizations run automated agents that tighten permissions or respond to access denials without human intervention.
These agents operate with the same pipeline credentials. If an agent can propose or apply IAM changes, it becomes a privilege escalation vector. "The system prompt says be careful" is not a security control.
The same least-privilege principles apply: agents should have read-only access by default, write access only through reviewed channels, and hard limits on how many changes they can make per time period. This is covered in detail in a companion technical article.
## Questions for your platform team
- How many of our pipelines use long-lived access keys today?
- Do our production deployment roles accept requests from any branch, or only main?
- When was the last time someone audited what permissions our pipeline roles actually use versus what they have?
- If a pipeline credential leaked today, what is the blast radius?
- Do we have alerting on AccessDenied events in production? (If not, we can't detect when permissions are too broad or too narrow.)
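For the last question, a minimal alerting sketch, assuming CloudTrail already delivers to a CloudWatch Logs group (the log group name and empty alarm action are placeholders):

```hcl
# Count AccessDenied / UnauthorizedOperation errors recorded by CloudTrail.
resource "aws_cloudwatch_log_metric_filter" "access_denied" {
  name           = "access-denied-events"
  log_group_name = "cloudtrail-logs" # placeholder: your CloudTrail log group
  pattern        = "{ ($.errorCode = \"AccessDenied*\") || ($.errorCode = \"*UnauthorizedOperation\") }"

  metric_transformation {
    name      = "AccessDeniedCount"
    namespace = "Security/Pipelines"
    value     = "1"
  }
}

# Alarm on any occurrence within a five-minute window.
resource "aws_cloudwatch_metric_alarm" "access_denied" {
  alarm_name          = "pipeline-access-denied"
  metric_name         = "AccessDeniedCount"
  namespace           = "Security/Pipelines"
  statistic           = "Sum"
  period              = 300
  evaluation_periods  = 1
  threshold           = 1
  comparison_operator = "GreaterThanOrEqualToThreshold"
  alarm_actions       = [] # placeholder: SNS topic ARN for notifications
}
```

A spike in denials signals either a policy that is too tight or a principal probing for access it should not have; both are worth a page.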
## Bottom line
The pattern exists. AWS documents it. The tooling is mature. The question is whether your organization treats pipeline credentials with the same rigor as production database access. Based on the incidents of the last 18 months, most don't.
The technical implementation guide covers the full pattern with working Terraform and CDK code.