When event-driven systems grow past a handful of services, the biggest failures usually are not infrastructure failures. They are contract failures.
A producer adds a field and a consumer crashes.
A team renames an enum value and downstream processing silently misclassifies events.
A “minor change” ships without coordination and turns into a production incident.
In this post, I will walk through how I design a contract-first event-driven architecture on AWS with a focus on:
- Event versioning strategies
- Schema registry usage
- Consumer tolerance patterns
- Breaking vs non-breaking changes
- Governance for event contracts
I will also include an end-to-end walkthrough, implementation discussion, architecture, and code examples that show how I typically structure this in practice.
This pattern is especially useful when I want:
- multiple teams publishing and consuming events
- safe independent deployments
- compatibility checks in CI/CD
- replayable operations
- and a clear change-management process around event contracts
Why contract-first matters in event-driven systems
I like event-driven architectures because they reduce direct coupling at runtime. But they can easily create hidden coupling at the data contract level.
A queue, bus, or topic only decouples transport. It does not automatically decouple:
- field names
- field types
- nullability
- enum values
- semantic meaning
- version expectations
That is why I treat the event contract as a product interface, not just a JSON blob.
A contract-first approach means:
- I define the event schema before (or alongside) producer code
- I validate changes in pull requests and CI
- I classify changes as breaking or non-breaking
- I enforce compatibility policy before deployment
- Consumers are built to be tolerant where appropriate
What I mean by “contract-first” on AWS
On AWS, I usually use Amazon EventBridge as the routing layer for domain and integration events. For contract visibility and developer ergonomics, I use EventBridge Schemas (registry/discovery/code bindings) and a Git-based contract repository as the source of truth.
EventBridge Schemas supports custom schemas, inferred schemas, and code bindings, and supports both OpenAPI 3 and JSONSchema Draft4 formats. (docs.aws.amazon.com)
A key implementation detail that is easy to miss: for contract-first systems, I do producer-side validation before publishing to the bus. AWS explicitly recommends JSON Schema for client-side validation so events conform to the schema. (docs.aws.amazon.com)
That means I think about contract enforcement in two layers:
- Design-time / CI-time enforcement (compatibility and governance)
- Runtime enforcement (producer validation, consumer critical-field validation)
Architecture Overview
At a high level, I split the solution into four concerns:
- Contract governance (Git + PRs + compatibility checks)
- CI/CD publication (schema artifacts + code bindings + service deployment)
- Runtime event transport (EventBridge bus + rules + consumers)
- Operational controls (archive/replay, observability, version adoption metrics)
The guiding principle is simple:
- Git repo is the source of truth
- Schema registry is the discovery/distribution layer
- EventBridge is the routing layer
- Producer validation is the enforcement point
- Consumers are tolerant readers, not brittle mirror parsers
End-to-End Walkthrough
This is the end-to-end flow I use in a contract-first setup.
1) Define the event contract in a versioned repository
I keep contracts in a dedicated repo (or a clearly separated folder in a platform repo), with one folder per domain event and explicit versions.
Example structure:
```
contracts/
  orders/
    order-created/
      v1/
        schema.json
        examples/
          valid-minimal.json
          valid-full.json
          invalid-missing-id.json
        metadata.yaml
      v2/
        schema.json
        migration-notes.md
```
I usually store:
- the schema (`schema.json`)
- example payloads (valid and invalid)
- contract metadata (owner, lifecycle, SLA, compatibility policy)
- migration notes (for majors)
This keeps the contract reviewable and testable before code changes are merged.
2) Open a PR and run compatibility checks in CI
When a producer team proposes a schema change, the CI pipeline:
- lints the schema
- validates example payloads
- compares the proposed schema against the last released version
- classifies the change as breaking or non-breaking
- blocks deployment if it violates policy
This is the point where I want failure to happen.
It is much cheaper to fail a PR than to fail a consumer at runtime.
3) Publish the schema artifact and optional code bindings
After the contract PR is approved, the pipeline publishes the schema to EventBridge Schemas (or updates the schema artifact in the registry workflow).
EventBridge Schemas can store custom schemas and generate code bindings for supported languages, which can help teams bootstrap producers/consumers faster. (docs.aws.amazon.com)
I still keep Git as the source of truth. The registry is a distribution and discovery aid, not my governance system.
4) Producer validates events before publishing to EventBridge
At runtime, the producer constructs an event envelope and validates the detail payload against the contract schema (and optionally validates the envelope as well).
I do not rely on “the bus will catch it.” I validate before PutEvents.
This is especially important in multi-team environments where one bad deploy can affect many consumers.
5) EventBridge routes events to consumers
Once published, the event goes to an EventBridge custom bus and is routed by rules to targets such as:
- Lambda
- SQS (then Lambda workers)
- Step Functions
- EventBridge Pipes targets
- other buses/accounts (depending on architecture)
I keep routing concerns separate from schema governance concerns. The bus routes. The contracts define compatibility.
6) Consumers apply tolerant-reader patterns
Consumers should not parse the full event contract unless they truly need every field.
Instead, I design consumers to:
- read only the fields they need
- ignore unknown fields
- use safe defaults where appropriate
- validate critical fields they depend on
- gracefully handle unsupported versions
This is what lets independent deployments actually work in practice.
7) Archive and replay for recovery and backfills
For operational resilience, I often enable EventBridge archive and replay for important event buses.
EventBridge archives can filter by event pattern and later replay events back to the same source event bus (not a different bus). EventBridge also annotates replayed events with a replay-name field, which is useful for observability and preventing accidental re-archiving loops. (docs.aws.amazon.com)
There are replay caveats worth accounting for in design:
- replayed events are not guaranteed to be replayed in original ingestion order
- there can be delay before recently received events are available in the archive
- replay targets are on the source bus (you select rules on that bus) (docs.aws.amazon.com)
That means my consumers should be:
- idempotent
- order-tolerant where possible
- replay-aware (for metrics and side effects)
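As a concrete sketch of the idempotency requirement, here is a minimal guard keyed on the event's stable `eventId`. The in-memory set is a stand-in for a durable store; in practice I would back this with something like a conditional write to a DynamoDB table (that storage choice is an assumption, not shown here):

```python
# Minimal idempotency guard keyed on the event's stable eventId.
# The in-memory set is a stand-in for a durable store such as a
# DynamoDB conditional put; swap it out for real deployments.

_processed_ids: set = set()

def process_once(record: dict, handler) -> bool:
    """Invoke `handler` at most once per eventId; return True if it ran."""
    event_id = record["detail"]["eventId"]
    if event_id in _processed_ids:
        # Duplicate delivery or replayed event: skip side effects
        return False
    handler(record["detail"])
    _processed_ids.add(event_id)
    return True
```

The important property is that retries and replays hit the "already seen" branch instead of repeating side effects.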
Event Envelope and Contract Shape
I strongly prefer a stable envelope and versioned detail payload.
A practical EventBridge event envelope looks like this:
```json
{
  "source": "com.acme.orders",
  "detail-type": "OrderCreated.v1",
  "time": "2026-02-25T10:42:00Z",
  "detail": {
    "eventId": "evt_01J...",
    "schemaVersion": "1.2.0",
    "orderId": "ord_123",
    "customerId": "cus_789",
    "amount": 149.95,
    "currency": "AUD",
    "createdAt": "2026-02-25T10:41:59Z"
  }
}
```
Why I separate detail-type major version and schemaVersion
I often use a hybrid strategy:
- `detail-type` includes the major version for routing and coarse compatibility (`OrderCreated.v1`, `OrderCreated.v2`)
- `detail.schemaVersion` carries the full semantic version (`1.2.0`) for visibility, telemetry, and debugging
This gives me:
- simple EventBridge rule routing by major version
- clearer operational visibility into actual schema rollout
- room for non-breaking evolution within a major
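To make the hybrid strategy tangible, here is a small hypothetical helper a consumer might use to split a versioned `detail-type` into its event name and major version (the naming convention `Name.vN` follows the examples above):

```python
import re

# Hypothetical helper: split a versioned detail-type such as
# "OrderCreated.v2" into its event name and major version.
_DETAIL_TYPE_RE = re.compile(r"^(?P<name>[A-Za-z][A-Za-z0-9]*)\.v(?P<major>\d+)$")

def parse_detail_type(detail_type: str) -> tuple:
    match = _DETAIL_TYPE_RE.match(detail_type)
    if match is None:
        raise ValueError(f"Unversioned or malformed detail-type: {detail_type}")
    return match.group("name"), int(match.group("major"))
```

Rejecting malformed values loudly here is deliberate: an unversioned `detail-type` is a contract violation, not something to guess around.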
Event versioning strategies
There is no single universal versioning strategy. I choose based on blast radius, team maturity, and consumer tolerance.
Strategy 1: Major version in event type (my default)
Example:
- `OrderCreated.v1`
- `OrderCreated.v2`
When I use it
- multiple consumers across teams
- strict backward compatibility boundaries
- need clear routing and migration windows
Pros
- easy routing and coexistence
- explicit migration path
- lower ambiguity in logs/metrics
Cons
- can create duplicate rules/targets during migration
- more operational overhead during dual support
Strategy 2: Single event type + schemaVersion field only
Example:
- `detail-type = "OrderCreated"`
- `detail.schemaVersion = "1.3.0"`
When I use it
- fewer consumers
- strong tolerant-reader discipline
- changes are mostly additive
Pros
- simpler routing
- fewer EventBridge rules
Cons
- consumers must inspect payload version
- easier to accidentally ship breaking changes under the same event type
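When I do use this strategy, consumers need an explicit dispatch step on `detail.schemaVersion`. A minimal sketch (handler registry and names are illustrative, not a fixed pattern):

```python
# Sketch: dispatching on detail.schemaVersion when a single detail-type
# carries all versions. The handlers dict maps major version -> callable.

def dispatch_by_major(detail: dict, handlers: dict):
    """Route a payload to the handler registered for its major version."""
    version = detail.get("schemaVersion", "1.0.0")
    major = int(version.split(".")[0])
    handler = handlers.get(major)
    if handler is None:
        raise ValueError(f"No handler for schemaVersion major {major}")
    return handler(detail)
```

Failing on an unregistered major is the safe default; silently processing an unknown major is exactly how breaking changes slip through under this strategy.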
Strategy 3: Parallel events for semantic shifts
Sometimes a change is not just a new version. It is a new concept.
Example:
- `OrderCreated`
- `OrderSubmitted`
- `OrderAccepted`
If semantics change, I prefer a new event name over “versioning my way out” of domain ambiguity.
This is often cleaner than endlessly evolving one overloaded event.
Breaking vs non-breaking changes
This is where teams frequently get burned, because “non-breaking” is contextual.
Usually non-breaking (with tolerant consumers)
- Adding a new optional field
- Adding metadata consumers can ignore
- Widening field length limits (if consumers do not assume old max)
- Adding a new event type (without changing existing ones)
Often breaking
- Renaming a field
- Removing a field
- Changing a field's type (`number` -> `string`)
- Making an optional field required
- Changing date format or timestamp semantics
- Reusing the same field name with a different meaning
Context-dependent (treat carefully)
- Adding a new enum value: non-breaking only if consumers tolerate unknown enum values.
- Making a field nullable: can break consumers that assume presence/non-null.
- Reordering array semantics: can break consumers that rely on element order.
My rule is:
If a consumer written against the previous contract can fail or silently misbehave, I treat it as breaking.
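A tiny illustration of why the same additive change can be breaking or non-breaking depending on the consumer. The status values here are made up for the example; the point is the two parsing styles:

```python
# Two consumers written against the same previous contract, which had a
# closed status enum of {"PENDING", "PAID"}. The producer then adds
# "REFUNDED" -- an "additive" change.
KNOWN_STATUSES = {"PENDING", "PAID"}

def strict_status(detail: dict) -> str:
    # Brittle: treats the enum as closed forever
    status = detail["status"]
    if status not in KNOWN_STATUSES:
        raise ValueError(f"Unknown status: {status}")
    return status

def tolerant_status(detail: dict) -> str:
    # Tolerant: unknown values degrade to a fallback bucket
    status = detail.get("status")
    return status if status in KNOWN_STATUSES else "UNKNOWN"
```

The strict consumer fails on the new value; the tolerant one degrades safely. Same producer change, opposite outcomes, which is why I classify by consumer impact rather than by the shape of the diff.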
Schema registry usage on AWS
What I use EventBridge Schemas for
I use EventBridge Schemas for:
- schema discovery (especially in dev/staging)
- storing custom event schemas
- helping teams find contracts
- generating code bindings for faster adoption
EventBridge Schemas supports creating/uploading schemas and inferring schemas from events on an event bus, and supports both OpenAPI 3 and JSONSchema Draft4. (docs.aws.amazon.com)
What I do not use it for (by itself)
I do not treat the registry as sufficient governance.
A registry can tell me “what exists.” It does not automatically enforce:
- compatibility policy
- deprecation timelines
- ownership approvals
- rollout coordination
- consumer migration commitments
That is why I pair it with Git + CI + governance workflow.
Practical recommendation
- Dev/staging: schema discovery can help identify what is actually being emitted
- Production: publish vetted schemas from CI, avoid “registry discovered it so it must be okay” thinking
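For the "publish vetted schemas from CI" step, the boto3 `schemas` client is enough. A hedged sketch (the registry name is an assumption, and boto3 is imported lazily so the function can be exercised with a fake client; subsequent revisions of an existing schema would go through `update_schema` instead):

```python
import json

# Sketch of the CI publish step using the EventBridge Schemas API.
# The registry name "domain-events" is an assumption for illustration.

def publish_schema(registry_name: str, schema_name: str, schema: dict, client=None):
    if client is None:
        import boto3  # lazy import so tests can inject a fake client
        client = boto3.client("schemas")
    return client.create_schema(
        RegistryName=registry_name,
        SchemaName=schema_name,
        Type="JSONSchemaDraft4",
        Content=json.dumps(schema),
    )
```

Keeping this call in the pipeline (rather than letting discovery infer schemas in production) is what makes the registry reflect vetted contracts only.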
Consumer tolerance patterns (the part that protects independent deployments)
Consumer tolerance is what turns contract-first from process overhead into deployment freedom.
1) Tolerant reader pattern
Consumers parse only fields they need and ignore extras.
Bad approach:
- deserialize the entire payload into a strict model and fail on unknown fields
Better approach:
- read subset fields needed for the business action
- validate only those fields and critical invariants
2) Defensive enum handling
If I consume an enum like status, I do not assume I know every future value.
I usually implement:
- known value handling
- fallback bucket (`UNKNOWN`)
- metrics/alerts for unseen values
This avoids outages caused by additive enum expansion.
3) Defaulting and null tolerance (with business rules)
I default only where the business semantics are safe.
Examples:
- safe default for optional metadata field: yes
- safe default for money amount: no
- safe default for event timestamp: usually no
The goal is to avoid brittle parsing without masking real data quality issues.
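A minimal sketch of that split, using field names from the `OrderCreated` example. The actual decision of which fields are defaultable is the business judgment; the code just enforces it:

```python
# Defaults only for fields whose business semantics make a default safe.
# The split between the two groups is the decision that matters.
SAFE_DEFAULTS = {"metadata": {}, "couponCode": None}   # defaultable
NO_DEFAULT = {"orderId", "amount", "createdAt"}        # must be present

def read_with_defaults(detail: dict) -> dict:
    missing = NO_DEFAULT - detail.keys()
    if missing:
        # Missing money/identity fields are data-quality failures, not gaps to paper over
        raise ValueError(f"Missing critical fields: {sorted(missing)}")
    return {**SAFE_DEFAULTS, **detail}
```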
4) Version-aware adapters (upcasters/downcasters)
When migrations are active, I sometimes introduce a small adapter layer:
- upcaster: converts a `v1` payload to the internal canonical model expected by newer consumer logic
- downcaster (rarer): emits compatibility events for legacy consumers during transition
This is often cleaner than embedding version branching everywhere in business code.
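A minimal upcaster sketch: it maps a `v1` payload onto a canonical internal model. The canonical shape (the nested `money` object) is hypothetical, chosen only to show that the adapter owns the translation:

```python
# Sketch of an upcaster: v1 payload -> canonical internal model.
# The canonical field names ("money", etc.) are hypothetical.

def upcast_order_created_v1(detail_v1: dict) -> dict:
    return {
        "eventId": detail_v1["eventId"],
        "orderId": detail_v1["orderId"],
        "money": {
            "amount": float(detail_v1["amount"]),
            "currency": detail_v1.get("currency", "UNKNOWN"),
        },
    }
```

Business logic then only ever sees the canonical model, so version branching stays confined to the adapter layer.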
5) Idempotent processing for replay and retries
Replays and retries are normal in event-driven systems. Consumers should be idempotent based on a stable event key (eventId, domain aggregate ID + version, etc.).
This matters even more when I enable EventBridge archive/replay. EventBridge replay behavior is operationally powerful, but consumers still need idempotent side effects. (docs.aws.amazon.com)
Code: Contract schema (JSON Schema Draft4 style)
Below is a simplified contract for OrderCreated.v1. I am using JSON Schema because it fits runtime validation well, and AWS documentation explicitly recommends JSON Schema for client-side validation in this scenario. (docs.aws.amazon.com)
```json
{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "title": "OrderCreated.v1",
  "type": "object",
  "additionalProperties": true,
  "required": [
    "eventId",
    "schemaVersion",
    "orderId",
    "customerId",
    "amount",
    "currency",
    "createdAt"
  ],
  "properties": {
    "eventId": {
      "type": "string",
      "minLength": 1
    },
    "schemaVersion": {
      "type": "string",
      "pattern": "^1\\.\\d+\\.\\d+$"
    },
    "orderId": {
      "type": "string",
      "minLength": 1
    },
    "customerId": {
      "type": "string",
      "minLength": 1
    },
    "amount": {
      "type": "number",
      "minimum": 0
    },
    "currency": {
      "type": "string",
      "enum": ["AUD", "USD", "EUR"]
    },
    "createdAt": {
      "type": "string",
      "format": "date-time"
    },
    "couponCode": {
      "type": ["string", "null"]
    },
    "metadata": {
      "type": "object"
    }
  }
}
```
Notes on this schema
- I intentionally allow `additionalProperties: true` to support tolerant evolution within a major version.
- I keep the regex on `schemaVersion` aligned with the major (`1.x.x`).
- I model optional fields explicitly and avoid making everything required just because it exists today.
Code: Producer-side validation and publish to EventBridge (Python)
This example validates detail against the JSON Schema before publishing to EventBridge.
```python
import json
import os
from datetime import datetime, timezone
from uuid import uuid4

import boto3
from jsonschema import Draft4Validator, FormatChecker

events = boto3.client("events")

EVENT_BUS_NAME = os.environ["EVENT_BUS_NAME"]
SCHEMA_PATH = os.environ.get("SCHEMA_PATH", "schemas/order-created-v1.json")

with open(SCHEMA_PATH, "r", encoding="utf-8") as f:
    ORDER_CREATED_V1_SCHEMA = json.load(f)

validator = Draft4Validator(ORDER_CREATED_V1_SCHEMA, format_checker=FormatChecker())


class ContractValidationError(Exception):
    pass


def validate_detail(detail: dict) -> None:
    # e.path is a deque, which does not support ordering; compare as lists
    errors = sorted(validator.iter_errors(detail), key=lambda e: list(e.path))
    if errors:
        formatted = []
        for e in errors:
            path = ".".join(str(p) for p in e.path) or "<root>"
            formatted.append(f"{path}: {e.message}")
        raise ContractValidationError("; ".join(formatted))


def publish_order_created(order: dict) -> dict:
    detail = {
        "eventId": f"evt_{uuid4().hex}",
        "schemaVersion": "1.0.0",
        "orderId": order["order_id"],
        "customerId": order["customer_id"],
        "amount": float(order["amount"]),
        "currency": order["currency"],
        "createdAt": datetime.now(timezone.utc).isoformat().replace("+00:00", "Z"),
        "metadata": {
            "channel": order.get("channel"),
            "traceId": order.get("trace_id"),
        },
    }

    # Remove null metadata values to keep payloads clean
    detail["metadata"] = {k: v for k, v in detail["metadata"].items() if v is not None}

    validate_detail(detail)

    envelope = {
        "Source": "com.acme.orders",
        "DetailType": "OrderCreated.v1",
        "EventBusName": EVENT_BUS_NAME,
        "Time": datetime.now(timezone.utc),
        "Detail": json.dumps(detail),
    }

    response = events.put_events(Entries=[envelope])

    # Basic publish result handling
    if response.get("FailedEntryCount", 0) > 0:
        failed = [e for e in response.get("Entries", []) if "ErrorCode" in e]
        raise RuntimeError(f"PutEvents failed: {failed}")

    return response
```
Why I validate on the producer
Because contract-first only works if I enforce the contract before the event leaves the service boundary.
The registry helps discovery. CI helps governance.
Producer validation prevents runtime contract drift.
Code: Simplified compatibility check (CI gate)
In production, I usually use a stronger compatibility checker (or a custom policy engine), but here is a clear example of how I classify a subset of schema changes in CI.
```python
import json
from typing import Any, Dict, List


def load_schema(path: str) -> Dict[str, Any]:
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def classify_change(old: Dict[str, Any], new: Dict[str, Any]) -> List[str]:
    findings: List[str] = []

    old_props = old.get("properties", {})
    new_props = new.get("properties", {})
    old_required = set(old.get("required", []))
    new_required = set(new.get("required", []))

    old_keys = set(old_props.keys())
    new_keys = set(new_props.keys())

    removed_fields = old_keys - new_keys
    added_fields = new_keys - old_keys

    if removed_fields:
        findings.append(f"BREAKING: removed fields {sorted(removed_fields)}")

    # Newly required fields can break old producers/consumers
    added_required = new_required - old_required
    if added_required:
        findings.append(f"BREAKING: newly required fields {sorted(added_required)}")

    for field in sorted(old_keys & new_keys):
        old_type = old_props[field].get("type")
        new_type = new_props[field].get("type")
        if old_type != new_type:
            findings.append(
                f"BREAKING: field '{field}' type changed from {old_type} to {new_type}"
            )

        # Enum changes are context-dependent; flag for review
        old_enum = old_props[field].get("enum")
        new_enum = new_props[field].get("enum")
        if old_enum is not None or new_enum is not None:
            if old_enum != new_enum:
                findings.append(
                    f"REVIEW: field '{field}' enum changed from {old_enum} to {new_enum}"
                )

    # Additive optional fields are usually non-breaking
    optional_additions = [f for f in added_fields if f not in new_required]
    if optional_additions:
        findings.append(f"NON_BREAKING: optional fields added {sorted(optional_additions)}")

    if not findings:
        findings.append("NO_CONTRACT_DIFF_DETECTED")

    return findings


if __name__ == "__main__":
    old_schema = load_schema("contracts/orders/order-created/v1/schema.json")
    new_schema = load_schema("contracts/orders/order-created/v1-next/schema.json")
    for line in classify_change(old_schema, new_schema):
        print(line)
```
Important note
This script is intentionally simplified. Real compatibility checks should also evaluate:
- nullability changes
- numeric range tightening
- string length tightening
- nested object/array changes
- semantic changes (which tooling cannot detect reliably)
That is why I combine automated checks with contract review governance.
Code: Consumer tolerant-reader Lambda (Python)
This consumer reads only a subset of fields and handles unknown enum values safely.
```python
import json
from typing import Any, Dict

KNOWN_CURRENCIES = {"AUD", "USD", "EUR"}


def parse_event(record: Dict[str, Any]) -> Dict[str, Any]:
    """
    EventBridge Lambda target event shape (single event invocation).
    We intentionally read only what we need.
    """
    detail_type = record.get("detail-type", "")
    detail = record.get("detail", {})

    # Envelope guardrails
    if not detail_type.startswith("OrderCreated.v"):
        raise ValueError(f"Unsupported detail-type: {detail_type}")

    # Critical field validation (subset)
    order_id = detail.get("orderId")
    amount = detail.get("amount")
    currency = detail.get("currency", "UNKNOWN")
    event_id = detail.get("eventId")

    if not event_id:
        raise ValueError("Missing eventId")
    if not order_id:
        raise ValueError("Missing orderId")
    if amount is None:
        raise ValueError("Missing amount")

    try:
        amount = float(amount)
    except (TypeError, ValueError):
        raise ValueError("Invalid amount")

    # Tolerant enum handling
    if currency not in KNOWN_CURRENCIES:
        currency = "UNKNOWN"

    return {
        "eventId": event_id,
        "orderId": order_id,
        "amount": amount,
        "currency": currency,
    }


def lambda_handler(event, context):
    parsed = parse_event(event)
    # Idempotency key should usually be eventId (persist/check externally)
    # business processing here...
    print(json.dumps({"message": "processed", **parsed}))
    return {"statusCode": 200, "processedEventId": parsed["eventId"]}
```
What this consumer demonstrates
- it does not assume full schema lockstep
- it validates only critical fields
- it tolerates additive changes
- it degrades safely on unknown enum values
This pattern dramatically reduces breakage from non-breaking producer evolutions.
Code: AWS CDK snippet (EventBridge bus, archive, and rule)
This is a compact example of how I might wire the bus and a rule in CDK (TypeScript). Archive is optional but useful for recovery and replay workflows.
```typescript
import * as cdk from "aws-cdk-lib";
import { Construct } from "constructs";
import * as events from "aws-cdk-lib/aws-events";
import * as targets from "aws-cdk-lib/aws-events-targets";
import * as lambda from "aws-cdk-lib/aws-lambda";

export class ContractsEdaStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    const bus = new events.EventBus(this, "DomainBus", {
      eventBusName: "domain-events"
    });

    const consumerFn = new lambda.Function(this, "OrderConsumerFn", {
      runtime: lambda.Runtime.PYTHON_3_12,
      handler: "app.lambda_handler",
      code: lambda.Code.fromAsset("lambda/order-consumer")
    });

    new events.Rule(this, "OrderCreatedV1Rule", {
      eventBus: bus,
      eventPattern: {
        source: ["com.acme.orders"],
        detailType: ["OrderCreated.v1"]
      },
      targets: [new targets.LambdaFunction(consumerFn)]
    });

    // Optional: event archive for replay and recovery
    new events.CfnArchive(this, "DomainBusArchive", {
      archiveName: "domain-events-archive",
      sourceArn: bus.eventBusArn,
      description: "Archive selected domain events for replay",
      // Optional filter pattern
      eventPattern: {
        source: ["com.acme.orders"]
      },
      retentionDays: 30
    });
  }
}
```
Implementation discussion (what makes this hold up in production)
This is the part I care about most. The architecture is the easy part. Operating it safely is the real work.
1) Make one layer the source of truth
I strongly recommend choosing one authoritative source for contracts.
My preference:
- Git contracts repo = source of truth
- EventBridge Schemas = discovery/distribution
- generated code = convenience artifact (never hand-edited)
If teams edit schemas ad hoc in multiple places, drift becomes inevitable.
2) Separate compatibility policy by scope
Not every event needs the same compatibility rigor.
I usually define contract classes, for example:
- Internal team-local events
  - faster iteration
  - smaller deprecation windows
- Cross-team domain events
  - strict review
  - compatibility gates
  - longer deprecation windows
- External/public integration events
  - strongest governance
  - formal versioning and migration docs
This prevents over-governing small internal signals while protecting high-blast-radius contracts.
3) Decide your “major version rollout” playbook in advance
When a breaking change is truly necessary, I do not improvise. I use a defined rollout pattern.
Typical playbook:
- Introduce `v2` alongside `v1`
- Dual-publish (or route from an adapter) for a migration window
- Track consumer adoption
- Freeze new dependencies on `v1`
- Announce deprecation date
- Remove `v1` after the agreed window
This is much more reliable than “we changed it, please update soon.”
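The dual-publish step can be sketched as a small helper that builds both `PutEvents` entries from the new payload. The `downcast` callable (v2 shape back to v1 shape) and bus name are assumptions for illustration:

```python
import json

# Sketch: PutEvents entries that dual-publish v2 plus a downcast v1
# during a migration window. `downcast` and the bus name are assumed.

def dual_publish_entries(detail_v2: dict, downcast, bus_name: str) -> list:
    return [
        {
            "Source": "com.acme.orders",
            "DetailType": "OrderCreated.v2",
            "EventBusName": bus_name,
            "Detail": json.dumps(detail_v2),
        },
        {
            "Source": "com.acme.orders",
            "DetailType": "OrderCreated.v1",
            "EventBusName": bus_name,
            "Detail": json.dumps(downcast(detail_v2)),
        },
    ]
```

Dropping the second entry (and the downcaster) at the end of the migration window is then a one-line change.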
4) Govern semantics, not just structure
Schema validation catches structural drift. It does not catch semantic drift.
Example:
- `amount` still exists and is a number
- but the team changed its meaning from "gross amount" to "net amount"
JSON Schema will happily validate that.
To reduce semantic drift, I add:
- clear field descriptions
- example payloads
- domain glossary references
- explicit units/timezone/currency semantics
- contract review by both producer and consumer owners
5) Be careful with enums
Enums are a common source of accidental breakage.
What I do:
- treat enum additions as review-required
- require consumers to implement fallback handling for non-critical enums
- document whether enum is “closed” or “extensible”
This avoids the classic “we only added one value” outage.
6) Use archive/replay intentionally, not casually
EventBridge archive/replay is powerful for:
- recovery after consumer bugs
- onboarding new consumers
- backfilling state after fixes
But it changes operational assumptions:
- replay can be delayed
- replay order may differ from original arrival order
- replayed events are marked with `replay-name` metadata
- replay is tied to the source bus (docs.aws.amazon.com)
So I design consumers to:
- be idempotent
- avoid unsafe side effects on duplicate/replay
- optionally detect replayed events for observability paths
7) Observe version adoption as a first-class metric
I like to emit and dashboard:
- events published by `detail-type`
- events published by `schemaVersion`
- validation failures by producer
- consumer parse failures by version
- unknown enum value rates
- replayed event counts (`replay-name` present)
This gives me a factual view of migration readiness instead of relying on team status updates.
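One low-overhead way to emit these is a CloudWatch Embedded Metric Format log line per observed event, which avoids `PutMetricData` calls on the hot path. A sketch (the namespace and metric names are assumptions):

```python
import json
import time

# Sketch: version-adoption metrics as a CloudWatch Embedded Metric
# Format (EMF) log line. Namespace and metric names are assumptions.

def version_metric_line(record: dict) -> str:
    payload = {
        "_aws": {
            "Timestamp": int(time.time() * 1000),
            "CloudWatchMetrics": [{
                "Namespace": "EventContracts",
                "Dimensions": [["DetailType", "SchemaVersion"]],
                "Metrics": [{"Name": "EventsObserved", "Unit": "Count"}],
            }],
        },
        "DetailType": record.get("detail-type", "unknown"),
        "SchemaVersion": record.get("detail", {}).get("schemaVersion", "unknown"),
        "Replayed": "replay-name" in record,  # EventBridge stamps replays
        "EventsObserved": 1,
    }
    return json.dumps(payload)
```

Printed from a Lambda consumer, CloudWatch turns each line into a metric sliced by detail-type and schema version, which is exactly the migration-readiness view described above.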
Governance for event contracts
Contract governance does not need to be bureaucratic, but it does need to be explicit.
Minimum governance I recommend
Contract ownership
Every contract should have:
- producer owner
- platform owner (optional but useful)
- primary consumer group(s) for review
Pull request rules
I typically require:
- schema diff summary
- compatibility classification
- migration impact statement
- updated examples
- deprecation notes (if applicable)
CODEOWNERS / mandatory reviews
At minimum:
- producer team review
- platform or architecture review for breaking changes
- affected consumer review (for major changes)
Versioning policy
Document:
- what counts as patch/minor/major
- what fields are stable
- deprecation window length
- dual-publish expectations
Lifecycle states
I label contracts like:
- `draft`
- `active`
- `deprecated`
- `retired`
This avoids ambiguity around old but still discoverable schemas.
A practical contract metadata file (optional but very useful)
I often pair each schema with metadata like this:
```yaml
name: OrderCreated
majorVersion: 1
status: active
owners:
  producerTeam: orders-platform
  platformTeam: eventing-platform
compatibilityPolicy:
  mode: backward-compatible-within-major
  enumAdditionsRequireReview: true
deprecation:
  minimumNoticeDays: 90
observability:
  metricsTag: orders.order_created
examples:
  - examples/valid-minimal.json
  - examples/valid-full.json
```
This gives CI and reviewers policy context that plain JSON Schema does not express.
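As a sketch of how CI can consume that context, here is a small policy gate that combines the metadata (loaded from `metadata.yaml` with e.g. PyYAML in a real pipeline) with the `BREAKING`/`REVIEW`/`NON_BREAKING` findings produced by the compatibility classifier earlier in this post:

```python
# Sketch: turn metadata policy + compatibility findings into a CI
# verdict. `findings` uses the BREAKING/REVIEW/NON_BREAKING prefixes
# from the classifier shown earlier; the policy keys mirror the YAML.

def gate(metadata: dict, findings: list) -> list:
    """Return the findings that should block the merge."""
    policy = metadata.get("compatibilityPolicy", {})
    mode = policy.get("mode", "")
    review_enums = policy.get("enumAdditionsRequireReview", False)

    blockers = []
    for finding in findings:
        if finding.startswith("BREAKING") and mode == "backward-compatible-within-major":
            blockers.append(finding)
        elif finding.startswith("REVIEW") and review_enums:
            blockers.append(finding)
    return blockers
```

A non-empty return value fails the pipeline, which puts the failure where it belongs: in the PR, not in production.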
Common mistakes I see (and how I avoid them)
“We have a schema registry, so we are contract-first”
Not necessarily.
A registry improves discoverability. Contract-first requires:
- versioned source of truth
- compatibility policy
- validation enforcement
- governance workflow
“Non-breaking means no consumer work”
Also not necessarily.
Even additive changes can require:
- monitoring updates
- analytics model adjustments
- new enum handling
- data warehouse schema evolution
“Consumers should validate the full schema too”
Usually not a good idea.
Consumers should validate:
- the envelope/version they support
- critical fields they depend on
- business invariants they enforce
Over-validating the full payload makes consumers brittle and defeats decoupling.
“We can do breaking changes quickly if we notify everyone”
This works until it does not.
I prefer explicit versioning and migration windows over coordination by chat message.
Closing thoughts
The best event-driven architectures are not just asynchronous. They are intentionally evolvable.
For me, contract-first design is how I make that happen:
- schemas as interfaces
- compatibility checks before deployment
- producer-side validation
- tolerant consumers
- governance that scales with team count
- replay-aware operations
If I were implementing this from scratch on AWS today, I would start with:
- EventBridge custom bus
- Git-based contract repo (JSON Schema)
- CI compatibility checks
- Producer validation before `PutEvents`
- Tolerant-reader consumer template
- Optional EventBridge archive/replay for critical event domains
- Version adoption dashboards
That gives a strong foundation without overcomplicating the first iteration.
References
- Amazon EventBridge Schemas user guide (schemas, custom/inferred schemas, code bindings, supported formats) (docs.aws.amazon.com)
- Creating an event schema in Amazon EventBridge (JSON Schema/OpenAPI support, client-side validation recommendation) (docs.aws.amazon.com)
- Generating code bindings for EventBridge schemas (supported languages and workflow) (docs.aws.amazon.com)
- Archiving and replaying events in Amazon EventBridge (archive/replay behavior, source bus replay, replay metadata, ordering/delay considerations) (docs.aws.amazon.com)
- Amazon EventBridge API Reference: `PutEvents` (API semantics and request shape) (docs.aws.amazon.com)
- JSON Schema specification (for runtime validation patterns)
- AsyncAPI specification (optional contract documentation model for event APIs)
Corresponding Mermaid code
```mermaid
flowchart TB
  %% Contract-First Event-Driven Architecture on AWS (Schemas, Validation, Compatibility)
  classDef svc fill:#EEF2FF,stroke:#4F46E5,stroke-width:1px,color:#1E1B4B;
  classDef data fill:#ECFDF5,stroke:#059669,stroke-width:1px,color:#064E3B;
  classDef gov fill:#FFF7ED,stroke:#EA580C,stroke-width:1px,color:#7C2D12;
  classDef ci fill:#FCE7F3,stroke:#DB2777,stroke-width:1px,color:#831843;
  classDef consumer fill:#EFF6FF,stroke:#2563EB,stroke-width:1px,color:#1E3A8A;

  subgraph Dev["Contract-First Governance (Git)"]
    A1["AsyncAPI / JSON Schema repo<br/>versioned contracts"]:::gov
    A2["CODEOWNERS + PR review<br/>producer/consumer approval"]:::gov
    A3["Compatibility checks<br/>(non-breaking vs breaking)"]:::gov
    A4["Contract changelog + deprecation policy"]:::gov
  end

  subgraph CI["CI/CD Pipeline"]
    B1["Lint schema + examples"]:::ci
    B2["Run compatibility test<br/>against previous versions"]:::ci
    B3["Publish schema artifact<br/>(EventBridge Schemas / package)"]:::ci
    B4["Deploy producer + consumer"]:::ci
  end

  subgraph Prod["AWS Runtime"]
    C1["Producer service<br/>(App / Lambda / ECS)"]:::svc
    C2["Producer-side validation<br/>JSON Schema validator"]:::svc
    C3["EventBridge Custom Bus"]:::svc
    C4["EventBridge Schemas<br/>Registry / discovery / code bindings"]:::data
    C5["Archive (optional)"]:::data
    C6["Replay (optional)"]:::svc

    subgraph Routing["Fan-out"]
      D1["Rule A -> Lambda Consumer"]:::consumer
      D2["Rule B -> SQS queue -> Lambda"]:::consumer
      D3["Rule C -> EventBridge Pipe / Step Functions"]:::consumer
    end

    E1["Consumer tolerance layer<br/>ignore unknowns, defaults, subset parsing"]:::consumer
    E2["Consumer-side validation<br/>critical fields only"]:::consumer
    E3["Business processing"]:::consumer
    E4["DLQ / error handling"]:::consumer
  end

  subgraph Ops["Observability & Governance Runtime"]
    F1["Contract metrics<br/>version adoption / failures"]:::gov
    F2["CloudWatch Logs / Metrics / Alarms"]:::gov
    F3["Schema review board / release gates"]:::gov
  end

  A1 --> B1 --> B2 --> B3 --> B4
  A2 --> B2
  A3 --> B2
  A4 --> B4
  B4 --> C1
  C1 --> C2 -->|valid event| C3
  C2 -->|invalid event| F2
  C3 -. schema discovery .-> C4
  C3 --> D1
  C3 --> D2
  C3 --> D3
  C3 -. optional archive .-> C5
  C6 --> C3
  D1 --> E1
  D2 --> E1
  D3 --> E1
  E1 --> E2
  E2 -->|pass| E3
  E2 -->|fail| E4
  C1 -. emits version metric .-> F1
  E2 -. validation errors .-> F1
  C3 -. bus metrics .-> F2
  E4 -. alarms .-> F2
  F3 --> A2
```
