willamhou

Posted on Apr 22

When Rust's Exhaustive Match Helps (And When It Doesn't): Notes from a Bare-Metal Hypervisor

#rust #embedded #arm #systems

Disclaimer: This is about an experimental hypervisor project that only runs on QEMU virt — no real-hardware validation yet. The lessons apply to "Rust's tooling edges in systems programming," not production guidance.

10 weeks into writing an ARM64 bare-metal hypervisor, I assumed Rust's exhaustive match would be my safety net when I extended my state machine. One week of commits gave me two observations: exhaustive match didn't help my state machine at all, but it caught 6 errors the one time I extended my Device enum. This post is about why, and why the distinction is about cardinality (N variants that each need their own handler vs. a few legal pairs picked out of N² combinations), not typestate vs tag enums.

Part of the hypervisor is a Secure Partition (SP): a lightweight VM managed by the SPMC (Secure Partition Manager Core). Each SP has a lifecycle of 5 states (Reset, Idle, Running, Blocked, Preempted) and, originally, 7 legal transitions between them.

Two weeks ago I added an eighth transition, Blocked → Preempted, for chain preemption between SPs. By the textbook account, this is exactly where Rust's enum + match should shine: add a state or transition, and the compiler finds every site that needs updating.

The compiler said nothing.

Below: why I didn't use the "enum-with-fields" pattern you see in tutorials, why match exhaustiveness didn't help on this state machine, and where it actually did help.

The Real Code

No toy examples. Here's the actual SpState from the repo:

// src/sp_context.rs
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
pub enum SpState {
    Reset = 0,
    Idle = 1,
    Running = 2,
    Blocked = 3,
    Preempted = 4,
}

A classic tag-only enum: #[repr(u8)], no payload, so the whole value fits in one byte. Why not the textbook Running { entry_pc: u64 } / Preempted { saved_ctx: VcpuContext }?

Because the state lives in an AtomicU8.

The SPMC runs on multiple physical CPUs, and TF-A's SPMD (Secure Partition Manager Dispatcher) on different CPUs can route requests to the same SP at the same time. When two CPUs race to do Idle → Running, one must lose; otherwise both ERET into the same SP and clobber its register context.

A single compare-and-swap (CAS) settles the race:

pub fn try_transition(&self, expected: SpState, new_state: SpState) -> Result<(), SpState> {
    match self.state.compare_exchange(
        expected as u8,
        new_state as u8,
        Ordering::AcqRel,  // success: AcqRel publishes our context-save
        Ordering::Acquire, // failure: Acquire syncs the observed loser
    ) {
        Ok(_) => Ok(()),
        // Err carries the value another CPU installed first.
        Err(actual) => Err(SpState::try_from(actual).expect("corrupt SP state value")),
    }
}
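To see what the CAS buys, here is a self-contained sketch that models the race with host threads standing in for physical CPUs. The SpSlot type, the from_u8 decoder, and the thread harness are hypothetical, not the repo's code; only the enum and the success/failure orderings mirror the excerpt above.

```rust
use std::sync::atomic::{AtomicU8, Ordering};
use std::sync::Arc;
use std::thread;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
pub enum SpState {
    Reset = 0,
    Idle = 1,
    Running = 2,
    Blocked = 3,
    Preempted = 4,
}

impl SpState {
    // Hypothetical checked decoder: any byte outside 0..=4 is rejected.
    fn from_u8(v: u8) -> Option<SpState> {
        match v {
            0 => Some(SpState::Reset),
            1 => Some(SpState::Idle),
            2 => Some(SpState::Running),
            3 => Some(SpState::Blocked),
            4 => Some(SpState::Preempted),
            _ => None,
        }
    }
}

// Hypothetical stand-in for the SP context: just the shared state byte.
pub struct SpSlot {
    state: AtomicU8,
}

impl SpSlot {
    pub fn new(initial: SpState) -> Self {
        SpSlot { state: AtomicU8::new(initial as u8) }
    }

    pub fn try_transition(&self, expected: SpState, new_state: SpState) -> Result<(), SpState> {
        match self.state.compare_exchange(
            expected as u8,
            new_state as u8,
            Ordering::AcqRel,  // success ordering
            Ordering::Acquire, // failure ordering
        ) {
            Ok(_) => Ok(()),
            // Err carries the byte some other thread installed first.
            Err(actual) => Err(SpState::from_u8(actual).expect("corrupt SP state value")),
        }
    }
}

/// Spawns `cpus` threads (standing in for physical CPUs) that all race
/// Idle -> Running on one slot, and returns how many of them won the CAS.
pub fn race_idle_to_running(cpus: usize) -> usize {
    let slot = Arc::new(SpSlot::new(SpState::Idle));
    let handles: Vec<_> = (0..cpus)
        .map(|_| {
            let slot = Arc::clone(&slot);
            thread::spawn(move || slot.try_transition(SpState::Idle, SpState::Running).is_ok())
        })
        .collect();
    handles
        .into_iter()
        .map(|h| h.join().expect("racer thread panicked"))
        .filter(|&won| won)
        .count()
}
```

However many threads race, exactly one sees Ok; every loser gets Err(SpState::Running) and can back off without touching the SP's registers.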

The constraint isn't memory layout: #[repr(u8, C)] on a fields-carrying enum does give a stable layout. The real constraint is size. AtomicU8 wraps one byte, while an enum with a u64 payload needs the tag plus the payload: at least 9 bytes, typically padded to 16. A variant holding a whole VcpuContext is far larger. A 64-bit CAS is fine on aarch64, but a 16-byte state would need a 128-bit CAS, and anything larger needs a lock. I wanted a single-byte CAS in the fast path, so the payload lives elsewhere (in a separate VcpuContext guarded by the state transition itself).
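A quick host-side check makes the size argument concrete. This sketch compares the tag-only enum with a hypothetical FatState whose Running and Preempted variants each carry a u64 (the field names are invented for illustration). The padded size depends on the target's u64 alignment and is typically 16 bytes on 64-bit targets.

```rust
use std::mem::size_of;

#[allow(dead_code)]
#[derive(Clone, Copy)]
#[repr(u8)]
enum SpState {
    Reset = 0,
    Idle = 1,
    Running = 2,
    Blocked = 3,
    Preempted = 4,
}

// Hypothetical payload-carrying version of the same state machine.
#[allow(dead_code)]
#[derive(Clone, Copy)]
enum FatState {
    Reset,
    Idle,
    Running { entry_pc: u64 },
    Blocked,
    Preempted { saved_pc: u64 },
}

/// Returns (tag-only size, payload-carrying size) in bytes.
fn state_sizes() -> (usize, usize) {
    (size_of::<SpState>(), size_of::<FatState>())
}
```

One byte fits AtomicU8 directly. A 16-byte value would need a 128-bit CAS, which stable Rust doesn't expose as an atomic type, or a lock.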

Side note on expect("corrupt SP state value"): it really does panic. In this project the panic handler halts the offending CPU and dumps state via UART — because if the AtomicU8 ever holds a value outside 0..=4, memory corruption has already happened and limping along is worse than stopping. That's a conscious choice for this binary, not a general bare-metal guideline.

Why Exhaustive Match Didn't Help

The legal-transition check lives in one function:

// src/sp_context.rs
pub fn transition_to(&mut self, new_state: SpState) -> Result<(), &'static str> {
    let current = self.state();
    let valid = match (current, new_state) {
        (SpState::Reset, SpState::Idle) => true,
        (SpState::Idle, SpState::Running) => true,
        (SpState::Running, SpState::Idle) => true,
        (SpState::Running, SpState::Blocked) => true,
        (SpState::Blocked, SpState::Running) => true,
        (SpState::Blocked, SpState::Preempted) => true,  // ← the newly added line
        (SpState::Running, SpState::Preempted) => true,
        (SpState::Preempted, SpState::Running) => true,
        _ => false,
    };
    // ...
}

Note the final _ => false. This match is exhaustive only in the compiler's trivial sense: the wildcard swallows every unlisted combination as "illegal," so there is nothing left for the compiler to flag.

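The commit that added Blocked → Preempted was literally 1 line. The compiler reported nothing, because to the compiler all 25 (from, to) combinations were already covered (8 explicit arms + the _ fallback). Exhaustiveness checking only fires when a variant goes unhandled, and adding a transition adds no variant. Even adding a sixth state wouldn't trip this match: the wildcard would silently absorb all 11 new pairs.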

I could have replaced _ => false with all 17 illegal combinations enumerated. I started to ("exhaustive is more Rust-y"), then gave up halfway:

// The fully enumerated version would look like this...
(SpState::Reset, SpState::Reset) => false,
(SpState::Reset, SpState::Running) => false,
(SpState::Reset, SpState::Blocked) => false,
// ... 14 more lines of this

No new information, and every future state addition means maintaining an N² table. _ => false is the documentation here: what's listed is legal; everything else isn't.

Verdict: for a simple C-style enum checked as (from, to) pairs, match exhaustiveness doesn't save you. Bugs at this layer get caught by unit tests, not the compiler (my test_sp_context.rs has 58 assertions covering every legal transition plus key illegal ones).
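One way to pin the table down is a test that walks all 25 pairs against an independently written spec. The sketch below is not the repo's test_sp_context.rs: the names is_legal, ALL_STATES, EXPECTED_LEGAL and mismatches are hypothetical, and is_legal simply mirrors the transition_to table above.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
pub enum SpState {
    Reset = 0,
    Idle = 1,
    Running = 2,
    Blocked = 3,
    Preempted = 4,
}

pub static ALL_STATES: [SpState; 5] = [
    SpState::Reset,
    SpState::Idle,
    SpState::Running,
    SpState::Blocked,
    SpState::Preempted,
];

/// Same shape as transition_to's table: list the legal pairs, reject the rest.
pub fn is_legal(from: SpState, to: SpState) -> bool {
    match (from, to) {
        (SpState::Reset, SpState::Idle) => true,
        (SpState::Idle, SpState::Running) => true,
        (SpState::Running, SpState::Idle) => true,
        (SpState::Running, SpState::Blocked) => true,
        (SpState::Blocked, SpState::Running) => true,
        (SpState::Blocked, SpState::Preempted) => true,
        (SpState::Running, SpState::Preempted) => true,
        (SpState::Preempted, SpState::Running) => true,
        _ => false,
    }
}

/// The spec, written out independently of the match above.
pub static EXPECTED_LEGAL: [(SpState, SpState); 8] = [
    (SpState::Reset, SpState::Idle),
    (SpState::Idle, SpState::Running),
    (SpState::Running, SpState::Idle),
    (SpState::Running, SpState::Blocked),
    (SpState::Running, SpState::Preempted),
    (SpState::Blocked, SpState::Running),
    (SpState::Blocked, SpState::Preempted),
    (SpState::Preempted, SpState::Running),
];

/// Checks all 5 x 5 = 25 pairs and returns every pair whose verdict
/// disagrees with EXPECTED_LEGAL (empty means the table matches the spec).
pub fn mismatches() -> Vec<(SpState, SpState)> {
    let mut bad = Vec::new();
    for &from in ALL_STATES.iter() {
        for &to in ALL_STATES.iter() {
            let expected = EXPECTED_LEGAL.contains(&(from, to));
            if is_legal(from, to) != expected {
                bad.push((from, to));
            }
        }
    }
    bad
}
```

Deleting an arm, or adding a stray one, makes mismatches() return exactly the offending pair: the kind of regression the compiler stays silent about.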

Where It Actually Saved Me

The place where match exhaustiveness actually saved me was device dispatch.

My hypervisor uses a Device enum to enumerate all virtual devices. Every time the guest touches MMIO, a match dispatches to the right implementation:

// src/devices/mod.rs
pub enum Device {
    Uart(pl011::VirtualUart),
    Gicd(gic::VirtualGicd),
    Gicr(gic::VirtualGicr),
    VirtioBlk(virtio::mmio::VirtioMmioTransport<virtio::blk::VirtioBlk>),
    VirtioNet(virtio::mmio::VirtioMmioTransport<virtio::net::VirtioNet>),
    Pl031(pl031::VirtualPl031),
}

This is a fields-carrying enum — each variant holds the state struct for its device. No _ fallback on matches against it, because every variant has its own handler:

impl MmioDevice for Device {
    fn read(&mut self, offset: u64, size: u8) -> Option<u64> {
        match self {
            Device::Uart(d) => d.read(offset, size),
            Device::Gicd(d) => d.read(offset, size),
            Device::Gicr(d) => d.read(offset, size),
            Device::VirtioBlk(d) => d.read(offset, size),
            Device::VirtioNet(d) => d.read(offset, size),
            Device::Pl031(d) => d.read(offset, size),
        }
    }
    // write, contains, is_ready, ...
}
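For experimenting outside the hypervisor, here is a host-runnable miniature of the same pattern. Everything in it is hypothetical (two toy devices, a 4 KiB window per device, a free mmio_read router) rather than the repo's code. Device::base and Device::read list every variant with no `_`, while an address that no device claims comes back as None, the "unknown device" path.

```rust
// Hypothetical miniature of the Device dispatch pattern; not the repo's types.
pub trait MmioDevice {
    fn read(&mut self, offset: u64, size: u8) -> Option<u64>;
}

pub struct Uart {
    pub base: u64,
    pub rx: u64,
}

pub struct Rtc {
    pub base: u64,
    pub seconds: u64,
}

impl MmioDevice for Uart {
    fn read(&mut self, offset: u64, _size: u8) -> Option<u64> {
        // Offset 0 modeled as the data register; everything else unknown.
        match offset {
            0x000 => Some(self.rx),
            _ => None,
        }
    }
}

impl MmioDevice for Rtc {
    fn read(&mut self, offset: u64, _size: u8) -> Option<u64> {
        match offset {
            0x000 => Some(self.seconds),
            _ => None,
        }
    }
}

pub enum Device {
    Uart(Uart),
    Rtc(Rtc),
}

impl Device {
    fn base(&self) -> u64 {
        // No `_` arm: adding a variant makes this match fail to compile.
        match self {
            Device::Uart(d) => d.base,
            Device::Rtc(d) => d.base,
        }
    }
}

impl MmioDevice for Device {
    fn read(&mut self, offset: u64, size: u8) -> Option<u64> {
        match self {
            Device::Uart(d) => d.read(offset, size),
            Device::Rtc(d) => d.read(offset, size),
        }
    }
}

/// Routes an address to the device whose 4 KiB window contains it;
/// None means no device claimed the address.
pub fn mmio_read(devices: &mut [Device], addr: u64, size: u8) -> Option<u64> {
    const WINDOW: u64 = 0x1000;
    devices
        .iter_mut()
        .find(|d| addr >= d.base() && addr < d.base() + WINDOW)
        .and_then(|d| {
            let offset = addr - d.base();
            d.read(offset, size)
        })
}
```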

When I added Pl031 (PL031 RTC) for Android boot, I only touched the enum definition. The compiler immediately fired 6 errors — every site that matches against Device was missing the Pl031 arm:

error[E0004]: non-exhaustive patterns: `&Device::Pl031(_)` not covered
  --> src/devices/mod.rs:51:15
error[E0004]: non-exhaustive patterns: `&mut Device::Pl031(_)` not covered
  --> src/devices/mod.rs:62:15
error[E0004]: non-exhaustive patterns: `&Device::Pl031(_)` not covered
  --> src/devices/mod.rs:73:15
// ... 6 total

Two of those were helper methods I'd written when adding VirtioNet and completely forgotten about. Had these been C switch statements with a default: label, those two sites would have silently fallen into default and returned "unknown device": GCC's -Wswitch (enabled by -Wall) stays quiet once a default label exists, and only -Wswitch-enum, which has to be turned on explicitly, flags the missing case. The guest's first MMIO access to the RTC would have found no device, and boot would have hung with an error pointing somewhere completely unrelated.

C with -Wswitch-enum + -Werror gives you the same check — the relevant difference is that Rust makes it a precondition for compiling instead of a build-system setting you can drop. Worth more in a solo project, less in a shop with a strict style guide.

Either way, the compiler caught this bug instead of the guest doing so at boot time.

When Exhaustive Match Actually Pays Off

Reviewing this state-machine extension + Device extension, here's my distilled rule:

Exhaustive match saves you: fields-carrying enum + every variant has independent handler logic.

Device::{Uart, Gicd, ..., Pl031} — each device's read/write is totally different
MmioAccess::{Read { reg, size }, Write { reg, size, val }} — read vs write semantics differ
ExitReason::{HvcCall, SmcCall, DataAbort, WfiWfe, ...} — each exception class has its own handler

Common trait: adding a variant potentially leaves gaps across the entire codebase, and each gap's correct implementation is non-trivial (not just "error vs OK" binary output).

Exhaustive match doesn't help: simple tag enum + cartesian-product check.

State machine (from, to) transition table — N² explosion, _ => false is more readable
Permission matrix (user_role, action) — same
Input sanity check match(input) { valid_range => ..., _ => reject } — tautological

These scenarios all reduce to "enumerate a small set of legal cases, reject everything else." A _ => fallback loses no information there, and it reads better.
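Both halves of the rule often meet in one place: decoding a raw value at a boundary, then dispatching on the decoded enum. In this sketch the ExitReason variants, decode, handle, and the action strings are simplified placeholders, not the repo's code. The EC field (ESR_EL2 bits [31:26]) values are the architectural ones for trapped WFI/WFE (0x01), HVC and SMC from AArch64 (0x16, 0x17), and data aborts from a lower EL (0x24).

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ExitReason {
    WfiWfe,
    HvcCall,
    SmcCall,
    DataAbort,
    Unknown(u8),
}

/// Decode: an open set of raw values with a small known subset, so the
/// catch-all arm is documentation: anything unlisted is "unknown".
pub fn decode(esr: u64) -> ExitReason {
    let ec = ((esr >> 26) & 0x3f) as u8;
    match ec {
        0x01 => ExitReason::WfiWfe,
        0x16 => ExitReason::HvcCall,
        0x17 => ExitReason::SmcCall,
        0x24 => ExitReason::DataAbort,
        other => ExitReason::Unknown(other),
    }
}

/// Dispatch: one arm per variant, no `_`. Adding a variant to ExitReason
/// makes this match fail to compile until it gets a handler.
/// (The returned strings are placeholder action labels.)
pub fn handle(reason: ExitReason) -> &'static str {
    match reason {
        ExitReason::WfiWfe => "yield vCPU",
        ExitReason::HvcCall => "hypercall",
        ExitReason::SmcCall => "forward SMC",
        ExitReason::DataAbort => "emulate MMIO",
        ExitReason::Unknown(_) => "inject undefined",
    }
}
```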

A Few Takeaways

1. #[repr(u8)] is everyday life in hypervisor/kernel/driver code. Don't apologize for the atomic trade-off.

Every time a "Rust state machine" tweet appears, someone in the replies recommends typestate. Typestate is genuinely powerful when transitions happen through owning APIs (File::open → Handle<Open>), but it doesn't compose with shared mutable state across CPUs — the entire point of AtomicU8 is that multiple cores hold a reference to one byte. Typestate requires owning self by value to consume the old state; a multi-CPU SPMC can't do that on the fast path. Not a rejection of typestate, just the wrong tool for this edge.
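For readers who haven't seen the pattern, a minimal typestate sketch (hypothetical types, not from the repo) shows the by-value requirement. Each transition consumes the old Sp<S> and returns a new one, so the compiler rejects illegal moves, but only code that owns the value can call them. A byte shared by several CPUs through &AtomicU8 can't be moved out like this, which is why the CAS version checks at runtime instead.

```rust
use std::marker::PhantomData;

// Zero-sized marker types, one per state.
pub struct Idle;
pub struct Running;
pub struct Blocked;

/// Hypothetical typestate SP: the state lives in the type parameter.
pub struct Sp<S> {
    pub id: u16,
    _state: PhantomData<S>,
}

impl Sp<Idle> {
    pub fn new(id: u16) -> Self {
        Sp { id, _state: PhantomData }
    }
    /// Consumes the idle SP; the old handle can no longer be used.
    pub fn run(self) -> Sp<Running> {
        Sp { id: self.id, _state: PhantomData }
    }
}

impl Sp<Running> {
    pub fn block(self) -> Sp<Blocked> {
        Sp { id: self.id, _state: PhantomData }
    }
    pub fn yield_to_idle(self) -> Sp<Idle> {
        Sp { id: self.id, _state: PhantomData }
    }
}

impl Sp<Blocked> {
    pub fn wake(self) -> Sp<Running> {
        Sp { id: self.id, _state: PhantomData }
    }
}

/// Walks Idle -> Running -> Blocked -> Running -> Idle and returns the id.
/// Calling `.block()` on an Sp<Idle> would not compile.
pub fn lifecycle_demo(id: u16) -> u16 {
    let sp = Sp::<Idle>::new(id).run().block().wake().yield_to_idle();
    sp.id
}
```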

2. _ => fallback isn't a sin, but ask yourself every time.

"If I add a new variant in the future, should this site force me to update it?"

Yes → drop the _, enumerate every variant
No (illegal state-machine pair, MMIO unknown-offset) → _ => default is documentation
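For the MMIO case, a register decode might look like this simplified, hypothetical PL031 model (not the repo's VirtualPl031; no clock, no interrupts). RTCDR, RTCMR, RTCLR and RTCCR sit at offsets 0x000 to 0x00C per the PL031 register map, and everything else is read-as-zero / write-ignored. A new register offset isn't a new enum variant, so no compiler check could flag it anyway; the `_` arm just states the policy for everything unlisted.

```rust
/// Simplified, hypothetical PL031 RTC model.
/// Unlisted offsets are read-as-zero / write-ignored.
pub struct Pl031 {
    count: u32,     // what RTCDR reads back
    match_val: u32, // RTCMR
    load: u32,      // RTCLR
    control: u32,   // RTCCR (bit 0 = RTC start)
}

impl Pl031 {
    pub fn new(count: u32) -> Self {
        Pl031 { count, match_val: 0, load: 0, control: 1 }
    }

    pub fn read(&self, offset: u64) -> u32 {
        match offset {
            0x000 => self.count,     // RTCDR
            0x004 => self.match_val, // RTCMR
            0x008 => self.load,      // RTCLR
            0x00C => self.control,   // RTCCR
            // Unknown offset: read-as-zero. The wildcard is the policy.
            _ => 0,
        }
    }

    pub fn write(&mut self, offset: u64, val: u32) {
        match offset {
            0x004 => self.match_val = val,
            0x008 => {
                // In this model, loading RTCLR also sets the running count.
                self.load = val;
                self.count = val;
            }
            0x00C => self.control = val & 1,
            // RTCDR is read-only, and unknown offsets are write-ignored.
            _ => {}
        }
    }
}
```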

3. State-machine correctness is never a gift from Rust. It's a gift from tests + documentation + code review.

My test_sp_context.rs has dedicated tests for every legal transition, a bunch of illegal ones, and CAS races. Rust didn't generate those; I wrote them. Rust did spare me some defensive code (there is no "sixth value" of SpState; try_from_u8 rejects it), but on whether the legal-transition table is correct, Rust has no opinion.

4. What really saves you is "fields-carrying enum + each variant has its own handler."

That's Rust's signature strength. Find the places in your codebase that fit this pattern and get them right — it pays more than agonizing over whether the state machine should be typestate-ified.

Closing

My hypervisor isn't a "zero-unwrap" project. The repo has about 6 unwrap() calls (concentrated in test fixtures and boot-time paths that can't reasonably panic) and 45 _ => default fallback arms (mostly in MMIO register decode for unknown offsets).

Every unwrap() and _ => was a decision at the time, not laziness. Engineering beats slogans.

Rust gives you a good weapon. It doesn't think for you. Whether the state-transition table is legal is in your head, not the compiler's.

Code: github.com/willamhou/hypervisor

Blog: willamhou.github.io/hypervisor

This is part 5 of the ARM64 Hypervisor development series. The Chinese version is the canonical source — see part5-enum-state-machine.md.

Top comments (5)

NOVAInetwork • Apr 26

Great write-up. The distinction between "fields-carrying enum where each variant has its own handler" vs "tag-only enum with cartesian product checks" is something I wish more Rust posts talked about.

I hit the exact same tradeoff building a BFT consensus engine. My transaction types are a tag-only enum (first byte of payload), and the dispatch is a big match with _ => reject. Adding a new tx type forces updates at the dispatch site, but the state-transition validation (nonce checks, balance checks, fee enforcement) all live behind the _ => error pattern — same reasoning as your _ => false on the SP state machine.

Where exhaustive match saved me was the codec layer — decode_block_v1 calls a different decoder per wire format version, and forgetting a variant there means silent data loss. Exactly your Device dispatch pattern.

Your point about #[repr(u8)] + AtomicU8 being the right call for multi-CPU state is spot on. Typestate evangelists rarely account for shared mutable state across threads.

willamhou • Apr 27

Thanks — and the codec/wire-format example sharpens the point: exhaustive match's value is "compiler forces you to handle the new variant", and that matters most when the natural fallback (return Ok with defaults, accept anyway) would silently corrupt rather than reject.

Your point about typestate-vs-shared-state lands hard. The Rust community's typestate enthusiasm mostly assumes single-threaded, single-owner code; once state is shared across CPUs (atomic, lock-protected, MMIO), you're back to runtime checks regardless of how clever your types are. I almost wrote a section pushing back on this directly but cut it for length.

Out of curiosity — does your consensus layer have hooks for runtime invariant assertions (like an assert!(state == Expected) instrumentation pass)? In the SPMC I leaned on debug_assert! heavily for cartesian-product invariants, but I've never seen a consensus codebase do that systematically.

NOVAInetwork • Apr 27

Good question. I actually don't use debug_assert! in the consensus layer at all. I use it in the codec and SMT layers for things like buffer length checks after encoding and bit-index bounds, but consensus itself just returns errors through a ConsensusError enum.

The reason is basically what you said. In BFT most "invariants" are only true when peers are honest. A stale vote or a proposal from the wrong leader isn't a bug, it's just something the protocol needs to handle. So I route all of that through error variants like NotLeader, AlreadyProposed, equivocation checks, etc. The hot path has no assertions.

The closest thing I have to a systematic instrumentation pass is my chaos test harness. It runs invariant checkers after every scenario (partitions, crashes, Byzantine faults) and checks things like height monotonicity, no forks, commit consistency. It's out-of-band instead of inline but it covers the cross-node invariants that a single-process debug_assert can't reach anyway.

And yeah the typestate thing breaks down fast once state crosses node boundaries. We found the same thing.

willamhou • Apr 29

The split you're drawing is more nuanced than just trust boundaries. It's really three layers: local data-structure invariants where debug_assert! is fine, protocol-facing paths where unexpected messages and stale state are normal outcomes (often just asynchrony, not malice), and system-level properties like no-forks or commit consistency that no single-process check can fully see anyway.

Your chaos harness fits the third layer cleanly. Two votes crossing in flight isn't a peer lying, it's just the network — which is exactly why "did the protocol break" has to be answered globally, not locally.

In hypervisor work I've hit the first two layers: asserts on EL2 internal state where I own the structure, error returns for anything fed by a guest (since guests can be buggy as well as adversarial). I won't push the analogy further; consensus has a distributed-observability problem that single-machine code mostly doesn't.

NOVAInetwork • Apr 29

The three-layer framing is better than how I was thinking about it. Local data structure invariants, protocol-facing paths where weird inputs are just normal async behavior, and system-level properties that need global observation. That maps cleanly to how our codebase is actually organized.

The guest analogy from hypervisor work makes sense too. A guest being buggy vs adversarial is the same distinction as a peer being slow vs Byzantine. You handle both the same way at the protocol boundary regardless of intent.

Good conversation. Don't see many people thinking about assertion strategy at this level.