Two months ago we released gogpu/ui v0.1.0 — 22 widgets, 3 design systems, ~150K lines of pure Go. Since then we shipped 21 patch releases, and the rendering pipeline is unrecognizable.
This post covers what changed and why it matters.
The Problem
v0.1.0 re-rendered the entire widget tree every frame. A 48×48 spinner in one corner forced the GPU to redraw the full 800×600 window of static content. Hover over a button? Full tree walk. Open a dropdown? Full tree walk. That was fine for demos but not for production.
We studied how five frameworks solve this — Flutter, Chrome, Qt6, Android HWUI, Skia — and found the same architecture everywhere: Layer Tree + boundary isolation + damage tracking.
What We Built (v0.1.14 → v0.1.21)
Layer Tree Compositor
Every RepaintBoundary widget now owns a node in a persistent Layer Tree:
```text
OffsetLayer (root)
├── PictureLayer (toolbar: clean, reuse texture)
├── PictureLayer (sidebar: clean, reuse texture)
├── ClipRectLayer (scrollview viewport)
│   └── PictureLayer (content: dirty, re-record)
└── PictureLayer (spinner: dirty, re-record 48×48)
```
Four layer types — OffsetLayer, PictureLayer, ClipRectLayer, OpacityLayer — compose the frame. Clean layers reuse their GPU texture from the previous frame. Only dirty layers re-render.
This is the same pattern Flutter calls flushPaint + compositeFrame. We validated it against all five reference frameworks before writing a line of code.
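To make the compositing step concrete, here is a minimal sketch of how a tree like the one in the diagram above might be walked. Only the four layer type names come from gogpu/ui. The fields, the Composite method, and the string standing in for a GPU texture are hypothetical simplifications, not the library's API. A clean PictureLayer keeps its cached "texture", and a dirty one re-records. Offset, clip, and opacity accumulate on the way down. It requires Go 1.21+ for the built-in min/max.

```go
package main

import "fmt"

// Rect is an axis-aligned rectangle.
type Rect struct{ X, Y, W, H int }

// intersect returns the overlap of a and b (zero-sized if disjoint).
func intersect(a, b Rect) Rect {
	x0, y0 := max(a.X, b.X), max(a.Y, b.Y)
	x1, y1 := min(a.X+a.W, b.X+b.W), min(a.Y+a.H, b.Y+b.H)
	return Rect{x0, y0, max(0, x1-x0), max(0, y1-y0)}
}

// Layer is a node in a hypothetical layer tree. Composite returns
// how many PictureLayers had to be re-recorded this frame.
type Layer interface {
	Composite(dx, dy int, alpha float64, clip Rect) int
}

// PictureLayer caches its output and re-records only when Dirty.
type PictureLayer struct {
	Name    string
	Bounds  Rect   // local coordinates
	Dirty   bool
	Texture string // stand-in for a GPU texture handle
	Screen  Rect   // where it was last blitted (after offset and clip)
	Alpha   float64
}

func (p *PictureLayer) Composite(dx, dy int, alpha float64, clip Rect) int {
	n := 0
	if p.Dirty || p.Texture == "" {
		p.Texture, p.Dirty, n = "tex:"+p.Name, false, 1 // re-record
	}
	b := p.Bounds
	p.Screen = intersect(Rect{b.X + dx, b.Y + dy, b.W, b.H}, clip)
	p.Alpha = alpha
	return n
}

// OffsetLayer translates its children.
type OffsetLayer struct {
	DX, DY   int
	Children []Layer
}

func (o *OffsetLayer) Composite(dx, dy int, alpha float64, clip Rect) int {
	n := 0
	for _, c := range o.Children {
		n += c.Composite(dx+o.DX, dy+o.DY, alpha, clip)
	}
	return n
}

// ClipRectLayer intersects the current clip with its own rectangle.
type ClipRectLayer struct {
	Clip  Rect // local coordinates
	Child Layer
}

func (c *ClipRectLayer) Composite(dx, dy int, alpha float64, clip Rect) int {
	r := Rect{c.Clip.X + dx, c.Clip.Y + dy, c.Clip.W, c.Clip.H}
	return c.Child.Composite(dx, dy, alpha, intersect(clip, r))
}

// OpacityLayer multiplies the alpha of its subtree.
type OpacityLayer struct {
	Alpha float64
	Child Layer
}

func (o *OpacityLayer) Composite(dx, dy int, alpha float64, clip Rect) int {
	return o.Child.Composite(dx, dy, alpha*o.Alpha, clip)
}

func main() {
	window := Rect{0, 0, 800, 600}
	content := &PictureLayer{Name: "content", Bounds: Rect{0, 0, 600, 2000}}
	spinner := &PictureLayer{Name: "spinner", Bounds: Rect{0, 0, 48, 48}}
	root := &OffsetLayer{Children: []Layer{
		&PictureLayer{Name: "toolbar", Bounds: Rect{0, 0, 800, 40}},
		&ClipRectLayer{Clip: Rect{200, 40, 600, 560},
			Child: &OffsetLayer{DX: 200, DY: 40, Children: []Layer{content}}},
		&OffsetLayer{DX: 752, DY: 552, Children: []Layer{spinner}},
	}}
	fmt.Println(root.Composite(0, 0, 1, window)) // 3: first frame records every layer
	spinner.Dirty = true
	fmt.Println(root.Composite(0, 0, 1, window)) // 1: only the spinner re-records
}
```

In this sketch, compositing cost scales with the number of layers. Re-recording cost scales only with the dirty ones.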
0% GPU When Idle
The frame loop checks a flat dirty set, an O(1) lookup instead of an O(n) tree walk:
```go
if !w.HasDirtyBoundaries() && !w.NeedsRedraw() && !w.NeedsAnimationFrame() {
	return // nothing changed, skip frame entirely
}
```
When the UI is idle, the GPU does zero work. Measured: 0% GPU across all six examples (hello, signals, taskmanager, gallery, ide, modular-compositor).
The previous approach walked the entire widget tree every frame to check whether anything needed a redraw. With 200 boundaries, the new check is 45× faster.
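A flat dirty set can be as simple as a map keyed by boundary. Here is a minimal sketch, assuming boundaries are identified by pointer. HasDirtyBoundaries mirrors the method name in the snippet above. The DirtySet type, MarkDirty, and Drain are hypothetical and not gogpu/ui's implementation. It requires Go 1.21+ for the built-in clear.

```go
package main

import "fmt"

// Boundary stands in for a RepaintBoundary widget.
type Boundary struct{ ID int }

// DirtySet is a flat set of boundaries that need repainting.
// Marking and checking are O(1); no tree walk is needed.
type DirtySet struct {
	dirty map[*Boundary]struct{}
}

func NewDirtySet() *DirtySet {
	return &DirtySet{dirty: make(map[*Boundary]struct{})}
}

// MarkDirty is called when a widget inside b invalidates itself.
// Marking the same boundary twice is a no-op.
func (s *DirtySet) MarkDirty(b *Boundary) { s.dirty[b] = struct{}{} }

// HasDirtyBoundaries answers "should we render?" in O(1).
func (s *DirtySet) HasDirtyBoundaries() bool { return len(s.dirty) > 0 }

// Drain returns the dirty boundaries and clears the set for the next frame.
func (s *DirtySet) Drain() []*Boundary {
	out := make([]*Boundary, 0, len(s.dirty))
	for b := range s.dirty {
		out = append(out, b)
	}
	clear(s.dirty)
	return out
}

func main() {
	boundaries := make([]*Boundary, 200)
	for i := range boundaries {
		boundaries[i] = &Boundary{ID: i}
	}
	s := NewDirtySet()
	fmt.Println(s.HasDirtyBoundaries()) // false: idle UI, skip the frame
	s.MarkDirty(boundaries[42])
	s.MarkDirty(boundaries[42])
	fmt.Println(s.HasDirtyBoundaries(), len(s.Drain())) // true 1
	fmt.Println(s.HasDirtyBoundaries())                 // false
}
```

The cost of the idle check no longer depends on the 200 boundaries. It is one length comparison.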
Per-Boundary GPU Textures
Each RepaintBoundary renders into its own offscreen MSAA texture. When a child boundary becomes dirty, only that boundary's texture is re-rendered. The compositor blits all textures in a single non-MSAA pass.
A 48×48 spinner touching 2,304 pixels no longer forces the GPU to process 480,000 pixels of unchanged content.
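The per-boundary texture idea can be sketched as a cache that reuses a texture while its boundary is clean and the same size, and reallocates it on resize. BoundaryCache, Prepare, and the 4× sample count are hypothetical, and the final non-MSAA composite pass is not modeled. The code counts only the pixels each frame has to re-render.

```go
package main

import "fmt"

// Texture stands in for a per-boundary offscreen render target.
type Texture struct {
	W, H    int
	Samples int // assumed MSAA sample count
	Renders int // how many times content was rendered into it
}

// BoundaryCache owns one texture per boundary ID (hypothetical type).
type BoundaryCache struct {
	textures map[int]*Texture
	Allocs   int // texture (re)allocations
}

// Prepare returns the number of pixels rendered for boundary id this
// frame: 0 when the boundary is clean and its texture is reused,
// w*h when it is dirty, new, or resized (a resize reallocates).
func (c *BoundaryCache) Prepare(id, w, h int, dirty bool) int {
	t, ok := c.textures[id]
	if !ok || t.W != w || t.H != h {
		t = &Texture{W: w, H: h, Samples: 4}
		c.textures[id] = t
		c.Allocs++
		dirty = true // a fresh texture has no content yet
	}
	if !dirty {
		return 0 // clean: last frame's texture is composited as-is
	}
	t.Renders++ // re-record and rasterize this boundary only
	return w * h
}

func main() {
	c := &BoundaryCache{textures: map[int]*Texture{}}
	type boundary struct{ id, w, h int }
	scene := []boundary{{1, 800, 552}, {2, 48, 48}} // static content + spinner
	frame := func(dirtyID int) int {
		px := 0
		for _, b := range scene {
			px += c.Prepare(b.id, b.w, b.h, b.id == dirtyID)
		}
		return px
	}
	fmt.Println(frame(0)) // first frame: every texture is rendered once
	fmt.Println(frame(2)) // spinner tick: 2304 (48*48)
}
```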
Multi-Rect Damage
When multiple widgets are dirty in different screen regions, we don't union them into one giant rect. Each dirty rect gets its own GPU scissor:
```text
Frame N: spinner (48×48) + status bar (800×24)
→ Two scissor rects, not one 800×600 rect
→ Zero pixel waste
```
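The arithmetic behind the example is easy to check. Here is a sketch, assuming the spinner sits in the top-right corner and the status bar runs along the bottom of an 800×600 window. The positions are hypothetical; the post gives only the sizes. Summing areas like this assumes the rects don't overlap.

```go
package main

import "fmt"

// Rect is an axis-aligned damage rectangle.
type Rect struct{ X, Y, W, H int }

func (r Rect) Area() int { return r.W * r.H }

// Union returns the bounding box covering a and b.
func Union(a, b Rect) Rect {
	x0, y0 := min(a.X, b.X), min(a.Y, b.Y)
	x1, y1 := max(a.X+a.W, b.X+b.W), max(a.Y+a.H, b.Y+b.H)
	return Rect{x0, y0, x1 - x0, y1 - y0}
}

func main() {
	spinner := Rect{752, 0, 48, 48}    // top-right corner (assumed)
	statusBar := Rect{0, 576, 800, 24} // bottom edge (assumed)

	separate := spinner.Area() + statusBar.Area() // one scissor each
	union := Union(spinner, statusBar).Area()     // one merged scissor
	fmt.Println(separate, union)                  // 21504 480000
}
```

Two scissors touch 21,504 pixels. Their bounding union covers the whole window.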
The damage pipeline runs through the full stack: ui → gg RenderDirectWithDamageRects → wgpu PresentWithDamage. A ring buffer stores the rect lists of recent frames for N-buffered swapchains. Beyond a threshold of 16 rects, the list collapses into a single union rect (the pattern GDK and Sway use).
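The post doesn't spell out what the ring buffer holds. One plausible reading follows the buffer-age idea from EGL_EXT_buffer_age: a back buffer whose contents are age frames old needs the current damage plus the damage of every frame presented in between. The sketch below is written under that assumption. DamageHistory and Repaint are hypothetical names, not gg or wgpu APIs. The 16-rect collapse comes from the text above, applied here when the list exceeds 16.

```go
package main

import "fmt"

// Rect is an axis-aligned damage rectangle.
type Rect struct{ X, Y, W, H int }

func union(a, b Rect) Rect {
	x0, y0 := min(a.X, b.X), min(a.Y, b.Y)
	x1, y1 := max(a.X+a.W, b.X+b.W), max(a.Y+a.H, b.Y+b.H)
	return Rect{x0, y0, x1 - x0, y1 - y0}
}

// mergeThreshold: longer lists collapse into one bounding rect.
const mergeThreshold = 16

// DamageHistory is a ring buffer of per-frame damage lists.
type DamageHistory struct {
	frames [][]Rect
	next   int // slot for the next Push
	count  int // frames recorded so far, capped at len(frames)
}

func NewDamageHistory(n int) *DamageHistory {
	return &DamageHistory{frames: make([][]Rect, n)}
}

// Push records the damage of a frame that was just presented.
func (h *DamageHistory) Push(rects []Rect) {
	h.frames[h.next] = rects
	h.next = (h.next + 1) % len(h.frames)
	h.count = min(h.count+1, len(h.frames))
}

// Repaint returns what must be redrawn into a back buffer whose contents
// are age frames old (age 1 = it holds the previous frame): the current
// damage plus the damage of the age-1 frames presented in between.
// ok == false means the history is too short; repaint the whole window.
func (h *DamageHistory) Repaint(age int, current []Rect) (rects []Rect, ok bool) {
	if age < 1 || age-1 > h.count {
		return nil, false
	}
	rects = append(rects, current...)
	n := len(h.frames)
	for i := 1; i < age; i++ {
		rects = append(rects, h.frames[(h.next-i+n)%n]...)
	}
	if len(rects) > mergeThreshold {
		u := rects[0]
		for _, r := range rects[1:] {
			u = union(u, r)
		}
		rects = []Rect{u}
	}
	return rects, true
}

func main() {
	h := NewDamageHistory(3) // e.g. a triple-buffered swapchain
	spinner := Rect{752, 0, 48, 48}
	status := Rect{0, 576, 800, 24}
	h.Push([]Rect{spinner})
	h.Push([]Rect{status})
	rects, ok := h.Repaint(3, []Rect{spinner})
	fmt.Println(len(rects), ok) // 3 true
}
```

This sketch does not deduplicate overlapping rects. A real implementation might merge them before handing them to the scissor stage.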
Persistent Layer Tree
UpdateLayerTree() reuses layer objects across frames instead of rebuilding the tree:
| Metric | Before | After |
|---|---|---|
| Allocs per frame (200 boundaries) | 613 | 13 |
| Reduction | — | 97.9% |
Flutter calls this addRetained. Android calls it RenderNode reuse. We measured allocation profiles against both and matched their patterns.
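The core of layer reuse is reconciliation: look each boundary up in a map from the previous frame, create a layer only when it is missing, and drop layers whose boundary disappeared. Here is a minimal sketch, assuming boundaries have stable integer IDs. LayerCache and Update are hypothetical stand-ins, not the real UpdateLayerTree. The output slice's backing array is reused so steady-state frames don't allocate a new one.

```go
package main

import "fmt"

// PictureLayer stands in for a retained layer object.
type PictureLayer struct {
	ID    int
	Dirty bool
}

// LayerCache keeps layers alive across frames, keyed by boundary ID.
type LayerCache struct {
	layers map[int]*PictureLayer
	seen   map[int]bool
	order  []*PictureLayer // backing array reused every frame
	Allocs int             // layers created so far
}

func NewLayerCache() *LayerCache {
	return &LayerCache{layers: map[int]*PictureLayer{}, seen: map[int]bool{}}
}

// Update reconciles the cache with this frame's boundary IDs.
// The returned slice is only valid until the next call.
func (c *LayerCache) Update(ids []int) []*PictureLayer {
	clear(c.seen)
	c.order = c.order[:0]
	for _, id := range ids {
		l, ok := c.layers[id]
		if !ok {
			l = &PictureLayer{ID: id, Dirty: true}
			c.layers[id] = l
			c.Allocs++
		}
		c.seen[id] = true
		c.order = append(c.order, l)
	}
	for id := range c.layers {
		if !c.seen[id] {
			delete(c.layers, id) // boundary left the tree
		}
	}
	return c.order
}

func main() {
	ids := make([]int, 200)
	for i := range ids {
		ids[i] = i
	}
	c := NewLayerCache()
	for frame := 0; frame < 3; frame++ {
		c.Update(ids)
	}
	fmt.Println(c.Allocs) // 200: layers are created once, then reused
}
```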
The Numbers
| Metric | v0.1.0 | v0.1.21 |
|---|---|---|
| Lines (total / code) | 150K / 105K | 195K / 141K |
| Tests | 6,000 | 7,200+ |
| Coverage | 97% | 97%+ |
| Packages | 56 | 56 |
| GPU idle (static UI) | 5-18% | 0% |
| Frame skip check | O(n) tree walk | O(1) flat set |
| Allocs/frame (200 boundaries) | 613 | 13 |
| Spinner GPU work | full window | 48×48 scissor |
Ecosystem Update
The rendering pipeline required changes across four repositories. Here's where the ecosystem stands:
| Repository | Version | Lines | Code | What It Does |
|---|---|---|---|---|
| naga | v0.17.13 | 323K | 240K | Shader compiler: WGSL → SPIR-V, MSL, GLSL, HLSL, DXIL |
| gg | v0.46.8 | 240K | 171K | 2D graphics: Skia-class rasterizer, GPU SDF, scene compositor |
| wgpu | v0.27.3 | 211K | 164K | Pure Go WebGPU: Vulkan, DX12, Metal, GLES, Software |
| ui | v0.1.21 | 195K | 141K | GUI toolkit: 22 widgets, 4 themes, Layer Tree pipeline |
| gogpu | v0.34.3 | 61K | 45K | App framework: windowing, input, three-mode render loop |
| + gpucontext, gputypes, systray, audio | — | 19K | 13K | Shared interfaces, system tray, audio engine |
| Total | — | 1,049K | 774K | 3,140 files across 9 repositories |
1M+ total lines. 774K lines of code. Zero CGO. Zero Rust. Zero C.
Recent ecosystem highlights since the v0.1.0 article:
- First pure Go DXIL generator: naga compiles WGSL shaders directly to DXIL bytecode, eliminating the HLSL→FXC/DXC dependency. IDxcValidator pass rate: 161/170. Covered in a separate article.
- Born ML v0.8.0 migrated to gogpu/wgpu: a production ML framework now running on our GPU stack. 105 GPU tests pass, and an HRM model was trained for 20 epochs. Covered in a separate article.
- CJK text rendering — script-aware hinting, exact-size rasterization, Tier 6 routing for Chinese/Japanese/Korean glyphs.
- LCD ClearType auto-detection: Windows (SPI + registry), macOS (none), Linux (Xft/Wayland), with a per-platform subpixel layout.
- Software backend for CI: deterministic rendering without GPU hardware. Pixel-exact end-to-end tests verify scissor rects at the HAL level.
- Community deep-dive: an independent technical analysis of gogpu/wgpu (in Chinese) covering the zero-CGO syscall architecture, the Snatchable resource lifecycle, and buffer state tracking internals. It's always good to see the community dig into the implementation.
The Foundation Is Ready
This is the release where we stopped rebuilding and started building on top.
For the past two months every release was infrastructure: retained-mode rendering, scene composition, Layer Tree, damage tracking, boundary isolation. The kind of plumbing that's invisible to users but determines whether a framework can scale to real applications.
That plumbing is now in place. The render pipeline follows the same architectural patterns as Flutter, Chrome, and Qt6. We didn't copy them: we studied all five frameworks independently and arrived at the same conclusions. Layer Tree composition, per-boundary GPU textures, multi-rect damage, and persistent layer reuse are industry-proven patterns, and they're production-ready in gogpu/ui.
The ecosystem has stabilized around this architecture. naga (shader compiler), wgpu (WebGPU HAL), gg (2D graphics), and gogpu (windowing) all reached the point where API churn is minimal and releases are incremental improvements, not rewrites. Nine repositories, 1M+ lines, and the dependency chain holds.
What this means going forward: the pipeline will be optimized, not rebuilt. Future releases will focus on:
- New widgets — the 22 we ship today cover most use cases, but enterprise apps need more (color picker, date picker, rich text editor, tree grid)
- Performance polish — reducing GPU usage for animated widgets from 10% to <3%, ListView recycling, texture GC
- Platform accessibility — UIA on Windows, AT-SPI2 on Linux, NSAccessibility on macOS
- Developer experience — better docs, more examples, smoother onboarding
The hard part is behind us. The interesting part is ahead.
Try It
```shell
git clone https://github.com/gogpu/ui.git
cd ui/examples/gallery
go run .
```
Four design systems ship out of the box: Material Design 3, JetBrains DevTools, Microsoft Fluent, Apple Cupertino. Switch between them at runtime in the gallery example.
Backend selection via environment variable:
```shell
GOGPU_GRAPHICS_API=vulkan go run ./examples/ide/
GOGPU_GRAPHICS_API=dx12 go run ./examples/ide/
GOGPU_GRAPHICS_API=gles go run ./examples/ide/
GOGPU_GRAPHICS_API=software go run ./examples/ide/
```
No code changes needed.
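The post doesn't show how the variable is read. Here is a hypothetical sketch of the general pattern, not gogpu's code. The Backend type, the "auto" default when the variable is unset, the "metal" value, and the case-insensitive matching are all assumptions. Only vulkan, dx12, gles, and software appear in the commands above.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// Backend enumerates graphics APIs (hypothetical type).
type Backend int

const (
	Auto Backend = iota // assumed default: let the runtime pick
	Vulkan
	DX12
	Metal
	GLES
	Software
)

var names = [...]string{"auto", "vulkan", "dx12", "metal", "gles", "software"}

func (b Backend) String() string { return names[b] }

// backendFromEnv maps a GOGPU_GRAPHICS_API value to a Backend.
func backendFromEnv(v string) (Backend, error) {
	v = strings.ToLower(strings.TrimSpace(v))
	if v == "" {
		return Auto, nil
	}
	for i, n := range names {
		if n == v {
			return Backend(i), nil
		}
	}
	return Auto, fmt.Errorf("unknown GOGPU_GRAPHICS_API value %q", v)
}

func main() {
	b, err := backendFromEnv(os.Getenv("GOGPU_GRAPHICS_API"))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("selected backend:", b)
}
```

Reading the choice from the environment keeps backend selection out of application code, which is why the same binary can be pointed at any backend.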
Help Us Get There
gogpu/ui is at the stage where the architecture is proven but the user base is small. We need real-world testing to catch the edge cases that even 97% test coverage won't find.
Test it. Clone a repo, run an example, try building something with it. If it breaks — that's valuable. File an issue, and we'll fix it. If it works — that's valuable too. Tell us what you built.
Spread the word. Most Go developers don't know this exists yet. A post on Reddit, a tweet, a mention in your team's Slack — it all helps. The project grows through people who try it and talk about it, not through marketing.
Write about it. Tutorials, experience reports, comparisons, critiques — all welcome. If you build something interesting with gogpu/ui, write about the process. The ecosystem needs content from people other than us.
Contribute. You don't need to touch the render pipeline. Documentation improvements, new examples, widget ideas, accessibility testing, CI on different hardware — there's work at every level. Check CONTRIBUTING.md or just open a discussion.
The codebase is 1M+ lines of pure Go with zero CGO. The foundation is solid. What it needs now is people building on it.
GitHub · Discussions · CHANGELOG · Reddit r/golang · X/Twitter