Joske Vermeulen

Posted on May 11

I Turned Any GitHub Repo Into a Playable Dungeon: Gemma 4 Finds Real Bugs and Turns Them Into Monsters

#devchallenge #gemmachallenge #gemma

Gemma 4 Challenge: Build With Gemma 4 Submission

This is a submission for the Gemma 4 Challenge: Build with Gemma 4

What I Built

Codebase Dungeon: paste any GitHub repo URL and Gemma 4 reads your actual source code, finds real security vulnerabilities and bugs, then turns them into a playable text adventure dungeon.

Files become rooms
Real bugs become monsters (with creative names like "The Hardcoded Sentinel" or "The CSV Injection Imp")
You fix the bugs to clear rooms: wrong answers cost HP, correct fixes earn XP
Gemma 4's multimodal vision analyzes your app's screenshots and creates UX-themed rooms
At the end, you get a downloadable code review report: a genuinely useful security audit disguised as a game

It's not just a game. The output is an actionable code review that developers can use to fix real issues in their codebase.

Demo

🎮 Play it live →

Try the pre-loaded codebases for instant gameplay, or paste any public GitHub repo URL.
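Before any file can become a room, the app has to turn a pasted URL into something it can fetch. The sketch below shows one way to do that. It is not the project's code: `parseGitHubUrl` and `treeApiUrl` are hypothetical helper names, and it assumes the default branch is `main`. A real implementation could read `default_branch` from GitHub's `GET /repos/{owner}/{repo}` endpoint instead. It targets GitHub's REST "get a tree" endpoint with `recursive=1`, which lists every path in one response.

```javascript
// Hypothetical helpers (not from the Codebase Dungeon repo): parse a pasted
// GitHub URL and build the REST URL that lists every file in the repository.
function parseGitHubUrl(input) {
  const raw = input.trim();
  // Accept both "https://github.com/owner/repo" and "github.com/owner/repo".
  const url = new URL(/^https?:\/\//i.test(raw) ? raw : `https://${raw}`);
  if (url.hostname !== 'github.com' && url.hostname !== 'www.github.com') {
    throw new Error(`Not a GitHub URL: ${input}`);
  }
  // Path looks like /owner/repo[.git][/tree/branch/...]; keep the first two segments.
  const [owner, repoSegment] = url.pathname.split('/').filter(Boolean);
  if (!owner || !repoSegment) {
    throw new Error(`Missing owner/repo in: ${input}`);
  }
  return { owner, repo: repoSegment.replace(/\.git$/, '') };
}

// GitHub's "get a tree" endpoint; recursive=1 returns nested paths in one response.
// Assumes the default branch is "main".
function treeApiUrl({ owner, repo }, ref = 'main') {
  return `https://api.github.com/repos/${owner}/${repo}/git/trees/${ref}?recursive=1`;
}

const target = parseGitHubUrl('github.com/aimadetools/codebase-dungeon');
console.log(treeApiUrl(target));
// prints https://api.github.com/repos/aimadetools/codebase-dungeon/git/trees/main?recursive=1
```

In the response, entries in the `tree` array with `type: "blob"` are files. The response's `truncated` flag shows when a very large repo did not fit in one listing.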

Code

🔗 github.com/aimadetools/codebase-dungeon

Key Implementation: Multimodal + 128K Context + Structured Output in One Call

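// Send code + screenshot to Gemma 4: all three capabilities in one request.
// `prompt` (source files + instructions) and `screenshotBase64` are built earlier;
// GEMMA_API_URL is the Gemini API generateContent endpoint for the Gemma 4 model.
const parts = [
  { text: prompt },  // Contains full source files (128K context)
  { inlineData: { mimeType: 'image/png', data: screenshotBase64 } }  // Multimodal
];

const res = await fetch(GEMMA_API_URL, {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'x-goog-api-key': process.env.GEMINI_API_KEY  // env var name is an assumption
  },
  body: JSON.stringify({
    contents: [{ role: 'user', parts }],
    generationConfig: {
      responseMimeType: 'application/json',        // Force JSON
      responseJsonSchema: FIRST_ROOM_SCHEMA,       // Structured output
      maxOutputTokens: 800,
      temperature: 0.6
    }
  })
});
if (!res.ok) throw new Error(`Gemma request failed: HTTP ${res.status}`);

// The structured JSON arrives as text in the first candidate's first part.
const data = await res.json();
const room = JSON.parse(data.candidates[0].content.parts[0].text);
// Result: clean JSON with room name, bug description, correct fix, and
// victory narrative, all informed by both the code AND the screenshot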

The Schema That Solves Gemma 4's Thinking Problem

const FIRST_ROOM_SCHEMA = {
  type: 'object',
  properties: {
    dungeonName: { type: 'string' },
    id: { type: 'string' },           // Exact file path
    name: { type: 'string' },          // Creative room name
    monsterName: { type: 'string' },   // Bug as a monster
    bugDescription: { type: 'string' },// Real bug found in code
    correctFix: { type: 'string' },    // The answer (for deterministic scoring)
    victoryNarrative: { type: 'string' },
    colorTheme: { type: 'string' },    // Extracted from screenshot
    narrative: { type: 'string' },     // References actual UI elements
    choices: {                         // 5 options, randomized
      type: 'array',
      items: { type: 'string' },
      minItems: 5,
      maxItems: 5
    }
  },
  required: [
    'dungeonName', 'id', 'name', 'monsterName', 'bugDescription',
    'correctFix', 'victoryNarrative', 'colorTheme', 'narrative', 'choices'
  ]
};
// With this schema: 99%+ parse rate and no thinking tokens in our runs
// Without it: ~50% failure rate, 140+ wasted tokens per call
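The schema comments call for five randomized choices, plus a separate `correctFix` used for deterministic scoring. One way to stop the correct answer from always landing in the same slot is to shuffle in application code instead of relying on the model to randomize. The sketch below is hypothetical: `buildChoices` and its `distractors` input are assumptions, and the shuffle is a standard Fisher-Yates.

```javascript
// Hypothetical helper: put the correct fix among up to four distinct distractors,
// then Fisher-Yates shuffle so its position is uniformly random.
function buildChoices(correctFix, distractors, count = 5, random = Math.random) {
  // Drop duplicates and any distractor identical to the correct answer.
  const unique = [...new Set(distractors)].filter(d => d !== correctFix);
  const options = [correctFix, ...unique.slice(0, count - 1)];
  for (let i = options.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1)); // 0 <= j <= i
    [options[i], options[j]] = [options[j], options[i]];
  }
  return options;
}

const choices = buildChoices(
  'Move ADMIN_PASSWORD into an environment variable',
  ['Rename the constant', 'Add a comment', 'Minify the file', 'Use a longer password']
);
console.log(choices.length); // 5
```

The server compares the submitted choice against `correctFix` as a string, so the shuffled order needs no extra bookkeeping.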

Zero-Cost Gameplay: All Logic Pre-Computed

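// During gameplay: NO API calls, instant responses.
// POST rather than GET, because the request changes session state.
// Assumes express.json() and express-session middleware are installed and that
// req.session.dungeon holds the pre-generated rooms.
app.post('/api/action', (req, res) => {
  const session = req.session;
  const action = String(req.body.action ?? '').toLowerCase().trim();
  const room = session.dungeon.rooms.find(r => r.id === session.currentRoom);
  const isCorrectFix = action === room.correctFix.toLowerCase().trim();
  // A move request names the target room (request field name is illustrative).
  const targetRoom = session.dungeon.rooms.find(r => r.id === req.body.moveTo);
  let narrative;

  if (isCorrectFix) {
    // Instant victory: narrative was pre-generated
    session.xp += 20;
    narrative = room.victoryNarrative;
  } else if (targetRoom) {
    // Instant room transition: narrative was pre-generated
    session.currentRoom = targetRoom.id;
    narrative = targetRoom.narrative;
  } else {
    // Instant wrong answer: no AI needed
    session.hp -= 10;
    narrative = `The ${room.monsterName} shrugs off your attack. -10 HP.`;
  }
  // Total API calls during gameplay: 0
  res.json({ narrative, hp: session.hp, xp: session.xp });
});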

How I Used Gemma 4

I chose Gemma 4 31B Dense because this project needs three capabilities in a single open model:

1. 128K Context Window: Entire Codebase Analysis

Gemma 4's 128K context window means we can feed entire small-to-medium repositories into a single prompt, with full file contents rather than just filenames or snippets. The model reads complete source files and reasons about the interactions between them, finding cross-file vulnerabilities like "this function in auth.js is called without validation in routes.js."

The live demo limits file count for cost efficiency (it runs 24/7 and is free to play), but the architecture supports loading full repos with dozens of files in a single Gemma call. A context window this large is what lets the model hold an entire codebase at once and reason about it holistically.
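Even 128K tokens has to be budgeted once the instructions and the model's JSON output are accounted for. The sketch below packs whole files into a prompt until a budget is reached. It assumes a rough estimate of four characters per token, which is a heuristic and not Gemma's tokenizer. The `packFiles` helper and the `=== FILE: path ===` delimiter format are hypothetical.

```javascript
// Rough heuristic (assumption): about 4 characters per token for source code.
// A real implementation would count tokens with the model's own tokenizer.
const CHARS_PER_TOKEN = 4;

// Greedily add whole files (never partial ones) until the token budget is spent.
function packFiles(files, tokenBudget) {
  const blocks = [];
  const skipped = [];
  let usedTokens = 0;
  for (const { path, content } of files) {
    const block = `=== FILE: ${path} ===\n${content}\n`;
    const cost = Math.ceil(block.length / CHARS_PER_TOKEN);
    if (usedTokens + cost > tokenBudget) {
      skipped.push(path); // too large for the remaining budget; later, smaller files may still fit
      continue;
    }
    blocks.push(block);
    usedTokens += cost;
  }
  return { prompt: blocks.join('\n'), usedTokens, skipped };
}

// Leave headroom under 128K for instructions and the JSON response.
const packed = packFiles(
  [{ path: 'auth.js', content: 'export function check(user) { return Boolean(user); }' },
   { path: 'routes.js', content: 'app.post("/login", (req, res) => check(req.body));' }],
  100_000
);
console.log(packed.usedTokens, packed.skipped);
```

Keeping files whole, rather than truncating them, preserves the complete function bodies the model needs for cross-file reasoning. The trade-off is that one large file can crowd out several small ones.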

2. Native Multimodal: Design Comprehension, Not Just Color Detection

When a repo contains UI screenshots, Gemma 4 looks at them and demonstrates genuine design comprehension: understanding what the app does, identifying specific UI elements, and finding real accessibility issues.

Here's what Gemma 4 generated after seeing a SchemaLens Chrome Web Store screenshot:

"You step into a dim, cavernous room where two massive stone tablets-Schema A and Schema B-loom before you. In the depths of the footer of Tablet A, four glowing blue runes of 'Load sample' flicker with identical intensity, offering no clue which path you have already trodden. Across the gap, in the footer of Tablet B, a lone rune 'Copy from A & modify' pulses with a pale, spectral lilac hue, clashing with the bold violet of the 'Compare Schemas' altar above."

From a single screenshot, Gemma identified:

The two schema editor panels by name ("Schema A" and "Schema B")
The "Load sample" links in the footer and their identical styling
The "Copy from A & modify" link with its inconsistent color
The "Compare Schemas" button's purple gradient
A real UX issue: inconsistent visual hierarchy between primary and secondary actions

This isn't color detection: it's a genuine UX audit from a screenshot. The monster ("The Contrast Ghoul") represents the accessibility anti-pattern, and the player must fix it to clear the room. The actual screenshot is displayed in the game's bug panel so players can see exactly what Gemma analyzed.
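The request code earlier in this post sends one PNG as an `inlineData` part. The sketch below shows a hypothetical way to find screenshots among a repo's files and convert them into that part shape. The `screenshotParts` name, the preference for paths containing "screenshot", and the one-image default are all assumptions, not details from the project.

```javascript
// Map image extensions to MIME types for inline image input.
const MIME_BY_EXT = {
  '.png': 'image/png',
  '.jpg': 'image/jpeg',
  '.jpeg': 'image/jpeg',
  '.webp': 'image/webp',
};

// files: [{ path: string, bytes: Buffer | Uint8Array }]
// Returns inlineData parts, preferring paths that mention "screenshot".
function screenshotParts(files, maxImages = 1) {
  const mentionsScreenshot = path => Number(/screenshot/i.test(path));
  return files
    .map(file => ({ file, ext: (file.path.match(/\.[^./]+$/) || [''])[0].toLowerCase() }))
    .filter(({ ext }) => Object.prototype.hasOwnProperty.call(MIME_BY_EXT, ext))
    .sort((a, b) => mentionsScreenshot(b.file.path) - mentionsScreenshot(a.file.path))
    .slice(0, maxImages)
    .map(({ file, ext }) => ({
      inlineData: { mimeType: MIME_BY_EXT[ext], data: Buffer.from(file.bytes).toString('base64') },
    }));
}
```

The returned parts can be appended after the `{ text: prompt }` part in the request. The sort is stable, so among equally ranked images the original file order is kept.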

3. Structured JSON Output: Solving Gemma 4's Thinking Problem

Gemma 4's "thinking mode" is notoriously hard to disable: developer forums are full of people struggling with it. The model outputs internal reasoning before answering, which consumes tokens and breaks JSON parsing. Setting thinkingLevel: "MINIMAL" reduces it but doesn't guarantee structured output.

The real solution is responseJsonSchema in the Gemini API's generation config. In our testing, it not only forces clean JSON output but also bypasses the thinking behavior entirely: no thinking tokens, no wasted output, just structured data.

generationConfig: {
  responseMimeType: 'application/json',
  responseJsonSchema: { /* your schema */ }
}

This is documented for Gemini models, but the official Gemma 4 capabilities page doesn't list it as a supported feature. We found that it works with Gemma 4 31B through the same API, and it took our parse reliability from ~50% to 99%+.
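A 99%+ success rate still leaves the occasional malformed response to handle. The sketch below is a minimal validation step. It assumes the response text has already been extracted from the API reply, and `parseRoom` is a hypothetical name. It checks only the string-valued fields of `FIRST_ROOM_SCHEMA`, so a caller could retry generation whenever `ok` is false.

```javascript
// The string-valued fields of FIRST_ROOM_SCHEMA.
const REQUIRED_STRING_FIELDS = [
  'dungeonName', 'id', 'name', 'monsterName', 'bugDescription',
  'correctFix', 'victoryNarrative', 'colorTheme', 'narrative',
];

function parseRoom(text) {
  // Defensive: strip a Markdown code fence if one slips through anyway.
  const cleaned = text.trim().replace(/^`{3}(?:json)?\s*/i, '').replace(/\s*`{3}$/, '');
  let room;
  try {
    room = JSON.parse(cleaned);
  } catch {
    return { ok: false, error: 'invalid JSON' };
  }
  const missing = REQUIRED_STRING_FIELDS.filter(
    key => typeof room?.[key] !== 'string' || room[key].trim() === ''
  );
  return missing.length > 0
    ? { ok: false, error: `missing or empty: ${missing.join(', ')}` }
    : { ok: true, room };
}
```

Returning a result object instead of throwing keeps the retry decision with the caller, which can log `error` and request the room again.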

Zero API Calls During Gameplay

Here's the key architectural insight: Gemma does all the work upfront, then gameplay is instant.

The generation flow:

First room: Gemma analyzes the code + screenshot and generates a room with its narrative, choices, and correct answer (~10s)
Game starts: player can immediately play the first room
Background batches: remaining rooms generate in parallel while the player is already playing (~15s)
Cached forever: once generated, the dungeon is saved. Return visits are instant.
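The generation flow above can be sketched as follows. Everything here is an assumption for illustration. `generateRoom` stands in for one Gemma call per file. The in-memory `Map` stands in for whatever persistent store the real app uses. The batch size of 4 is arbitrary.

```javascript
// In-memory stand-in for persistent storage: repoUrl -> dungeon.
const dungeonCache = new Map();

// generateRoom(repoUrl, file) is a hypothetical async function making one Gemma call.
async function startDungeon(repoUrl, files, generateRoom, batchSize = 4) {
  if (dungeonCache.has(repoUrl)) return dungeonCache.get(repoUrl); // return visit: no API calls
  if (files.length === 0) throw new Error('No files to turn into rooms');

  const [firstFile, ...rest] = files;
  // The player only waits for this first call.
  const dungeon = { rooms: [await generateRoom(repoUrl, firstFile)], ready: null };

  // Remaining rooms: files within a batch run in parallel; batches run one after another.
  dungeon.ready = (async () => {
    for (let i = 0; i < rest.length; i += batchSize) {
      const batch = rest.slice(i, i + batchSize);
      dungeon.rooms.push(...(await Promise.all(batch.map(f => generateRoom(repoUrl, f)))));
    }
    return dungeon;
  })();
  // If background generation fails, forget the partial dungeon so it can be regenerated.
  dungeon.ready.catch(() => dungeonCache.delete(repoUrl));

  dungeonCache.set(repoUrl, dungeon);
  return dungeon;
}
```

Gameplay can start as soon as `startDungeon` resolves with the first room. `await dungeon.ready` resolves once every room exists. Batching caps the number of concurrent model calls while still overlapping generation with play.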

During actual gameplay (choosing answers, navigating rooms), there are zero API calls:

Wrong answers: instant, pre-computed feedback (no model call)
Correct answers: instant, pre-generated victory narrative (no model call)
Room navigation: instant, pre-generated room descriptions (no model call)

This means cached repos (the presets in the demo) provide a completely free, instant gaming experience. Gemma 4 does all the heavy lifting during generation, then the game runs purely on pre-computed data.

The Downloadable Code Review Report

When you clear the dungeon (or die trying), you get a downloadable markdown report listing every bug found:

File location
Bug description
Vulnerable code snippet
How to fix it
The correct action

This isn't a gimmick: it's an actionable security audit that developers can use to fix real issues. The game makes code review engaging; the report makes it useful.
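The sketch below is a minimal way to turn rooms into that Markdown report. Field names follow `FIRST_ROOM_SCHEMA` where possible (`id`, `monsterName`, `bugDescription`, `correctFix`). The schema shown earlier has no slot for the code snippet or the fix explanation, so `vulnerableCode` and `fixExplanation` are hypothetical extra fields.

```javascript
// Built at runtime so the snippet contains no literal Markdown fence.
const FENCE = '`'.repeat(3);

// rooms: objects shaped like FIRST_ROOM_SCHEMA, plus optional (hypothetical)
// vulnerableCode and fixExplanation fields.
function buildReport(dungeonName, rooms) {
  const out = [`# Code Review: ${dungeonName}`, '', `Issues found: ${rooms.length}`, ''];
  rooms.forEach((room, i) => {
    out.push(`## ${i + 1}. ${room.monsterName}`, '');
    out.push(`- **File:** \`${room.id}\``);
    out.push(`- **Bug:** ${room.bugDescription}`, '');
    if (room.vulnerableCode) out.push(FENCE, room.vulnerableCode, FENCE, '');
    if (room.fixExplanation) out.push(`- **How to fix:** ${room.fixExplanation}`);
    out.push(`- **Correct action:** ${room.correctFix}`, '');
  });
  return out.join('\n');
}
```

In an Express app, the resulting string could be sent as a download with `res.attachment('code-review.md').type('text/markdown').send(report)`.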

Why Gemma 4 and Not Another Model?

Capability	Gemma 4 31B	GPT-4o	Other Open Models
128K context (entire repos)	✅	✅	❌ (8K-32K)
Native multimodal (screenshots)	✅	✅	❌
Structured JSON schema	✅	✅	❌ (unreliable)
Cost per game	$0.005	$0.09	Varies
Open model	✅	❌	✅

Gemma 4 delivers the same multimodal and long-context capabilities as GPT-4o at 18x lower cost ($0.005 vs. $0.09 per game), while being fully open. For a game that runs 24/7 and is free to play, this makes all the difference.

Real Bugs Found

Here are actual bugs Gemma 4 found in real codebases:

Hardcoded admin password in plain text (const ADMIN_PASSWORD = 'schemalens-admin-2026')
CSV injection vulnerability: unescaped fields that could execute formulas in Excel
Missing request body validation: server crashes on empty POST requests
Exposed environment variables in health check endpoints
Base64 tokens without HMAC: anyone can forge authentication tokens
Memory leak in rate limiter: Map grows unbounded without TTL eviction

These aren't hallucinated: they're real issues in real code, found by Gemma 4 reading the actual source files.
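For context, the sketch below shows what fixes for three of these bug classes commonly look like in Node.js. These are generic examples, not the fixes Gemma proposed or code from the affected repos. They cover formula-prefix escaping for CSV cells (following the usual OWASP guidance), HMAC-SHA256 token signing with Node's built-in `crypto` module, and a rate limiter whose `Map` entries expire after a fixed window.

```javascript
const crypto = require('crypto');

// CSV injection: prefix cells that start with a formula trigger so spreadsheet
// apps treat them as text, then apply standard CSV quoting.
function escapeCsvField(value) {
  let s = String(value ?? '');
  if (/^[=+\-@\t\r]/.test(s)) s = `'${s}`;
  if (/[",\r\n]/.test(s)) s = `"${s.replace(/"/g, '""')}"`;
  return s;
}

// Signed tokens: "<payload>.<signature>", signature = HMAC-SHA256(secret, payload).
function signToken(payload, secret) {
  const body = Buffer.from(JSON.stringify(payload)).toString('base64url');
  const sig = crypto.createHmac('sha256', secret).update(body).digest('base64url');
  return `${body}.${sig}`;
}

function verifyToken(token, secret) {
  const pieces = String(token).split('.');
  if (pieces.length !== 2) return null;
  const [body, sig] = pieces;
  const expected = crypto.createHmac('sha256', secret).update(body).digest();
  const given = Buffer.from(sig, 'base64url');
  // Constant-time comparison; timingSafeEqual requires equal lengths.
  if (given.length !== expected.length || !crypto.timingSafeEqual(given, expected)) return null;
  return JSON.parse(Buffer.from(body, 'base64url').toString('utf8'));
}

// Fixed-window rate limiter whose entries are evicted once their window expires,
// so the Map cannot grow without bound.
class RateLimiter {
  constructor(limit, windowMs) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.hits = new Map(); // key -> { count, start }
  }

  allow(key, now = Date.now()) {
    this.evictExpired(now);
    const entry = this.hits.get(key);
    if (!entry) {
      this.hits.set(key, { count: 1, start: now });
      return true;
    }
    entry.count += 1;
    return entry.count <= this.limit;
  }

  // O(n) sweep per call keeps the sketch short; a timer-based sweep scales better.
  evictExpired(now) {
    for (const [key, entry] of this.hits) {
      if (now - entry.start >= this.windowMs) this.hits.delete(key);
    }
  }
}
```

A forged token fails because changing the payload changes the expected HMAC, and the attacker cannot recompute that HMAC without the secret.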

DEV Community