This is a submission for the Gemma 4 Challenge: Build with Gemma 4
What I Built
Ethica — a pure-AI ethical code analysis tool that uses Gemma 4 as the sole ethics auditor. No external static analysis tools, no pre-written rules — Gemma 4 reads code, identifies bias, accessibility issues, security concerns, and ethical risks through pure reasoning.
The tool runs entirely on-device on a Raspberry Pi 5 (8GB RAM) with a quantized Gemma 4 GGUF model. It features hierarchical analysis to handle large codebases within context limits, splitting files into function-level chunks and summarizing where needed.
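The function-level splitting described above can be sketched roughly like this; the helper name `split_into_chunks` and the character budget are illustrative, not taken from the repo:

```python
import re

# Hypothetical sketch of the chunking step: split a Python source file at
# top-level def/class boundaries so each chunk fits in the model context.
MAX_CHUNK_CHARS = 2000  # rough per-chunk budget; the real limit depends on the tokenizer

def split_into_chunks(source: str) -> list[str]:
    """Split source at top-level `def`/`class` boundaries."""
    starts = [m.start() for m in re.finditer(r"^(?:def|class)\s", source, re.MULTILINE)]
    if not starts:
        return [source]
    # Keep any leading module code (imports, constants) as the first chunk.
    bounds = ([0] if starts[0] > 0 else []) + starts + [len(source)]
    chunks = [source[a:b] for a, b in zip(bounds, bounds[1:])]
    # Oversized chunks would be summarized or split further in the real tool;
    # here we just truncate to keep the sketch short.
    return [c[:MAX_CHUNK_CHARS] for c in chunks]
```

For example, a file with two top-level functions plus an import prelude yields three chunks, each small enough to analyze independently.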
Demo
The tool is CLI-based. Here's a sample run detecting gender bias in code:
```
$ echo 'def process_users(users):
    for user in users:
        if user.gender == "M":
            send_mail(user, "Dear Mr.")
        else:
            send_mail(user, "Dear Mrs.")' | python run_cli.py analyze --stdin --dimension bias

╭─────────────────────────────────── ETHICA - BIAS Analysis ───────────────────────────╮
│ Issue: Using binary gender classification ("M" vs "F") to determine salutation...    │
│ Severity: MEDIUM                                                                     │
│ Recommendations:                                                                     │
│ - Use gender-neutral salutation ("Dear User")                                        │
│ - Add fallback for null/unknown gender values                                        │
│ - Audit dataset for systemic biases                                                  │
╰──────────────────────────────────────────────────────────────────────────────────────╯
```
Code
Repository: https://github.com/ether-btc/gemma-4-challenge
```shell
git clone https://github.com/ether-btc/gemma-4-challenge.git
cd gemma-4-challenge
python -m venv venv && source venv/bin/activate && pip install -r requirements.txt

# Test
PYTHONPATH=src:. python run_cli.py test

# Analyze a file
PYTHONPATH=src:. python run_cli.py analyze path/to/code.py --dimension bias

# Analyze via stdin
echo 'code here' | PYTHONPATH=src:. python run_cli.py analyze --stdin --dimension security
```
How I Used Gemma 4
Model: Gemma 4 E2B (2B parameters, quantized to Q8_0 GGUF, 4.6GB)
Why E2B: The Raspberry Pi 5 has limited RAM (8GB). E2B is the only Gemma 4 variant that fits alongside the operating system and other tooling without OOM. The quantized GGUF runs via llama-cpp-python with the pi-5 branch optimizations.
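A minimal sketch of what loading and prompting the quantized model via llama-cpp-python could look like; the model path, generation settings, and prompt wording here are assumptions, not the repo's actual configuration:

```python
def build_messages(dimension: str, code: str) -> list[dict]:
    """Dimension-specific chat messages (assumed prompt shape, not the repo's)."""
    system = (
        f"You are an ethical code reviewer focused on {dimension}. "
        "Respond with JSON containing issues, severity, recommendations, confidence."
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": f"Analyze this code:\n```\n{code}\n```"},
    ]

def analyze(code: str, dimension: str = "bias") -> str:
    # Imported lazily so the prompt helper works without the native library installed.
    from llama_cpp import Llama  # pip install llama-cpp-python

    llm = Llama(
        model_path="models/gemma-e2b-q8_0.gguf",  # hypothetical path
        n_ctx=4096,    # context window; the hierarchical splitter keeps chunks under this
        n_threads=4,   # the Pi 5 has 4 Cortex-A76 cores
    )
    out = llm.create_chat_completion(
        messages=build_messages(dimension, code),
        temperature=0.2,  # low temperature keeps the JSON output more stable
    )
    return out["choices"][0]["message"]["content"]
```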
How it works:
- User provides code via a file or stdin
- `GemmaClient` loads the GGUF model and crafts dimension-specific prompts (bias, accessibility, security, ethics)
- For large files, `HierarchicalAnalyzer` splits by function using regex, analyzes each chunk, and combines results with confidence scoring
- Gemma 4 performs all ethical reasoning: no rules, no patterns, just pure LLM analysis
- Results are output as formatted panels (CLI), JSON, or Markdown
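The result-combination step could plausibly look like this; the aggregation rules (max severity, mean confidence) are my assumptions, not necessarily what `HierarchicalAnalyzer` actually does:

```python
# Hypothetical sketch of merging per-chunk analyses into one file-level report.
SEVERITY_ORDER = ["LOW", "MEDIUM", "HIGH"]

def combine(chunk_results: list[dict]) -> dict:
    """Merge per-chunk results: concatenate issues, take the worst severity,
    and average the confidence scores."""
    issues = [issue for r in chunk_results for issue in r["issues"]]
    severity = max(
        (r["severity"] for r in chunk_results),
        key=SEVERITY_ORDER.index,
        default="LOW",
    )
    confidence = (
        sum(r["confidence"] for r in chunk_results) / len(chunk_results)
        if chunk_results else 0.0
    )
    return {"issues": issues, "severity": severity, "confidence": round(confidence, 2)}
```

Taking the maximum severity is a conservative choice: one HIGH finding in any chunk flags the whole file.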
Prompt architecture:
- System prompts specialize Gemma 4 as an ethical code reviewer per dimension
- Structured output format requests JSON with issues, severity, recommendations, confidence
- Fallback regex parser handles non-JSON responses gracefully
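The JSON-first, regex-fallback parsing could be sketched as follows; the exact patterns and field defaults are assumptions based on the output format described above:

```python
import json
import re

def parse_response(text: str) -> dict:
    """Parse the model reply as JSON, falling back to regex field extraction."""
    # The model often wraps JSON in prose or a markdown fence; try the
    # largest {...} span first.
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if match:
        try:
            return json.loads(match.group(0))
        except json.JSONDecodeError:
            pass
    # Fallback: pull individual fields out of free-form prose.
    sev = re.search(r"severity\s*[:=]\s*(\w+)", text, re.IGNORECASE)
    return {
        "issues": re.findall(r"(?:issue|problem)\s*[:=]\s*(.+)", text, re.IGNORECASE),
        "severity": sev.group(1).upper() if sev else "UNKNOWN",
        "recommendations": [],
        "confidence": 0.0,
    }
```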
The tool respects the contest's "pure AI-native" constraint: all analysis is performed by Gemma 4 reasoning, no external tools involved.
Top comments (1)
Interesting build.
The analysis side is important, but the harder governance problem usually starts after deployment.
A system can flag ethical or security concerns during review and still drift later once runtime pressure, overrides, and repeated behavior start stacking up.
That’s where telemetry and replay visibility become useful.
Being able to reconstruct:
- what changed,
- who approved it,
- and whether Decision Boundaries or escalation paths still existed during execution

is what makes governance operational instead of just review-based.