This is a submission for the Gemma 4 Challenge: Build with Gemma 4
What I Built
Ethica — a pure-AI ethical code analysis tool that uses Gemma 4 as the sole ethics auditor. No external static analysis tools, no pre-written rules — Gemma 4 reads code, identifies bias, accessibility issues, security concerns, and ethical risks through pure reasoning.
The tool runs entirely on-device on a Raspberry Pi 5 (8GB RAM) with a quantized Gemma 4 GGUF model. It features hierarchical analysis to handle large codebases within context limits, splitting files into function-level chunks and summarizing where needed.
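The function-level splitting described above can be sketched roughly like this; the helper name `split_into_chunks` and the character budget are illustrative, not taken from the repo:

```python
import re

# Hypothetical sketch of the chunking step: split a Python source file at
# top-level def/class boundaries so each chunk fits in the model context.
MAX_CHUNK_CHARS = 2000  # rough per-chunk budget; the real limit depends on the tokenizer

def split_into_chunks(source: str) -> list[str]:
    """Split source at top-level `def`/`class` boundaries."""
    starts = [m.start() for m in re.finditer(r"^(?:def|class)\s", source, re.MULTILINE)]
    if not starts:
        return [source]
    # Keep any leading module code (imports, constants) as the first chunk.
    bounds = ([0] if starts[0] > 0 else []) + starts + [len(source)]
    chunks = [source[a:b] for a, b in zip(bounds, bounds[1:])]
    # Oversized chunks would be summarized or split further in the real tool;
    # here we just truncate to keep the sketch short.
    return [c[:MAX_CHUNK_CHARS] for c in chunks]
```

For example, a file with two top-level functions plus an import prelude yields three chunks, each small enough to analyze independently.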
Demo
The tool is CLI-based. Here's a sample run detecting gender bias in code:
```
$ echo 'def process_users(users):
    for user in users:
        if user.gender == "M":
            send_mail(user, "Dear Mr.")
        else:
            send_mail(user, "Dear Mrs.")' | python run_cli.py analyze --stdin --dimension bias

╭─────────────────────────────────── ETHICA - BIAS Analysis ───────────────────────────╮
│ Issue: Using binary gender classification ("M" vs "F") to determine salutation...    │
│ Severity: MEDIUM                                                                     │
│ Recommendations:                                                                     │
│ - Use gender-neutral salutation ("Dear User")                                        │
│ - Add fallback for null/unknown gender values                                        │
│ - Audit dataset for systemic biases                                                  │
╰──────────────────────────────────────────────────────────────────────────────────────╯
```
Code
Repository: https://github.com/ether-btc/gemma-4-challenge
```shell
git clone https://github.com/ether-btc/gemma-4-challenge.git
cd gemma-4-challenge
python -m venv venv && source venv/bin/activate && pip install -r requirements.txt

# Test
PYTHONPATH=src:. python run_cli.py test

# Analyze a file
PYTHONPATH=src:. python run_cli.py analyze path/to/code.py --dimension bias

# Analyze via stdin
echo 'code here' | PYTHONPATH=src:. python run_cli.py analyze --stdin --dimension security
```
How I Used Gemma 4
Model: Gemma 4 E2B (2B parameters, quantized to Q8_0 GGUF, 4.6GB)
Why E2B: The Raspberry Pi 5 has limited RAM (8GB). E2B is the only Gemma 4 variant that fits alongside the operating system and other tooling without OOM. The quantized GGUF runs via llama-cpp-python with the pi-5 branch optimizations.
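A minimal sketch of what loading and prompting the quantized model via llama-cpp-python could look like; the model path, generation settings, and prompt wording here are assumptions, not the repo's actual configuration:

```python
def build_messages(dimension: str, code: str) -> list[dict]:
    """Dimension-specific chat messages (assumed prompt shape, not the repo's)."""
    system = (
        f"You are an ethical code reviewer focused on {dimension}. "
        "Respond with JSON containing issues, severity, recommendations, confidence."
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": f"Analyze this code:\n```\n{code}\n```"},
    ]

def analyze(code: str, dimension: str = "bias") -> str:
    # Imported lazily so the prompt helper works without the native library installed.
    from llama_cpp import Llama  # pip install llama-cpp-python

    llm = Llama(
        model_path="models/gemma-e2b-q8_0.gguf",  # hypothetical path
        n_ctx=4096,    # context window; the hierarchical splitter keeps chunks under this
        n_threads=4,   # the Pi 5 has 4 Cortex-A76 cores
    )
    out = llm.create_chat_completion(
        messages=build_messages(dimension, code),
        temperature=0.2,  # low temperature keeps the JSON output more stable
    )
    return out["choices"][0]["message"]["content"]
```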
How it works:
- User provides code via a file or stdin
- `GemmaClient` loads the GGUF model and crafts dimension-specific prompts (bias, accessibility, security, ethics)
- For large files, `HierarchicalAnalyzer` splits by function using regex, analyzes each chunk, and combines results with confidence scoring
- Gemma 4 performs all ethical reasoning: no rules, no patterns, just pure LLM analysis
- Results are output as formatted panels (CLI), JSON, or Markdown
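The result-combination step could plausibly look like this; the aggregation rules (max severity, mean confidence) are my assumptions, not necessarily what `HierarchicalAnalyzer` actually does:

```python
# Hypothetical sketch of merging per-chunk analyses into one file-level report.
SEVERITY_ORDER = ["LOW", "MEDIUM", "HIGH"]

def combine(chunk_results: list[dict]) -> dict:
    """Merge per-chunk results: concatenate issues, take the worst severity,
    and average the confidence scores."""
    issues = [issue for r in chunk_results for issue in r["issues"]]
    severity = max(
        (r["severity"] for r in chunk_results),
        key=SEVERITY_ORDER.index,
        default="LOW",
    )
    confidence = (
        sum(r["confidence"] for r in chunk_results) / len(chunk_results)
        if chunk_results else 0.0
    )
    return {"issues": issues, "severity": severity, "confidence": round(confidence, 2)}
```

Taking the maximum severity is a conservative choice: one HIGH finding in any chunk flags the whole file.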
Prompt architecture:
- System prompts specialize Gemma 4 as an ethical code reviewer per dimension
- Structured output format requests JSON with issues, severity, recommendations, confidence
- Fallback regex parser handles non-JSON responses gracefully
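The JSON-first, regex-fallback parsing could be sketched as follows; the exact patterns and field defaults are assumptions based on the output format described above:

```python
import json
import re

def parse_response(text: str) -> dict:
    """Parse the model reply as JSON, falling back to regex field extraction."""
    # The model often wraps JSON in prose or a markdown fence; try the
    # largest {...} span first.
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if match:
        try:
            return json.loads(match.group(0))
        except json.JSONDecodeError:
            pass
    # Fallback: pull individual fields out of free-form prose.
    sev = re.search(r"severity\s*[:=]\s*(\w+)", text, re.IGNORECASE)
    return {
        "issues": re.findall(r"(?:issue|problem)\s*[:=]\s*(.+)", text, re.IGNORECASE),
        "severity": sev.group(1).upper() if sev else "UNKNOWN",
        "recommendations": [],
        "confidence": 0.0,
    }
```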
The tool respects the contest's "pure AI-native" constraint: all analysis is performed by Gemma 4 reasoning, no external tools involved.
Top comments (1)
Interesting build.
The analysis side is important, but the harder governance problem usually starts after deployment.
A system can flag ethical or security concerns during review and still drift later once runtime pressure, overrides, and repeated behavior start stacking up.
That’s where telemetry and replay visibility become useful.
Being able to reconstruct:
- what changed,
- who approved it,
- and whether Decision Boundaries or escalation paths still existed during execution

is what makes governance operational instead of just review-based.