DEV Community

Andy Triboletti

Posted on May 11

Bio-Neighbor Treatment Auditor: an open-source, on-device "second opinion" for cancer patients

#devchallenge #gemmachallenge #gemma

This is a submission for the Gemma 4 Challenge: Build with Gemma 4

What I Built

Bio-Neighbor Treatment Auditor — an open-source, on-device "second opinion" for cancer patients.

A cancer diagnosis arrives faster than a person can process it. Within a few visits the patient has a stage, a subtype, three drugs they've never heard of, and a calendar of infusions. The honest second opinion looks like this: open seven tabs, read NCI PDQ, search ClinicalTrials.gov, cross-reference DDInter for drug interactions, check ChEMBL for mechanism overlap, look up openFDA FAERS for what other patients reported, and normalize names through RxNorm because the hospital wrote "Taxol" and the trial database says "paclitaxel". Then synthesize all of it into questions for Monday's oncology appointment.

Most patients can't do this. And pasting a treatment plan into a cloud chatbot is a privacy non-starter — this is the most sensitive data a person owns.

Bio-Neighbor turns that Sunday-night research session into a ~90-second audit on the patient's own laptop. Enter cancer type, subtype, stage, drugs, treatments, symptoms — a multi-pass pipeline pulls from six public medical databases, runs four deterministic safety lookups, then hands the structured findings to Gemma 4 running locally via Ollama. The output is a self-contained, paginated PDF with every NCT ID, every PDQ URL, and every data-source citation listed in a References section — enough detail that a clinician could reproduce the audit by hand.

Sources synthesized:

NCI PDQ — standard-of-care guideline text from cancer.gov
ClinicalTrials.gov — recruiting and completed trials, per-modality and per-drug
DDInter — pairwise drug-drug interactions (236k pairs, Major / Moderate / Minor)
ChEMBL — flags when two drugs hit the same gene (e.g., anastrozole + letrozole both inhibit CYP19A1)
openFDA FAERS — post-market adverse-event frequencies matched against the patient's symptoms
RxNorm — brand→generic deduplication so the audit doesn't double-count fan-out
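
As a rough illustration of the RxNorm and ChEMBL steps in the list above, here is a minimal Python sketch. The tables are tiny hand-written stand-ins (only the Taxol/paclitaxel mapping and the CYP19A1 pair come from this post), and the names `BRAND_TO_GENERIC`, `DRUG_TARGETS`, `normalize`, and `shared_targets` are hypothetical, not the project's actual code.

```python
from itertools import combinations

# Illustrative tables only. The brand mapping and the CYP19A1 pair are taken
# from the post; a real run would load RxNorm and ChEMBL data instead.
BRAND_TO_GENERIC = {"taxol": "paclitaxel"}
DRUG_TARGETS = {
    "anastrozole": {"CYP19A1"},
    "letrozole": {"CYP19A1"},
}

def normalize(drugs):
    """Map brand names to generics and drop duplicates, keeping first-seen order."""
    seen = []
    for name in drugs:
        key = name.strip().lower()
        generic = BRAND_TO_GENERIC.get(key, key)
        if generic not in seen:
            seen.append(generic)
    return seen

def shared_targets(drugs):
    """Return (drug_a, drug_b, shared_genes) for every pair with a common target."""
    flags = []
    for a, b in combinations(drugs, 2):
        common = DRUG_TARGETS.get(a, set()) & DRUG_TARGETS.get(b, set())
        if common:
            flags.append((a, b, sorted(common)))
    return flags

plan = normalize(["Taxol", "paclitaxel", "Anastrozole", "letrozole"])
print(plan)                  # ['paclitaxel', 'anastrozole', 'letrozole']
print(shared_targets(plan))  # [('anastrozole', 'letrozole', ['CYP19A1'])]
```

Normalizing before the pairwise pass means "Taxol" and "paclitaxel" collapse into one entry, so later lookups are not run twice for the same drug.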

Everything runs locally. Ollama is hard-pinned to 127.0.0.1:11434. No cloud API key, no telemetry, no fallback to a remote LLM. The only external network traffic is the one-time download of public medical datasets — the same fetches a researcher would make from PubMed.
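
One way such a pin could be enforced in code is a small guard that rejects any endpoint other than the loopback address. This is a sketch under that assumption; `assert_loopback` is a hypothetical helper name, and only the address 127.0.0.1:11434 comes from the post.

```python
from urllib.parse import urlparse

OLLAMA_URL = "http://127.0.0.1:11434"  # address stated in the post

def assert_loopback(url):
    """Return the URL unchanged, or raise ValueError if it is not on 127.0.0.1."""
    parsed = urlparse(url)
    if parsed.hostname != "127.0.0.1":
        raise ValueError(f"refusing non-local LLM endpoint: {url}")
    return url

assert_loopback(OLLAMA_URL)  # passes silently
```

Comparing the parsed hostname rather than doing a substring check means a look-alike host such as `127.0.0.1.example.com` is rejected too.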

Runs from either a SwiftUI macOS GUI or a cross-platform Python CLI (macOS, Linux, Windows); both produce identical reports.

Research tool only — not medical advice.

Demo

Example audit (PDF): treatment-audit-her2-20260507-1433.pdf

Code

greenrobotllc / bio-neighbor

On-device AI Treatment Auditor for cancer plans + molecular-similarity engine. Pulls evidence from NCI PDQ, ClinicalTrials.gov, DDInter, ChEMBL, and FAERS, synthesizes with local Ollama, exports as PDF. macOS GUI + cross-platform Python CLI. Research tool only — not medical advice.

BioNeighbor

BioNeighbor is an open-source, on-device cancer-research toolkit. Turn a cancer treatment plan into a citation-grounded second opinion — locally, privately, in plain language.

Treatment Auditor: synthesize six public medical databases (NCI PDQ, ClinicalTrials.gov, DDInter, ChEMBL, RxNorm, OpenFDA FAERS) into a printable PDF audit, powered by Gemma 4 running on your own machine. No cloud, no telemetry, no patient data leaves the device.

BioNeighbor also retains its original molecular-similarity engine (FAISS + RDKit + ChEMBL) — the collaborative-filtering-inspired feature the project takes its name from — for exploring "neighbor" compounds to known drugs.

Overview

BioNeighbor is an open-source cancer-research toolkit centered on an on-device AI Treatment Auditor: describe a cancer treatment plan — disease/subtype, stage, prescribed drugs, scheduled treatments, symptoms — and the system runs a multi-pass audit across public medical data sources, synthesizes the findings via a local Ollama model, and exports a printable PDF.

Evidence sources:

NCI…

MIT licensed. One-command install: ./setup.sh.

How I Used Gemma 4

I ship with Gemma 4 26B as the default model, with E4B as a lower-RAM fallback (~9.6 GB, configurable in Settings) and 31B Dense as the recommended option for workstation-class hardware doing tougher synthesis. The same prompts and pipeline work across all three — I don't have to fork the application per device class.

Three properties of the Gemma 4 family map directly onto this problem:

1. Frontier reasoning on consumer hardware. 26B fits on a single Apple-silicon Mac with 64 GB unified memory (I develop on an M1) or a workstation with one RTX A5000-class GPU. I didn't have to trade reasoning quality for offline operation, and for medical synthesis that quality is non-negotiable.

2. The family covers the deployment spectrum. A clinic with older hardware runs the audit on E4B; a research workstation scales to 31B Dense for the hardest synthesis cases. Same prompts, same pipeline, same PDF layout.

3. Open weights are load-bearing for the offline guarantee. Closed-API models can't run on a HIPAA-firewalled hospital wing, on a plane between consultations, or in a country where uploading patient data to a US cloud is illegal. Gemma 4's open weights are the only reason this app exists.

Where Gemma sits in the pipeline:

The audit is three layers, each feeding the next:

Layer 1 — deterministic safety lookups (no LLM). RxNorm dedupe, DDInter pairwise interactions, ChEMBL target overlap, FAERS symptom-vs-reaction matching. These render as factual callouts above the AI prose. The LLM cannot contradict them; it can only contextualize them.
Layer 2 — per-source mini-summaries (Gemma 4). For each public dataset, Gemma reads the raw records and writes a short summary scoped to the patient's plan. Streams token-by-token to the macOS UI — on consumer hardware a full audit takes several minutes, which is exactly why streaming matters: the patient watches the audit assemble itself, one cited source at a time, instead of staring at a spinner.
Layer 3 — final synthesis pass (Gemma 4). A second call receives the deterministic findings + per-source summaries + the patient's plan and produces a 350–550 word audit organized into seven sections: standard-of-care alignment, interaction flags, mechanism interpretation, symptoms vs adverse-event profiles, trial landscape, staging implications, and questions to ask the oncology team.
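
To make the Layer 1 idea concrete, here is a hedged sketch of a pairwise interaction lookup whose results are rendered as callouts ahead of the model's prose. The drug names and severities are placeholders, not real pharmacology, and `INTERACTIONS`, `interaction_callouts`, and `assemble_report` are hypothetical names; only the Major / Moderate / Minor levels come from the post.

```python
from itertools import combinations

SEVERITY_ORDER = {"Major": 0, "Moderate": 1, "Minor": 2}

# Placeholder pairs, not real pharmacology; a real table would come from DDInter.
INTERACTIONS = {
    frozenset({"drug_a", "drug_b"}): "Moderate",
    frozenset({"drug_a", "drug_c"}): "Major",
}

def interaction_callouts(drugs):
    """Look up every drug pair and return callout lines, most severe first."""
    hits = []
    for a, b in combinations(sorted(drugs), 2):
        level = INTERACTIONS.get(frozenset({a, b}))
        if level:
            hits.append((SEVERITY_ORDER[level], f"[{level}] {a} + {b}"))
    return [text for _, text in sorted(hits)]

def assemble_report(callouts, ai_prose):
    """Place the deterministic callouts ahead of the model's prose, unmodified."""
    return "\n".join(["Safety lookups:"] + callouts + ["", "Discussion:", ai_prose])

callouts = interaction_callouts(["drug_c", "drug_a", "drug_b"])
print(assemble_report(callouts, "(model-written context goes here)"))
```

Because the callouts are computed and formatted without the model, the model's text can only add context around them; it never gets a chance to rewrite the lookup results.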

This focused-context-per-source pattern is the whole reason this works on a laptop: Gemma 4 never has to ingest the entire corpus at once.
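
The pattern can be sketched in a few lines of Python. This is an illustration, not the project's code: `audit` and its prompt wording are hypothetical, and `generate` stands for any callable that sends a prompt to the local model and returns text.

```python
def audit(plan, sources, findings, generate):
    """Summarize each source on its own, then synthesize from the short summaries."""
    summaries = {}
    for name, records in sources.items():
        # Each call sees one source only, so the context stays small.
        prompt = (
            f"Patient plan: {plan}\nSource: {name}\n"
            f"Records:\n{records}\nSummarize for this plan."
        )
        summaries[name] = generate(prompt)

    # The final call receives the summaries, never the raw records.
    parts = [f"Patient plan: {plan}", f"Deterministic findings: {findings}"]
    parts += [f"{name} summary: {text}" for name, text in summaries.items()]
    return summaries, generate("\n\n".join(parts))
```

With N sources this makes N + 1 model calls, and the largest prompt is bounded by the biggest single source or by the combined summaries, whichever is larger.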

What I deliberately didn't use: no function calling, no structured-output mode, no JSON-schema coercion. Per-source and synthesis calls are plain NDJSON-streamed text-in/text-out against Ollama's /api/generate. The deterministic layer handles everything that must be machine-parseable; the synthesis prompt is engineered tightly enough that I don't need structured output for the prose.
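
For readers unfamiliar with that streaming format, here is a minimal sketch of the client side. Ollama's /api/generate returns one JSON object per line, each carrying a `response` text fragment and a `done` flag; the helper names `build_request` and `iter_tokens` and the model tag are placeholders, not taken from the project.

```python
import json

def build_request(model, prompt):
    """Body for POST /api/generate; stream=True asks Ollama for NDJSON chunks."""
    return {"model": model, "prompt": prompt, "stream": True}

def iter_tokens(lines):
    """Yield text fragments from an iterable of NDJSON lines until done is true."""
    for raw in lines:
        if not raw.strip():
            continue  # skip blank keep-alive lines
        chunk = json.loads(raw)
        if chunk.get("response"):
            yield chunk["response"]
        if chunk.get("done"):
            break

# Hand-written sample lines standing in for a live HTTP response body.
sample = [
    '{"response": "Hel", "done": false}',
    '{"response": "lo", "done": false}',
    '{"response": "", "done": true}',
]
print("".join(iter_tokens(sample)))  # Hello
```

In a live client, `lines` would be the line iterator of the HTTP response, and each yielded fragment could be appended to the UI as it arrives.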

Bio-Neighbor is MIT-licensed; medical datasets carry their own licenses (notably DDInter is CC BY-NC-SA 4.0 — non-commercial). Research tool. Not medical advice.
