If this is useful, a ❤️ helps others find it.
All tests run on an 8-year-old MacBook Air.
The same error appears twice. Most AI tools diagnose it twice — two API calls, same answer.
HiyokoHelper remembers. When the same error appears again, it responds instantly from cache: "💡 先日も同じケースが発生し、〇〇で解決しました" ("The same case came up the other day and was resolved by 〇〇").
Here's how the history cache works.
The data structure
use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize, Clone)]
pub struct HistoryEntry {
    pub error_hash: String,    // hash of the normalized error text
    pub error_preview: String, // first 100 chars of the raw error
    pub diagnosis: String,     // cached Gemini response
    pub resolved: bool,        // set via the "resolved" button
    pub created_at: u64,       // unix seconds, used for eviction
    pub hit_count: u32,        // how many times this error has reoccurred
}
Stored in history.json via tauri-plugin-store. Local only, never leaves the machine.
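The HistoryCache wrapper itself isn't shown in full here. A minimal sketch of its shape, with a plain serde_json round-trip standing in for the tauri-plugin-store layer (the load/save details below are assumptions, not the real implementation):

use std::collections::HashMap;
use std::path::PathBuf;

#[derive(Default)]
pub struct HistoryCache {
    path: PathBuf,
    pub entries: HashMap<String, HistoryEntry>,
}

impl HistoryCache {
    // Load history.json from disk; start empty if missing or unreadable.
    pub fn load(path: PathBuf) -> Self {
        let entries = std::fs::read_to_string(&path)
            .ok()
            .and_then(|s| serde_json::from_str(&s).ok())
            .unwrap_or_default();
        Self { path, entries }
    }

    pub fn get_mut(&mut self, hash: &str) -> Option<&mut HistoryEntry> {
        self.entries.get_mut(hash)
    }

    pub fn insert(&mut self, hash: String, entry: HistoryEntry) {
        self.entries.insert(hash, entry);
        self.save();
    }

    // Persist the whole map; tauri-plugin-store does the equivalent.
    pub fn save(&self) {
        if let Ok(json) = serde_json::to_string_pretty(&self.entries) {
            let _ = std::fs::write(&self.path, json);
        }
    }
}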
Normalizing before hashing
Same error, different timestamps → same hash:
use regex::Regex;

pub fn normalize_error(text: &str) -> String {
    let mut result = text.to_string();
    // Strip volatile fields so reoccurrences of the same error hash identically.
    let timestamp_re = Regex::new(r"\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}").unwrap();
    result = timestamp_re.replace_all(&result, "[TIMESTAMP]").to_string();
    let line_re = Regex::new(r"line \d+").unwrap();
    result = line_re.replace_all(&result, "line [N]").to_string();
    let pid_re = Regex::new(r"\bpid[: ]\d+").unwrap();
    result = pid_re.replace_all(&result, "pid [PID]").to_string();
    // Collapse all runs of whitespace into single spaces.
    result.split_whitespace().collect::<Vec<_>>().join(" ")
}
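One way to implement error_hash (not shown above) is a digest of the normalized text; a sketch, assuming SHA-256 via the sha2 crate:

use sha2::{Digest, Sha256};

// Hash the *normalized* text so logs that differ only in volatile
// fields (timestamps, PIDs, line numbers) collide on purpose.
pub fn error_hash(text: &str) -> String {
    let normalized = normalize_error(text);
    format!("{:x}", Sha256::digest(normalized.as_bytes()))
}

#[test]
fn same_failure_same_hash() {
    let a = error_hash("2024-01-05T10:00:00 ECONNREFUSED at line 42, pid:1337");
    let b = error_hash("2024-06-30T23:59:59 ECONNREFUSED at line 7, pid:9001");
    // Both normalize to "[TIMESTAMP] ECONNREFUSED at line [N], pid [PID]"
    assert_eq!(a, b);
}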
The lookup flow
pub async fn diagnose_with_history(
    input: &str,
    api_key: &str,
    history: &mut HistoryCache,
) -> anyhow::Result<DiagnosisResult> { // Result wrapper so `?` below compiles; anyhow is one choice of error type
    let hash = error_hash(input);

    // Cache hit: bump the counter and answer without an API call.
    if let Some(entry) = history.get_mut(&hash) {
        entry.hit_count += 1;
        let msg = if entry.resolved {
            // "The same error occurred the other day and is already resolved."
            format!("💡 先日も同じエラーが発生し、解決済みです。\n\n{}", entry.diagnosis)
        } else {
            // "This error has occurred before (Nth time)."
            format!("⚠️ このエラーは以前も発生しています({}回目)。\n\n{}", entry.hit_count, entry.diagnosis)
        };
        return Ok(DiagnosisResult::FromHistory {
            diagnosis: entry.diagnosis.clone(),
            message: msg,
        });
    }

    // Cache miss: one Gemini call, then remember the result.
    let diagnosis = call_gemini(input, api_key).await?;
    history.insert(hash.clone(), HistoryEntry {
        error_hash: hash,
        error_preview: input.chars().take(100).collect(),
        diagnosis: diagnosis.clone(),
        resolved: false,
        created_at: unix_now(),
        hit_count: 1,
    });
    Ok(DiagnosisResult::Fresh { diagnosis })
}
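The DiagnosisResult type, inferred from the two construction sites above (a sketch; only these variants and fields are implied):

// Two outcomes: served from cache, or from a fresh API call.
pub enum DiagnosisResult {
    FromHistory { diagnosis: String, message: String },
    Fresh { diagnosis: String },
}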
The "resolved" button
pub fn mark_resolved(history: &mut HistoryCache, hash: &str) {
    if let Some(entry) = history.get_mut(hash) {
        entry.resolved = true;
    }
    history.save();
}
Next time: "You had this issue and resolved it. Here's what worked."
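Wired up as a Tauri command it might look like this (the command name and Mutex-managed state are assumptions about the app's setup):

use std::sync::Mutex;

// Hypothetical command; assumes app.manage(Mutex::new(history)) at startup.
#[tauri::command]
fn resolve_diagnosis(state: tauri::State<'_, Mutex<HistoryCache>>, hash: String) {
    let mut history = state.lock().unwrap();
    mark_resolved(&mut history, &hash);
}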
Cache eviction
Unresolved entries older than 30 days evicted. Resolved entries kept forever.
pub fn evict_old_entries(history: &mut HistoryCache) {
    // Keep anything resolved, plus anything unresolved newer than 30 days.
    let cutoff = unix_now() - (30 * 24 * 60 * 60);
    history.entries.retain(|_, entry| {
        entry.created_at > cutoff || entry.resolved
    });
}
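A quick test of the retention rule (entry_at and the Default impl are test scaffolding, not app code):

fn entry_at(created_at: u64, resolved: bool) -> HistoryEntry {
    HistoryEntry {
        error_hash: String::new(),
        error_preview: String::new(),
        diagnosis: String::new(),
        resolved,
        created_at,
        hit_count: 1,
    }
}

#[test]
fn resolved_entries_survive_eviction() {
    let mut history = HistoryCache::default();
    let stale = unix_now() - 31 * 24 * 60 * 60;
    history.entries.insert("a".into(), entry_at(stale, false));
    history.entries.insert("b".into(), entry_at(stale, true));
    evict_old_entries(&mut history);
    assert!(!history.entries.contains_key("a")); // stale + unresolved: evicted
    assert!(history.entries.contains_key("b")); // resolved: kept forever
}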
HiyokoHelper (OSS) → github.com/hiyoyok/HiyokoHelper
X → @hiyoyok
Top comments (3)
The normalize-before-hash approach is solid — stripping timestamps and PIDs before hashing avoids the "same error, different signature" trap that kills most cache systems. I ran into the same problem on a robotics project where sensor timestamps made identical fault patterns look unique to the cache.
One thing that caught my eye: you're storing everything in a single history.json via tauri-plugin-store. That works for a desktop helper, but the eviction strategy (30-day cutoff for unresolved, keep resolved forever) is going to bite you at scale. Resolved entries accumulate indefinitely, and JSON deserialization gets expensive once you cross ~10K entries. The whole file loads into memory on every lookup.
I ended up switching to an embedded database (moteDB — a Rust-native multimodal store I'm working on) for exactly this reason. When your error history grows beyond what fits in a single JSON parse, you need indexed lookups instead of linear scans. The hash-based lookup you have is O(n) on the deserialized Vec — with an indexed store it becomes O(log n) without loading the full dataset.
Have you considered using sled or SQLite as the backing store instead of flat JSON? The resolved/unresolved split with different retention policies is a nice touch â would be even cleaner with a proper query layer.
Thanks for the detailed breakdown!
One small note: the backing store here is actually a HashMap, so lookups are O(1) rather than O(n) — the full-file-load cost is real, but it's a one-time parse at startup, not per-query.
That said, your point about indefinite accumulation of resolved entries is a genuine blind spot I hadn't thought through. For a personal tool the numbers stay small, but you're right that a proper eviction or archival strategy for resolved entries would be cleaner.
moteDB looks interesting — the "multimodal" part caught my eye. Are you storing embeddings alongside structured data?
One edge case worth considering: hash collisions in error fingerprinting. If two distinct errors normalize to the same signature (which can happen with complex stack traces that differ only in inlining depth), your cache returns a misleading result. You mark it resolved once and the wrong diagnosis gets cached forever.
Your current normalization strips timestamps, PIDs, and line numbers. But object addresses, memory allocation patterns, and dynamic dispatch inlining are also common sources of false equivalence. The risk is low for a desktop helper with moderate traffic, but under high diversity of error inputs, you'll eventually get a collision that silently poisons the cache.
A practical fix: store the raw normalized text alongside the hash, and verify the match on retrieval before returning from cache. A Bloom filter as a pre-check is also cheap — if the Bloom filter says "never seen this", skip the lookup entirely.
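A sketch of that guard, assuming the entry grows a field holding the normalized text (raw_normalized is hypothetical):

// On a hash hit, compare the stored normalized text before trusting the
// cache; a collision then degrades to a miss instead of a wrong answer.
pub fn get_verified<'a>(
    history: &'a HistoryCache,
    hash: &str,
    normalized: &str,
) -> Option<&'a HistoryEntry> {
    history
        .entries
        .get(hash)
        .filter(|entry| entry.raw_normalized == normalized) // hypothetical field
}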