DEV Community

zkaria gamal
zkaria gamal

Posted on

Meet StudyWithMiku 🎤📚 – Your AI Anime Study Buddy That Actually Speaks & Animates!

 # Using AI to Build a Study Buddy That Feels Like Hatsune Miku 🎤✨

ai #python #langchain #tts #opensource #vocaloid #rag #localllm

Studying alone? Boring.

Studying with an AI that reads your PDFs, explains concepts, remembers context, and talks like Miku with an anime-style voice?

Way better.

I used AI tools (Copilot, Claude, Gemini, local Ollama) to ship StudyWithMiku — an autonomous AI study companion that:

  • 📚 Reads and embeds your PDFs
  • 🧠 Answers using RAG + memory
  • 🎀 Responds with Miku’s personality
  • 🎙 Speaks with a character-style voice (not a robotic TTS)

Repo: https://github.com/zkzkGamal/StudyWithFriend
Demo (voice + behavior): https://github.com/zkzkGamal/StudyWithFriend/blob/main/demo.mp4


🚀 What’s Working Really Well

🎀 1. Personality That Actually Feels Alive

Miku’s personality is fully implemented using:

  • Prompt engineering (prompt.yaml)
  • LangGraph state + memory
  • Structured tool calling

She responds with cute, energetic vibes:

♪ Bayes time~! ★ P(A|B) = P(B|A) * P(A) / P(B) … Miku thinks this is sooo cool for studying! ^_^

She remembers context across questions. It feels less like “query → response” and more like chatting with a nerdy Vocaloid friend.


🎙 2. Custom Voice (Not Generic TTS)

Voice pipeline:

  • Coqui TTS (acoustic model)
  • DiffSinger vocoder
  • sounddevice playback

This gives anime-style character speech instead of flat robotic output.

It’s not fully expressive idol-concert mode yet — but it’s already very distinct.


📚 3. Real RAG, Not Just Chat

Drop a PDF into content/ → auto-embedded into ChromaDB in the background.

You get:

  • Smart retrieval
  • Context-aware answers
  • Tool usage (web search, open browser, system commands)
  • Error handling

It’s a proper agent — not just a wrapper over an LLM.


🧪 What’s Still Basic (Honest Section)

  • TTS is clear but not ultra-expressive yet (emotion/prosody tuning next).
  • Animations work (sparkles, terminal flair), but they could evolve into:

    • Sprite sequences
    • Mini GUI
    • Browser-based visuals
  • Voice emotion control needs better parameter tuning in DiffSinger.

The foundation is strong:
Agent ✔
Memory ✔
Voice ✔
RAG ✔

Now it’s polish time.


💡 Why I Built This

I love Vocaloid. Studying is hard. Motivation matters.

So I asked myself:

Why not turn studying into hanging out with Miku?

Cheerful voice + personality + visual feedback = more engagement.

And honestly? It works.


⚡ How AI Helped Me Ship Fast

AI wasn’t just autocomplete — it was a multiplier.

It helped me:

  • Scaffold the LangGraph agent structure
  • Fix PyTorch + protobuf dependency chaos
  • Generate 90% of the Bash installer (venv, CUDA, model downloads)
  • Iterate on Miku’s personality in minutes
  • Debug Chroma, audio pipelines, tool execution

But here’s the key:

AI gave speed.
Understanding the TTS pipeline, agent state transitions, and RAG design gave growth.

That’s where the real learning happened.


🛠 Quick Start

git clone https://github.com/zkzkGamal/StudyWithFriend.git
cd StudyWithMiku
chmod +x install.sh
./install.sh
Enter fullscreen mode Exit fullscreen mode

Edit .env (choose Ollama/local or cloud LLM), then:

source venv/bin/activate
python main.py
Enter fullscreen mode Exit fullscreen mode

Drop PDFs into content/ and start chatting.


🎯 Example Interaction

You:
Explain Bayes theorem from my stats notes.

Miku:
♪ Bayes time~! ★ P(A|B) = P(B|A) * P(A) / P(B) … Miku thinks this is sooo powerful for updating beliefs! ^_^

(Voice playback + animation trigger happens here)


🔮 Next Steps

  • Emotion-aware TTS (tag-based prosody control?)
  • Better DiffSinger tuning
  • Real animated sprites
  • Character toggle (Teto mode?)
  • Flashcards & quiz generation
  • Study session gamification

🧠 Who This Is For

If you’re into:

  • Local AI agents
  • RAG systems
  • TTS pipelines
  • Anime/Vocaloid
  • Building weird but fun AI tools

Clone it. Break it. Improve it.

I’d love feedback on:

  • How the personality feels
  • Voice quality on your machine
  • Ideas to make her more “idol-tier”

PRs and issues are very welcome.

Built with ❤️ in Cairo by Zkzk (zkzkGamal on GitHub).

Top comments (0)