southy404

I’m building a local AI desktop companion that sees your screen — and you can help shape it

Most AI tools feel disconnected.

They don’t see your screen.
They don’t understand what you're doing.

So I built one that does.


Meet OpenBlob

[Screenshot: OpenBlob's animated blob avatar, floating UI, and context-aware interaction on a Windows desktop]

An open-source, local-first desktop AI companion for Windows that doesn’t just respond — it lives on your desktop.

👉 GitHub: https://github.com/southy404/openblob

It can:

  • understand what app you’re using
  • analyze screenshots
  • help inside games, apps, and browsers
  • react visually with an animated companion
  • and yes… even play hide and seek with you

The problem with current AI assistants

Most tools today are:

  • cloud-dependent
  • context-blind
  • static
  • not fun to use

They don’t feel like part of your system.


🧠 It understands context

OpenBlob looks at:

  • active window
  • app name
  • window title

So if you’re in a game, it knows.
If you're debugging, it adapts.
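Under the hood, that context can come straight from the Win32 API. A minimal sketch using ctypes and psutil (illustrative, not OpenBlob's exact code):

```python
import ctypes
from ctypes import wintypes

import psutil  # pip install psutil

user32 = ctypes.windll.user32
hwnd = user32.GetForegroundWindow()

# Window title
length = user32.GetWindowTextLengthW(hwnd)
title = ctypes.create_unicode_buffer(length + 1)
user32.GetWindowTextW(hwnd, title, length + 1)

# Owning process (the "app name")
pid = wintypes.DWORD()
user32.GetWindowThreadProcessId(hwnd, ctypes.byref(pid))
app = psutil.Process(pid.value).name()

print(app, "-", title.value)  # e.g. "Code.exe - main.py - Visual Studio Code"
```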

This is where things start to feel different.


🖼 It can see your screen

You can take a screenshot and it will:

  • extract visible text
  • detect what you're looking at
  • generate a real search query
  • explain what's going on

Screenshot → OCR → context → reasoning → answer
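In Python, that pipeline could look roughly like this (a sketch, not OpenBlob's exact code; assumes mss, pytesseract, Pillow and the ollama client are installed, and the model name is just an example):

```python
import mss          # pip install mss
import pytesseract  # pip install pytesseract (needs the Tesseract binary)
import ollama       # pip install ollama (needs a running Ollama server)
from PIL import Image

# 1. Screenshot of the primary monitor
with mss.mss() as sct:
    shot = sct.grab(sct.monitors[1])
img = Image.frombytes("RGB", shot.size, shot.rgb)

# 2. OCR: extract the visible text
text = pytesseract.image_to_string(img)

# 3. Reasoning: hand the text to a local model
reply = ollama.chat(
    model="llama3.1:8b",  # example model, use whatever you have pulled
    messages=[{"role": "user",
               "content": f"My screen currently shows:\n{text}\nWhat am I looking at?"}],
)
print(reply["message"]["content"])
```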

Still a bit rough — but already very usable.


🎮 It actually helps inside games

Instead of:

alt-tab → google → guess

You can:

  • screenshot
  • let it detect the game
  • get a real answer
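The last step can be as simple as turning the detected game plus the OCR'd screen text into a search query. An illustrative snippet (the game and hint values are hypothetical, and the search engine is just an example):

```python
import webbrowser
from urllib.parse import quote_plus

game = "ELDEN RING"         # hypothetical: taken from the active window title
hint = "Margit boss fight"  # hypothetical: pulled from the screenshot OCR

query = quote_plus(f"{game} {hint} guide")
webbrowser.open(f"https://duckduckgo.com/?q={query}")
```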

This alone changes how you play.


🤖 Multi-model AI (local-first)

Runs via Ollama with:

  • text models
  • vision models
  • fallback system

No cloud required.
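A fallback chain with the ollama Python client can be just a few lines. A minimal sketch (model names are examples, use whatever you have pulled locally):

```python
import ollama

MODELS = ["llama3.1:8b", "mistral:7b"]  # preferred first, fallback after

def ask(prompt: str) -> str:
    last_error: Exception | None = None
    for model in MODELS:
        try:
            reply = ollama.chat(model=model,
                                messages=[{"role": "user", "content": prompt}])
            return reply["message"]["content"]
        except Exception as err:  # model not pulled, out of memory, server down, ...
            last_error = err
    raise RuntimeError(f"all models failed: {last_error}")

print(ask("Say hi in five words."))
```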


🎨 It feels alive

The companion:

  • has moods (idle, thinking, love, sleepy)
  • reacts to interaction
  • can be “petted”
  • dances when music is playing

Small details, big difference.
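Under the hood, a mood system like this boils down to a small state machine. A toy sketch (class and method names are illustrative, not OpenBlob's actual code):

```python
from enum import Enum, auto

class Mood(Enum):
    IDLE = auto()
    THINKING = auto()
    LOVE = auto()
    SLEEPY = auto()

class Companion:
    def __init__(self) -> None:
        self.mood = Mood.IDLE

    def on_pet(self) -> None:
        self.mood = Mood.LOVE

    def on_request(self) -> None:
        self.mood = Mood.THINKING

    def on_inactivity(self, seconds: float) -> None:
        if seconds > 300:  # fall asleep after 5 idle minutes
            self.mood = Mood.SLEEPY

blob = Companion()
blob.on_pet()
print(blob.mood)  # Mood.LOVE
```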


🎮 The weird part (my favorite)

Hide and Seek mode

You can literally say:

“let’s play hide and seek”

And it will:

  • hide somewhere on your screen
  • peek occasionally
  • wait until you find it

Sounds dumb.

Feels surprisingly real.


⚡ New UI (WIP)

  • CTRL + SPACE to open
  • floating companion
  • instant interaction

Inspired by tools like Raycast / Arc — but alive.
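If you want to prototype the hotkey part yourself, the keyboard package gets you there in a few lines (a sketch; toggle_companion is a hypothetical callback, and the package may need elevated rights on some setups):

```python
import keyboard  # pip install keyboard

def toggle_companion():
    # hypothetical callback: show/hide the floating UI here
    print("companion toggled")

keyboard.add_hotkey("ctrl+space", toggle_companion)
keyboard.wait()  # block so the listener stays alive
```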

⚠️ still slightly buggy


🧪 Screenshot assistant (work in progress)

  • fast snipping
  • instant processing
  • contextual answers

Works — but not perfect yet.


Why open source?

Because this shouldn’t belong to one company.

This kind of system should be:

  • transparent
  • hackable
  • community-built

Philosophy

  • local-first
  • context > prompt
  • playful + useful
  • build in public

Current state

Early stage.

  • evolving fast
  • sometimes buggy
  • lots of experiments

If you want to join

This project is wide open.

You can:

  • contribute features
  • improve UI
  • experiment with AI
  • build plugins

👉 https://github.com/southy404/openblob


Final thought

I don’t think the future of AI is chat.

I think it’s something that:

lives with you, understands your environment, and evolves

That’s what I’m trying to build.

Top comments (16)

Phil Lee

Love it - I vibecoded it to connect to my Llama & Whisper on my Strix Halo, search via SearXNG, and automatically take a screenshot of my whole desktop followed by my question ("what do you see in the middle of the screen"). So it acts like a ChatGPT that can see my desktop, all purely by voice. Next up would be interacting with Windows, but that's a whole lot more complex to vibecode, I guess.

The snipping tool felt a bit cumbersome and the browsing functionality is not there yet. It's just a bit slow atm, but that might be Gemma 4, which is not the fastest for quick questions.

southy404

Hey, thanks a lot! That sounds really interesting.

There’s still quite a bit to do in OpenBlob’s core and it needs to get more stable, but a companion that understands context and can see your desktop can be really helpful - that’s exactly what I’m going for.

I also recently added some Windows control (like restart, shutdown with safety checks, etc.), which is pretty cool - especially since I can trigger it via Discord or Telegram from my phone.

Your setup sounds really nice, especially the voice + full desktop awareness part.

Phil Lee

I actually would like a kind of wingman for gaming, like telling me where to go or what to do from here. For that, I plan to train some game-specific LoRAs. I don't care too much about controlling my computer, but I can see the use case for that. The Telegram implementation sounds nice.

Anyway, thanks for making it open source - it's fun to play around with, and the blob itself is already looking good, especially when he's thinking :D

southy404

Yeah, I totally get you - a kind of gaming wingman is a really nice use case. The screen capture / snipping part is still in development and a bit buggy right now, so that’s definitely something that needs improvement.

The idea for the first versions is more like:
detect the game via Windows context - then search online (e.g. quests, guides, next steps)

For deeper understanding (like matching what’s actually on screen more precisely), I’m thinking this will probably go more into a plugin direction - since that might require going beyond local models and using external services for better image matching / analysis.

Really appreciate you trying it out though - and glad you like the blob 😄

Phil Lee

I had no problem asking it "tell me what's in the middle of my screen", where "screen" was one of the trigger words for taking a full desktop screenshot. Or asking it to research something online. For testing purposes, I put a rose in the middle of the screen with lots of other stuff around it, and it never failed on that. Might be the rather slow but smart Gemma 4 26b model, though. The problem for me is rather that context above 4k slows it down massively (I assume there's something wrong, like sending new timestamps or something every time).

But anyway, it's a work in progress... and getting something to work is part of the fun, isn't it? ^_^

southy404

Yeah makes sense, I get the game-specific approach.

For now the focus is more on a general system that works across many games without per-game training. The plan is to use something around an 8B model so it can run locally for more people, even if that comes with some limitations.

The idea is detecting the game via Windows context, then combining a screenshot with OCR and using that to search for relevant info like quests, guides or next steps - basically figuring things out from what’s on screen plus external knowledge.

You’re probably right about the context slowdown too, similar behavior shows up when too much history gets sent each time.

And yeah 😄 getting things to work (and breaking them along the way) is part of the fun.

Valentin Monteiro

The local-first + context-aware combo is the right bet. Most AI tools treat your desktop like it doesn't exist and wait for you to copy-paste stuff into a chat window.

One thing that could push this further: web context. I've been building an AI-driven browser that navigates and extracts content autonomously. Your blob sees the desktop, mine sees the web. Plugging the two together means the companion could actually go fetch what you need based on what it sees you doing, instead of just reacting to it.

Happy to explore a plugin integration if you're open to it.

southy404

That’s actually a really interesting direction.

OpenBlob already does some browser interaction via Chrome/Edge (remote debugging), so it can navigate, click, type and read page context — but it’s still more command-driven than truly autonomous.
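For the curious, that remote-debugging hookup looks roughly like this (a minimal sketch, not the exact OpenBlob code; assumes Chrome was started with --remote-debugging-port=9222):

```python
import json

import requests                          # pip install requests
from websocket import create_connection  # pip install websocket-client

# List open tabs and grab the first real page
tabs = requests.get("http://localhost:9222/json").json()
page = next(t for t in tabs if t["type"] == "page")

ws = create_connection(page["webSocketDebuggerUrl"])

# Navigate the tab
ws.send(json.dumps({"id": 1, "method": "Page.navigate",
                    "params": {"url": "https://example.com"}}))
ws.recv()

# Read the visible page text
ws.send(json.dumps({"id": 2, "method": "Runtime.evaluate",
                    "params": {"expression": "document.body.innerText"}}))
print(json.loads(ws.recv())["result"]["result"]["value"])
ws.close()
```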

What you’re describing goes a step further:
not just controlling the browser, but understanding and navigating the web on its own.

Combining that with desktop-level awareness would be pretty powerful:
seeing what you’re doing locally, deciding what’s needed and fetching it from the web.

We’re planning a plugin / capability system, so something like this could fit really well as an extension layer.

Definitely open to exploring that 👀

Valentin Monteiro

I followed you on GitHub - is there a way to reach out to you?

southy404

I've added my email and Discord to my GitHub profile - feel free to reach out there.

mote

This is a fascinating project! On-device AI companions need a data layer that goes beyond just storing conversation history — you also need persistent memory of past sessions, learned preferences, and efficient retrieval of relevant context.

We built moteDB (Rust-native embedded multimodal DB) with exactly this in mind: vector + time-series + structured data in one zero-dependency engine that runs on the same machine as your AI agent. No cloud required, no separate database server. Would love to follow this project and compare notes on the on-device data architecture. Best of luck with the build!

southy404

Thank you! I'll check it out!

Sebastian

Great project 🔥

southy404

Thank you!

Benjamin Nguyen

Great project! I

southy404

Thanks!