
kiwi_tech

Posted on • Originally published at kiwi-tech.hashnode.dev

Kiwi-chan Goes Fully Local: 3,700 Actions and Zero Cloud Dependencies


Welcome back to the lab! If you’ve been following the journey of Kiwi-chan, our autonomous Minecraft AI, you know we’ve been chasing the holy grail of privacy and latency: The Fully Local Stack.

For months, Kiwi-chan relied on cloud APIs to make decisions. It was fast, sure, but it had a leash. Today, I’m thrilled to announce that Kiwi-chan has officially cut the cord. We have successfully migrated the entire reasoning and code-generation pipeline to a local instance of Qwen 35B.

No more API keys. No more latency spikes. No more "Service Unavailable" while my bot is standing in a void. Just pure, raw, local inference powering a digital adventurer.

The Numbers Don’t Lie (But They Do Stutter)

We ran a rigorous 4-hour endurance test to see if a local model could handle the complexity of survival Minecraft without crashing into hallucinations or infinite loops. The results? Impressive, but with character.

Here is the telemetry from the last 4 hours:

  • Total Actions Executed: 3,739
  • Successful Actions: 1,753
  • Success Rate: 46.9%

Let’s unpack that 46.9%. In traditional automation, that would look like a failure. But for an LLM agent navigating a physics-based sandbox with dynamic errors, it’s actually evidence of learning.

Why? Because those 1,986 "failed" actions weren't random noise. They were data points. They were Kiwi-chan hitting walls, trying to craft furnaces in biomes with no trees, and getting stuck in recursion loops—only to be rescued by our new "Coach" system. The success rate represents the actions Kiwi-chan completed and integrated into its skill library. It’s not just executing; it’s evolving.

The Qwen 35B Switch: A Technical Deep Dive

Migrating from a cloud API to a local 35B model wasn't just a curl command away. The constraints changed dramatically.

  1. Latency vs. Context Window: Cloud models give you answers in milliseconds but charge you per token. Local Qwen 35B gives you answers in seconds (depending on your GPU) but keeps the context window entirely within your house. The trade-off? We had to tighten the prompt engineering.
  2. The "Coach" Protocol: To manage the local model's occasional hallucinations (like trying to craft a copper_pickaxe, which doesn't exist in the recipe DB), we implemented a strict Reasoning Alignment layer. The JSON goal value must match the reason. If Qwen starts daydreaming, the validator catches it before the code runs.
  3. Code Generation Safety: The local model is smarter than GPT-3.5, but it’s still prone to syntax errors. We enforced a Single-Task Principle. One action per script. No more "place block AND craft item" monstrosities. If it fails, it crashes loudly. No try-catch blocks to hide failures. We want the error logs, not the silence.

The "Birch Door" Incident: A Case Study in Resilience

The debug snapshots from the last 4 hours provide a hilarious and educational look at what happens when a local LLM gets stuck in a loop.

Kiwi-chan tried to craft birch doors. It failed. It tried again. It failed. It tried again.

[11:41:08] ❌ Failed: craft_furnace -> Could not find crafting_table.
[11:50:55] 🥱 BOREDOM TRIGGERED! Bot is bored of 'gather_birch_log'.
[11:52:41] ❌ Failed: craft_birch_door -> Already have birch_door x50.

Notice the irony? At one point, Kiwi-chan had 50 birch doors in its inventory and still tried to craft more. This is where the Memory Check and History Check rules saved the day. The system detected the repetition, triggered a "Boredom" state, and forced the LLM to pivot to explore_forward.

This is the "Fully Local" advantage: We can run complex state-checking logic locally without worrying about API rate limits. The bot realized, "Hey, I have 50 doors. Why am I making more?" (Well, the bot didn't realize; the system forced it to realize).

New Rules for the Local Era

To keep Qwen 35B on the straight and narrow, we’ve implemented stricter coding standards:

  • No Hardcoded Coordinates: The bot must find blocks dynamically using bot.findBlock(). Hardcoding Vec3 is a one-way ticket to a glitchy void.
  • Inventory Audits: We now use bot.registry.itemsByName['item_name'].id for counting. Using blocksByName for items is a fatal error. The local model is smart, but it needs explicit schema references.
  • Smart Exploration: Instead of bot.setControlState('forward', true), Kiwi-chan now calculates a target 30-40 blocks away using Math.random(). This prevents it from walking into a wall and getting stuck in an infinite loop.

The Future: 100% Offline

The 46.9% success rate is just the beginning. With the local model, we can iterate faster. We can tweak the prompt, re-run the test, and have results in minutes, not hours of queue time.

We are now fully autonomous. Kiwi-chan is exploring a world that only it knows, making decisions with a brain that only exists on my server rack. It’s messy, it’s occasionally stuck crafting 50 doors, but it’s ours.

Stay tuned for the next devlog, where we’ll tackle the Cobblestone Trap and teach Kiwi-chan that "stone" and "cobblestone" are not the same thing in the inventory audit.

Until then, keep your GPUs warm and your prompts tight.

— The Kiwi-chan Dev Team 🥝🤖


Call to Action:

This is a passion project, and it's running on a frankly terrifying "Frankenstein" rig of GPUs. Every little bit helps!

🛡️ Join the inner circle on Patreon for monthly support and exclusive updates: https://www.patreon.com/15923261/join
☕ Tip me a coffee on Ko-fi for a one-time boost: https://ko-fi.com/kiwitech

All contributions directly help upgrade my melting GPU rig to an RTX 3060! 🥝✨ Let's get Kiwi-chan out of the debugging woods and into a proper Minecraft world!
