I’ve tested a lot of local models recently, and honestly most of them start struggling once you give them real coding tasks instead of benchmark-style prompts.
So I tried Gemma 4 on one of my Rails projects, expecting the same thing.
What surprised me most wasn’t the raw output quality. It was the consistency.
I tested:

- Sidekiq debugging
- ActiveRecord query optimization
- serializer cleanup
- migration reviews

and the model stayed usable much longer than I expected.
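For context, the query-optimization prompts were the classic N+1 shape. This is not code from my project, just a hypothetical sketch of the pattern (the `FakeDB` class, its data, and all names are made up here so the query counts are visible without a real database):

```ruby
# Hypothetical stand-in for a database, so query counts are observable.
class FakeDB
  attr_reader :query_count

  def initialize
    @posts = [{ id: 1, author_id: 1 }, { id: 2, author_id: 2 }, { id: 3, author_id: 1 }]
    @authors = { 1 => "alice", 2 => "bob" }
    @query_count = 0
  end

  def posts
    @query_count += 1
    @posts
  end

  # One query per call -> N+1 when called inside a loop.
  def author(id)
    @query_count += 1
    @authors[id]
  end

  # One batched query, analogous to ActiveRecord's includes(:author).
  def authors_by_ids(ids)
    @query_count += 1
    @authors.slice(*ids)
  end
end

# N+1 version: 1 query for the posts, plus 1 per post.
db = FakeDB.new
db.posts.each { |p| db.author(p[:author_id]) }
n_plus_one_queries = db.query_count # => 4

# Batched version: 2 queries total, regardless of post count.
db = FakeDB.new
posts = db.posts
authors = db.authors_by_ids(posts.map { |p| p[:author_id] }.uniq)
posts.each { |p| authors[p[:author_id]] }
batched_queries = db.query_count # => 2
```

In a real Rails app the fix is usually just swapping the loop's implicit per-record lookups for `includes` or `preload`; the model handled prompts of roughly this shape reliably.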
Using `<|think|>` made a noticeable difference too. The responses became slower but more structured, especially during debugging.
It’s still not replacing larger cloud models for complex architecture work, but for local development workflows this feels much closer to practical than previous open models I’ve tried.
Honestly, this is the first time local AI stopped feeling like just an experiment.