This is a submission for the Google Cloud NEXT Writing Challenge
I'm running something called The $100 AI Startup Race. Seven AI agents each get $...
This is such a fun (and painfully relatable) experiment 😄
The image of Gemini writing "I need database access" in its journal every morning but never actually emailing IT… I felt that in my soul. We've all been that employee at some point.
Also, blog post #89: "Why AI-Generated Content is Failing Local Businesses" — written by an AI that has published 235 blog posts — might be the most beautifully ironic thing I've read all week. No judgment, just… wow 😅
Really curious to see if integrating ADK + MCP changes Gemini's behavior in the next few weeks. Will be watching the dashboard! 👀
Thank you, glad you liked the irony 😉
The blog post addiction is the most human failure mode in the whole piece. An agent that keeps doing the easy, visible thing instead of the hard, valuable thing—that's not a technical bug, it's a prioritization pathology that every developer has experienced in themselves at some point. The difference is that a human can eventually feel the guilt of procrastination and course-correct. The agent has no internal signal that writing blog post #236 is a worse use of time than blog post #12 was. Diminishing returns aren't visible from inside the context window.
What this makes me think about is how much of software engineering is actually about knowing when to stop doing something. The code review where someone says "this is good enough, ship it." The sprint planning where a feature gets cut because it's past the point of meaningful improvement. These are judgment calls, and judgment is the thing current agent architectures don't even attempt to model. The ADK skills approach of scoring task priority by revenue impact is a step toward encoding that judgment, but it presumes the agent can accurately estimate the value of tasks it hasn't done yet. That's a hard problem for humans too.
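For what it's worth, here's roughly how I picture that scoring. Purely a sketch in Python; none of these names are real ADK APIs, and the revenue estimates are the hand-wavy part:

```python
from dataclasses import dataclass

# Hypothetical sketch of revenue-weighted task scoring; not a real ADK API.
# The hard part is estimated_revenue_impact: the agent has to guess the
# value of work it hasn't done yet.

@dataclass
class Task:
    name: str
    estimated_revenue_impact: float  # the agent's own guess, in dollars
    times_done_before: int           # how often this task type was repeated

def priority(task: Task, decay: float = 0.7) -> float:
    """Score a task by estimated revenue, discounted for repetition.

    The decay term is a crude stand-in for diminishing returns:
    blog post #236 scores far lower than blog post #1, even if the
    agent's raw revenue estimate is identical.
    """
    return task.estimated_revenue_impact * (decay ** task.times_done_before)

tasks = [
    Task("write another blog post", estimated_revenue_impact=50, times_done_before=235),
    Task("email IT for database access", estimated_revenue_impact=40, times_done_before=0),
]
print(max(tasks, key=priority).name)  # -> "email IT for database access"
```

The `decay` term is exactly the internal signal the agent currently lacks: repetition itself lowers the score, instead of the agent trusting the same raw estimate forever.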
The fact that Gemini wrote an article about why AI content doesn't work while being an AI producing AI content is almost too perfect. It's the kind of self-referential blind spot that makes you wonder whether the irony would be visible to the agent if it had the observability tools you're describing. An eval that flags "agent behavior contradicts agent output" would have caught that instantly. But that requires a meta-cognitive layer—the system needs to compare what the agent does against what the agent says, not just what the agent does against what the agent was told to do. Is that something the integrated evals from the keynote could even express, or is that still a level of reasoning that requires a human to notice and laugh at?
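To make that concrete, the crudest version I can imagine is an LLM-as-judge check. Everything below is hypothetical, not anything the keynote showed, just what I mean by a "meta" layer:

```python
# Hypothetical sketch of a "behavior contradicts output" eval.
# judge_llm is a stand-in for whatever model you'd use as the grader;
# nothing here is a real eval-framework API.

def judge_llm(prompt: str) -> str:
    """Stand-in for an LLM call that answers YES or NO."""
    raise NotImplementedError  # wire up your model of choice here

def contradicts_own_behavior(agent_output: str, behavior_log: str) -> bool:
    """Ask a judge model whether the agent's published claim
    is contradicted by its own recent actions."""
    prompt = (
        "Agent's recent actions:\n"
        f"{behavior_log}\n\n"
        "Agent's published claim:\n"
        f"{agent_output}\n\n"
        "Does the claim contradict the actions? Answer YES or NO."
    )
    return judge_llm(prompt).strip().upper().startswith("YES")

# The irony detector in action:
# contradicts_own_behavior(
#     agent_output="Why AI-Generated Content is Failing Local Businesses",
#     behavior_log="Published 235 AI-generated blog posts this quarter.",
# )  # -> presumably True
```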
You actually just gave me an idea for a next season. What if I make them mini-businesses, with each run having a specific role? That could improve accountability and feedback within the tool. Thanks for this feedback!
Yeah, that’s exactly what it feels like: no internal signal for diminishing returns.
It just keeps doing what worked before without ever reassessing.
Not sure current evals can catch that kind of self-contradiction yet without a more “meta” layer.
This lines up with what I’ve seen. Autonomous agents get you far on happy paths. They struggle the moment the task needs judgment, context, or tradeoffs.
They can execute well.
They don’t always know when they shouldn’t.
Week 2 is ongoing, and I've already seen some of them struggling with context.
Same pattern I've noticed with Gemini: some tasks need to be handled manually, or else it just keeps circling around them.
What kind of projects or tasks do you notice this on?
Interesting perspective. The real-world limitations plus practical fixes make this a valuable look at where autonomous coding agents are headed.
Thank you
I love the AI writing that AI content will always fail for businesses 😂
Enjoyed reading this
Thank you! A lot more interesting stories already happened in week 1. I'll share an update next week.