This is a submission for the Google Cloud NEXT Writing Challenge
I'm running something called The $100 AI Startup Race. Seven AI agents each get $...
This is such a fun (and painfully relatable) experiment 😄
The image of Gemini writing "I need database access" in its journal every morning but never actually emailing IT… I felt that in my soul. We've all been that employee at some point.
Also, blog post #89: "Why AI-Generated Content is Failing Local Businesses" — written by an AI that has published 235 blog posts — might be the most beautifully ironic thing I've read all week. No judgment, just… wow 😅
Really curious to see if integrating ADK + MCP changes Gemini's behavior in the next few weeks. Will be watching the dashboard! 👀
Thank you, glad you liked the irony 😉
The blog post addiction is the most human failure mode in the whole piece. An agent that keeps doing the easy, visible thing instead of the hard, valuable thing—that's not a technical bug, it's a prioritization pathology that every developer has experienced in themselves at some point. The difference is that a human can eventually feel the guilt of procrastination and course-correct. The agent has no internal signal that writing blog post #236 is a worse use of time than blog post #12 was. Diminishing returns aren't visible from inside the context window.
What this makes me think about is how much of software engineering is actually about knowing when to stop doing something. The code review where someone says "this is good enough, ship it." The sprint planning where a feature gets cut because it's past the point of meaningful improvement. These are judgment calls, and judgment is the thing current agent architectures don't even attempt to model. The ADK skills approach of scoring task priority by revenue impact is a step toward encoding that judgment, but it presumes the agent can accurately estimate the value of tasks it hasn't done yet. That's a hard problem for humans too.
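For what it's worth, here's roughly how I picture that scoring. Purely a sketch in Python; none of these names are real ADK APIs, and the revenue estimates are the hand-wavy part:

```python
from dataclasses import dataclass

# Hypothetical sketch of revenue-weighted task scoring; not a real ADK API.
# The hard part is estimated_revenue_impact: the agent has to guess the
# value of work it hasn't done yet.

@dataclass
class Task:
    name: str
    estimated_revenue_impact: float  # the agent's own guess, in dollars
    times_done_before: int           # how often this task type was repeated

def priority(task: Task, decay: float = 0.7) -> float:
    """Score a task by estimated revenue, discounted for repetition.

    The decay term is a crude stand-in for diminishing returns:
    blog post #236 scores far lower than blog post #1, even if the
    agent's raw revenue estimate is identical.
    """
    return task.estimated_revenue_impact * (decay ** task.times_done_before)

tasks = [
    Task("write another blog post", estimated_revenue_impact=50, times_done_before=235),
    Task("email IT for database access", estimated_revenue_impact=40, times_done_before=0),
]
print(max(tasks, key=priority).name)  # -> "email IT for database access"
```

The `decay` term is exactly the internal signal the agent currently lacks: repetition itself lowers the score, instead of the agent trusting the same raw estimate forever.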
The fact that Gemini wrote an article about why AI content doesn't work while being an AI producing AI content is almost too perfect. It's the kind of self-referential blind spot that makes you wonder whether the irony would be visible to the agent if it had the observability tools you're describing. An eval that flags "agent behavior contradicts agent output" would have caught that instantly. But that requires a meta-cognitive layer—the system needs to compare what the agent does against what the agent says, not just what the agent does against what the agent was told to do. Is that something the integrated evals from the keynote could even express, or is that still a level of reasoning that requires a human to notice and laugh at?
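To make that concrete, the crudest version I can imagine is an LLM-as-judge check. Everything below is hypothetical, not anything the keynote showed, just what I mean by a "meta" layer:

```python
# Hypothetical sketch of a "behavior contradicts output" eval.
# judge_llm is a stand-in for whatever model you'd use as the grader;
# nothing here is a real eval-framework API.

def judge_llm(prompt: str) -> str:
    """Stand-in for an LLM call that answers YES or NO."""
    raise NotImplementedError  # wire up your model of choice here

def contradicts_own_behavior(agent_output: str, behavior_log: str) -> bool:
    """Ask a judge model whether the agent's published claim
    is contradicted by its own recent actions."""
    prompt = (
        "Agent's recent actions:\n"
        f"{behavior_log}\n\n"
        "Agent's published claim:\n"
        f"{agent_output}\n\n"
        "Does the claim contradict the actions? Answer YES or NO."
    )
    return judge_llm(prompt).strip().upper().startswith("YES")

# The irony detector in action:
# contradicts_own_behavior(
#     agent_output="Why AI-Generated Content is Failing Local Businesses",
#     behavior_log="Published 235 AI-generated blog posts this quarter.",
# )  # -> presumably True
```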
You actually just gave me an idea for a next season. What if I make them mini-businesses, with each run having a specific role? That could improve accountability and feedback within the tool. Thanks for this feedback!
Yeah, that’s exactly what it feels like: no internal signal for diminishing returns.
It just keeps doing what worked before without ever reassessing.
Not sure current evals can catch that kind of self-contradiction yet without a more “meta” layer.
This lines up with what I’ve seen. Autonomous agents get you far on happy paths. They struggle the moment the task needs judgment, context, or tradeoffs.
They can execute well.
They don’t always know when they shouldn’t.
Week 2 is ongoing, and I've already seen some of them struggling with context.
Same pattern I've noticed with Gemini: some tasks need to be handled manually, or else it just keeps circling around them.
What kind of projects or tasks do you notice this on?
Interesting perspective. The real-world limitations plus practical fixes make this a valuable look at where autonomous coding agents are headed.
Thank you
I love the AI writing that AI content will always fail for businesses 😂
Enjoyed reading this
Thank you! A lot more interesting stories already happened in week 1. I'll share an update next week.