The Question Nobody Asks
Everyone's asking: "How can I use AI for this?"
The better question is: "Should I?"
Because here's what I learned the hard way:
AI solves a very specific class of problems.
And most of your problems aren't in that class.
What Happened When I Built for SRE
Last month, I started building an AI system for SRE.
The idea wasn’t to generate text.
It was to simulate real incident response.
So I built an environment where:
- systems break
- signals appear (logs, metrics)
- actions change the state
- wrong decisions are penalized
Not "What would you do?"
But "What happens when you actually act?"
What I Realized Quickly
AI looks good when it explains problems.
It struggles when it has to:
- decide under uncertainty
- take the correct sequence of actions
- handle multi-step failures
In SRE, being almost right is still wrong.
Where Systems Break
The hardest part wasn’t generation.
It was:
- choosing the right action
- in the right order
- based on incomplete signals
That’s where most AI systems fail.
Not in demos.
In decisions.
The Lesson
SRE made one thing clear:
AI is useful when it supports decisions.
Not when it replaces them.
New Rule
If your system requires:
consistent, correct decisions under pressure
Then AI alone is not enough.
You need:
- structure
- constraints
- validation
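Here's a minimal sketch of that rule, assuming a hypothetical incident-remediation flow: `propose_action` stands in for whatever model call you use, and the allowlist is invented. The shape is the point: the model suggests, deterministic code decides.

```python
# Structure + constraints + validation around a model suggestion.
# All names here are hypothetical; `propose_action` is a stand-in
# for a real model call.

ALLOWED_ACTIONS = {"restart_service", "scale_up", "rollback_deploy"}

def propose_action(incident_summary: str) -> str:
    # Stand-in for a real model call; returns a canned suggestion here.
    return "restart_service"

def safe_remediate(incident_summary: str) -> str:
    action = propose_action(incident_summary).strip()
    # Constraint: the model may only pick from a fixed set of actions.
    if action not in ALLOWED_ACTIONS:
        # Validation failed: never act on it, escalate to a human.
        return "escalate_to_oncall"
    return action

print(safe_remediate("checkout latency spiking"))  # -> restart_service
```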
The Pattern I Started Seeing
After that failure, I looked at every AI tool I'd built or evaluated.
I found a pattern in what actually worked:
AI works when the problem has high-variance inputs and acceptable variance in outputs.
Let me break that down.
High Variance Inputs
This means: the problem receives unpredictable, unstructured, or creative inputs.
Examples that fit:
- User queries in natural language
- Bug reports written by non-technical users
- Code snippets in any language/framework
- API documentation across different vendors
Examples that don't:
- Structured database queries
- Configuration files with known schemas
- Metrics from monitoring tools
- Git commit hashes
If your input is already structured and predictable, you don't need AI. You need a parser.
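To make that concrete, here's what the parser path looks like, assuming an invented two-field config schema. Nothing probabilistic: same input, same output, every time.

```python
import json

# A known schema means a parser, not a model. The "name"/"replicas"
# schema is made up for illustration.

def parse_service_config(raw: str) -> dict:
    cfg = json.loads(raw)  # deterministic: identical input, identical result
    if not isinstance(cfg.get("name"), str):
        raise ValueError("config.name must be a string")
    if not isinstance(cfg.get("replicas"), int):
        raise ValueError("config.replicas must be an integer")
    return cfg

print(parse_service_config('{"name": "api", "replicas": 3}'))
```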
Acceptable Variance in Outputs
This means: the user can tolerate (and even expects) some variation in the response.
Examples that fit:
- Code suggestions (developer reviews before accepting)
- Draft responses to support tickets (human edits before sending)
- Initial test case generation (QA refines coverage)
- Summarizing long error logs (engineer investigates further)
Examples that don't:
- Deploying to production
- Merging pull requests
- Granting permissions
- Processing payments
If the output must be deterministic and correct 100% of the time, AI is the wrong tool.
You need rules, not models.
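A sketch of what "rules, not models" means in practice, using a made-up permission table. The answer must be exact every time, so it's a lookup, not a prediction.

```python
# Permission checks must be right 100% of the time, so they are a
# lookup table, not a model. Roles and actions here are hypothetical.

PERMISSIONS = {
    "viewer": {"read"},
    "developer": {"read", "write"},
    "admin": {"read", "write", "grant"},
}

def is_allowed(role: str, action: str) -> bool:
    return action in PERMISSIONS.get(role, set())

assert is_allowed("developer", "write")
assert not is_allowed("viewer", "grant")
```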
The Real Litmus Test
Here's the framework I use now before writing any AI code:
Prefer deterministic systems when:
- Inputs are structured
- Rules are stable
Use AI when:
- Rules explode combinatorially
- Context interpretation is required
Best systems = hybrid (AI + constraints)
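One way to read that hybrid rule as code, with placeholder names throughout (`ask_model` stands in for any model call): structured input takes the deterministic path, only ambiguous free text reaches the model, and even then the output is constrained.

```python
import json

def ask_model(text: str) -> str:
    # Stand-in for a real model call.
    return "open_ticket"

def handle_request(raw: str) -> str:
    # Structured input? Take the deterministic path.
    try:
        payload = json.loads(raw)
        return f"deterministic: {payload['command']}"
    except (json.JSONDecodeError, KeyError, TypeError):
        pass
    # Unstructured input? Ask the model, but constrain what it may return.
    suggestion = ask_model(raw)
    if suggestion not in {"open_ticket", "search_docs"}:
        return "fallback: route to a human"
    return f"model: {suggestion}"

print(handle_request('{"command": "restart api"}'))  # deterministic path
print(handle_request("my dashboard looks weird"))    # model path
```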
Where AI Actually Belongs in Developer Tooling
After building both systems that worked and systems that failed, here's what I've seen succeed:
Code Search & Navigation
Why it works:
- Developers search using imprecise natural language
- Codebase context is massive and varied
- "Close enough" results are useful
Example:
"Find where we handle rate limiting for the API"
Traditional search fails because:
- We might call it "throttling" in some files
- Implementation is split across middleware and handlers
- No single keyword matches everything
AI search understands intent.
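The machinery is simpler than it sounds. A sketch of the shape, with `embed` left as a placeholder for whatever embedding model you plug in; the model, not this code, is what puts "throttling" near "rate limiting" in vector space.

```python
import math

def embed(text: str) -> list[float]:
    # Placeholder: plug in any embedding model here.
    raise NotImplementedError

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def search(query: str, chunks: list[str], top_k: int = 5) -> list[str]:
    # Rank code chunks by semantic similarity to the query.
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:top_k]
```

In practice you'd embed the chunks once and cache them; re-embedding on every query is only tolerable in a sketch.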
Error Explanation & Debugging Hints
Why it works:
- Error messages are inconsistent across languages/frameworks
- Developers need context, not just stack traces
- Suggested fixes don't auto-execute
Example:
NullPointerException at line 47
AI can correlate:
- Recent code changes
- Similar past issues
- Common patterns in that file
It doesn't fix it. It points you in the right direction.
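A sketch of that "advice, not execution" pattern: the context gathering is deterministic (the git command is real), while `ask_model` and its canned reply are placeholders. Nothing in the flow runs the model's suggestion.

```python
import subprocess

def ask_model(prompt: str) -> str:
    # Stand-in for a real model call.
    return "Check whether the recent refactor left that object unset."

def recent_changes(path: str) -> str:
    # Plain git, nothing model-driven: last five commits touching the file.
    out = subprocess.run(
        ["git", "log", "--oneline", "-5", "--", path],
        capture_output=True, text=True,
    )
    return out.stdout

def explain_error(error_text: str, path: str) -> str:
    prompt = (
        f"Error:\n{error_text}\n\n"
        f"Recent changes to {path}:\n{recent_changes(path)}\n"
        "Suggest likely causes. Do not propose commands to run."
    )
    # The reply is advice for a human to read; nothing executes it.
    return ask_model(prompt)
```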
Test Case Generation (First Draft)
Why it works:
- Writing tests is high-effort, low-creativity work
- Generated tests are always reviewed
- Edge cases emerge through iteration
Example:
Given a function, generate initial unit tests covering:
- Happy path
- Null inputs
- Boundary conditions
Developer refines from there.
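As a sketch (with `ask_model` once again a stand-in), the key design choice is where the output lands: a scratch file for human review, never straight into the suite.

```python
import inspect

def ask_model(prompt: str) -> str:
    # Stand-in for a real model call.
    return "# model-generated draft tests would land here\n"

def draft_tests(func) -> str:
    source = inspect.getsource(func)
    prompt = (
        f"Write pytest unit tests for this function:\n{source}\n"
        "Cover the happy path, None inputs, and boundary conditions."
    )
    return ask_model(prompt)

def clamp(x: int, lo: int, hi: int) -> int:
    return max(lo, min(hi, x))

# Drafts go to a scratch file for review, never straight into CI.
with open("test_clamp_draft.py", "w") as f:
    f.write(draft_tests(clamp))
```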
And here's where I've seen it fail:
Automated Code Review
Why it fails:
- Context requires understanding team conventions
- False positives erode trust
- Deterministic linters already catch syntax issues
Automatic Refactoring
Why it fails:
- Breaking changes require 100% accuracy
- Semantic meaning must be preserved exactly
- One mistake ships to production
Auto-Generated API Clients
Why it fails:
- OpenAPI specs already exist (structured input)
- Code generation tools are deterministic
- No ambiguity to resolve
The Mistake I See Most Often
Developers use AI because it's impressive.
Not because it's the right tool.
I've done this. We all have.
You see a cool demo and think: "I could use that for..."
But here's what actually happens:
- You bolt AI onto a problem that doesn't need it
- It works 90% of the time
- The 10% failure rate is unpredictable
- You spend more time handling edge cases than you saved
- You rebuild it without AI
Save yourself the cycle.
Start with the simplest solution that could work.
How I Decide Now
When someone asks me to build an AI feature, I ask:
"What happens if this gives the wrong answer?"
If the answer is:
- The user reviews and corrects it → Maybe AI
- We waste some time → Maybe AI
- We lose customer trust → Not AI
- We break production → Definitely not AI
- Nothing, it's just slower → Definitely not AI
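The rubric is mechanical enough to write as code, which is partly the point. The failure-cost labels are my own shorthand, not a formal taxonomy.

```python
# The same rubric as a function; the labels are informal shorthand.
MAYBE_AI = {"user_reviews_output", "wasted_time_only"}
NOT_AI = {"lost_customer_trust", "broken_production", "no_real_upside"}

def should_consider_ai(failure_cost: str) -> bool:
    if failure_cost in NOT_AI:
        return False
    return failure_cost in MAYBE_AI

assert should_consider_ai("user_reviews_output")
assert not should_consider_ai("broken_production")
```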
The Problems Actually Worth Solving
After shipping AI to production, here's what I've learned:
Good AI problems share these traits:
- Ambiguity is inherent – The problem can't be reduced to rules
- Human-in-the-loop is natural – Someone reviews the output anyway
- Value comes from speed, not perfection – 80% solution in 5 seconds beats 100% solution in 5 hours
- The alternative is hiring more people – You're augmenting human judgment, not replacing deterministic code
For developer tooling specifically:
The sweet spot: tasks developers already do manually that require understanding context but not making critical decisions.
Examples:
- Writing boilerplate tests
- Searching codebases semantically
- Explaining unfamiliar error messages
- Generating first-draft documentation
- Suggesting variable names
Not:
- Deploying code
- Approving changes
- Granting access
- Modifying production configs
What I'm Building Differently Now
Instead of starting with "What AI can do," I start with:
What are developers doing repeatedly that's:
- Mentally tedious (not challenging, just annoying)
- Context-heavy (requires reading lots of code)
- Non-critical (mistakes are cheap)
Then I ask:
Could a junior developer do this after reading the context?
If yes → AI might help.
If no → I'm trying to automate judgment, and that won't work.
The Hard Truth
Most problems don't need AI.
They need:
- Better documentation
- Clearer error messages
- Simpler abstractions
- Fewer edge cases
AI feels like progress because it's new.
But progress is solving the problem correctly, not impressively.
A Practical Exercise
If you're reading this and thinking about an AI feature, try this:
1. Write down the problem
2. Describe the input (is it structured or chaotic?)
3. Describe the acceptable output (is variance okay?)
4. Write the deterministic solution (if you can)
If step 4 takes less than 100 lines of code → you don't need AI.
If step 4 is impossible → AI might be the right tool.
What I'm Doing Tomorrow
I'm going to break down something most engineers skip:
How to actually structure an AI system once you've confirmed the problem is worth solving.
Because the architecture decisions you make early will determine whether your system is:
- Reliable or brittle
- Maintainable or a black box
- Scalable or a one-off hack
We'll cover:
- Input validation (most failures happen here)
- Prompt orchestration (not just a single call)
- Output schemas (structured responses are non-negotiable)
- Fallback strategies (when AI doesn't know)
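As a preview of the last two items, here's the kind of output-schema-plus-fallback pattern I mean, with illustrative field names: parse the model's reply against a fixed shape and degrade gracefully when it doesn't conform.

```python
import json

def parse_reply(raw: str) -> dict:
    # Enforce a fixed output shape; field names are illustrative.
    try:
        reply = json.loads(raw)
        assert isinstance(reply["answer"], str)
        assert isinstance(reply["confidence"], (int, float))
        return reply
    except (json.JSONDecodeError, KeyError, TypeError, AssertionError):
        # Fallback: a typed "I don't know" beats a malformed answer.
        return {"answer": "", "confidence": 0.0}

print(parse_reply('{"answer": "restart the worker", "confidence": 0.7}'))
print(parse_reply("not json at all"))
```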
Final Thought
We don’t have a shortage of AI techniques.
RAG. Agents. Workflows. Fine-tuning.
Those are solved problems at this point.
What’s not solved is judgment.
Knowing when AI improves a system
and when it quietly makes it worse.
Most failures I’ve seen weren’t because the model was weak.
They happened because:
- The problem didn’t need AI
- The system lacked constraints
- Or the cost of being wrong was underestimated
AI is not a system. It’s a component.
And if you design your system like it’s the brain,
it will fail like one.
If you’re building with AI, the real question isn’t:
“Can this work?”
It’s:
“What happens when it’s wrong?”
Because that’s where most systems break.
This is Day 1 of documenting how I think about AI systems in production:
what works, what breaks, and where things fail under real-world constraints.
If you're working on similar problems, I’m especially interested in:
Where did your system fail — and why?
