Pick a feature you shipped in the last 30 days. Now answer one question: what did you expect to happen, in writing, before you shipped it?
Most developers and founders can't answer this. Not because they don't care. Because the answer was never written down.
That's the gap this post is about.
The numbers that should bother you
I run a small SaaS on the side called Blazeway. The first time I looked at my own landing page I had 69 pageviews and 2 signups. A 2.9% conversion rate. Industry benchmarks for B2B SaaS landing pages sit between 2 and 5%, so I was on the low end of normal. Normal feels safe. Normal is usually where insight goes to die.
A few weeks later I ran an A/B test on my hero copy. Two headlines:
A: Your product is trying to tell you something, and Blazeway turns every A/B test into compounding knowledge.
B: Every A/B test gives you a winner. None of them tell you what's actually broken in your conversion.
The instinct in SaaS copy is to lead with transformation. Aspirational. Show the destination. That was Variant A. Variant B won.
Without the test I would have shipped A. I would also have written a confident tweet about why leading with transformation is the right move. Wrong, and I would never have known.
The cost of not running experiments is not the lost conversions. It is the lost calibration.
Microsoft has run thousands. Their hit rate is 1 in 3.
Ronny Kohavi led experimentation at Microsoft and Amazon and now publishes regularly on this. His public number, repeated in his book Trustworthy Online Controlled Experiments and in talks: across well-resourced product teams, only about one in three A/B tests moves the target metric in the predicted direction. At Bing specifically, the rate for ideas that made it into a test was closer to 10 to 20%.
Read that again. Microsoft. Statisticians, tenured PMs, decades of telemetry. Two thirds of their bets miss.
What does that mean for your team? If a team with that much pedigree is wrong twice as often as they are right, you are wrong more. You just don't know how often, because you don't measure it.
You ship things. You move on. You assume they worked. The metric drifts up or down for a hundred unrelated reasons: a competitor changed pricing, a holiday landed weirdly, a search algorithm update rolled out. You attribute the drift to the last thing you remember shipping, because the human brain is a narrative machine, not a measurement device.
This is not an attack on intuition. Intuition is fine. Calibrated intuition is better. You only get calibrated by being publicly wrong, on the record, with enough reps.
Three reasons people skip experiments, and why each is wrong
"I don't have enough traffic."
You don't need statistical significance to learn. You need a hypothesis written down before you ship. I ran a test on a navbar label with 89 visits in a week. Not significant. Still useful, because the hypothesis was specific: removing the word Beta should push conversion above 1.7%. If it doesn't, the label wasn't the problem, and I move on without burning a month guessing.
The point isn't power. The point is committing to an expectation.
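To see why 89 visits rarely clears significance, here is a minimal two-proportion z-test in plain Python. The traffic split and signup counts are illustrative, not the actual numbers from my navbar test:

```python
import math

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """Two-proportion z-test: returns the z statistic and two-sided p-value."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF via erf
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Hypothetical split of ~89 visits: even doubling conversion is not significant
z, p = two_proportion_z(conv_a=1, n_a=45, conv_b=2, n_b=44)
print(f"z = {z:.2f}, p = {p:.2f}")
```

At this traffic, the p-value lands around 0.5: the test cannot distinguish a real lift from noise. Which is exactly why the written-down threshold, not the p-value, is what makes a small test useful.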
"It's overhead I don't have time for."
Writing a hypothesis takes ten minutes. Observation, mechanism, prediction, threshold. The overhead is what forces you to confront whether you actually know what you're doing or whether you're shipping vibes.
The real time sink is the alternative. Six months of vague "I think the new copy helped," with no way to know what to repeat.
"We just need to ship."
Shipping is not the goal. Learning is. Shipping is how you generate evidence about what works. If you don't capture the evidence, you have shipped expensive opinions.
What you actually lose
When you don't experiment, three things degrade silently.
Decision quality. Every product decision compounds on the last one. If your base hit rate on "things I think will work" is 30% and you treat all your shipped features as if they worked, you are stacking your next decision on top of multiple failed assumptions. Six months later you are optimizing a local maximum that was never there.
Onboarding speed. Try writing a doc that explains why your homepage is the way it is, for a new hire. If the answer is "the founder thought this was best," you don't have a product, you have a memoir.
Investor signal. Investors who have seen real product cycles can smell unmeasured product changes inside the first ten minutes of a call. They ask, "Why does this work?" The answer is either a story with numbers attached, or a vibe. The first one closes rounds.
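The decision-quality point is just arithmetic. If each unmeasured change has roughly Kohavi's one-in-three chance of actually working, the odds that a whole chain of them worked collapses fast. A back-of-envelope sketch (the 30% hit rate is the assumption; the chain lengths are illustrative):

```python
# Assumed per-change success rate, per Kohavi's published ~1-in-3 figure
hit_rate = 0.3

for n in (1, 3, 5, 10):
    p_all_worked = hit_rate ** n
    print(f"{n} unmeasured changes: {p_all_worked:.1%} chance all of them worked")
```

Three unmeasured changes in a row: under a 3% chance they all did what you think they did. Ten: effectively zero. That is the local maximum that was never there.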
What experimenting does not give you
Honesty cuts both ways.
Experimenting does not give you certainty. It gives you better priors.
It does not protect you from being wrong. It tells you when you were, faster.
It does not replace product taste. It corrects it over time.
If you're already a brilliant product thinker, experiments make you slightly more right than you would have been. If you're not, they make you less wrong, which is what most of us actually need.
A 30-minute starter
Pick one decision you are about to ship. Write four lines:
Observation: what did I see that made me want to ship this?
Mechanism: why do I think this change will help?
Prediction: what will the metric look like if I'm right?
Threshold: at what point would I admit the change didn't work?
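If you want those four lines in a form you can version-control, a plain record is enough. This is just a sketch: the field names mirror the template above, and the example values echo numbers from earlier in the post, not a real entry:

```python
from dataclasses import dataclass, asdict
import json

@dataclass
class Hypothesis:
    """One experiment's four lines, written down before shipping."""
    observation: str
    mechanism: str
    prediction: str
    threshold: str

h = Hypothesis(
    observation="Landing page converts at 2.9%, low end of the 2-5% benchmark.",
    mechanism="The hero headline leads with transformation instead of the problem.",
    prediction="A problem-first headline lifts conversion above 4%.",
    threshold="If conversion stays under 3.2% after two weeks, the headline was not the issue.",
)
print(json.dumps(asdict(h), indent=2))
```

Dump it to a file, commit it next to the change, and the prediction is on the record before the result exists.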
That is it. You do not need a tool yet. You need the discipline. The tool comes when you find yourself doing this five times a week and losing track of what you predicted last month.
When that moment hits, that is what I'm building Blazeway for. A journal where every experiment lives end to end: hypothesis, run, outcome, learning. Privacy-first by architecture (no cookies, no consent banner). Five-minute setup. Built by a solo founder for solo founders who are tired of "I think it worked."
My own experiments run live on the landing page, including the ones that failed. If you want to see what writing things down before you ship looks like in practice, that is the most honest demo I can give you.
The next test you run is going to teach you something. The question is whether you will remember it in six months, or whether it will dissolve into the same fog as the last fifty.
Write it down.
If you want to follow my experiments as they happen, I'm on X: @deliverhonestly