Ian Johnson

Posted on May 15

It werks!

#webdev #softwareengineering #testing #backend

Someone on your team says "it works" and there's a moment of relief, maybe even satisfaction. The deploy went out. The bug is gone. The new feature is up. Whatever they were wrestling with, it works.

It's worth pausing on that phrase, because "it works" is doing a lot of hidden work itself. Working software is genuinely valuable — much better to have something that does the thing than something that doesn't. But "works" is a weak property when you don't have the others next to it. Reliable. Robust. Predictable. Dependable. Tested. Each of those is something stronger than "I ran it just now and it didn't break." Each of them is a claim about what the software will keep doing, under variation, under load, under change, when nobody is watching.

When you have the observation without those properties, what you have is software that werks. It sounds the same as "works" when you say it out loud. It'll pass casual inspection, it'll satisfy the demo, it'll close the ticket. But it isn't correct, in the same way "werks" isn't correctly spelled. It happens to produce the right output for the inputs it has seen, by some combination of luck, coincidence, and undocumented assumption. Push on it a little, and it stops.

What incidentally-working code looks like

The clearest examples are the ones where two bugs cancel each other out: a function that computes the wrong answer, fed into another function that, by coincidence, expects exactly that wrong answer. Fix either one in isolation and the system breaks. Nobody knows this, because no test ever exercised the boundary; the only thing keeping the lights on is that nobody has touched either function in a while.
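To make the cancelling-bugs case concrete, here is a minimal sketch. Both functions and their names are hypothetical, invented for illustration: one returns a percentage where a fraction was intended, and the other happens to compensate for exactly that mistake.

```python
# Hypothetical example: both functions are wrong, and the errors cancel.

def discount_rate(percent_off: int) -> float:
    """Intended to return a fraction (0.2 for 20% off).

    Bug 1: returns the percentage unchanged (20 instead of 0.2).
    """
    return percent_off  # should be percent_off / 100


def apply_discount(price_cents: int, rate: float) -> int:
    """Intended to take a fraction and return the discounted price.

    Bug 2: divides by 100 again, which is only right if `rate` is a
    percentage, i.e. if Bug 1 is present upstream.
    """
    return round(price_cents * (1 - rate / 100))


# Together they produce the right answer: 20% off 1000 cents.
total = apply_discount(1000, discount_rate(20))


def discount_rate_fixed(percent_off: int) -> float:
    """The 'obvious' fix to Bug 1, applied in isolation."""
    return percent_off / 100


# Fixing only one side breaks the pair: the discount shrinks to 0.2%.
broken_total = apply_discount(1000, discount_rate_fixed(20))

print(total, broken_total)
```

The point of the sketch is that the correct-looking fix is the one that causes the incident, and nothing in either function's signature warns you.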

There's a softer version that's much more common. A function processes the inputs you happen to feed it today and produces correct outputs. The inputs you haven't fed it (slightly different formats, edge cases, sizes outside what you've seen) would produce silently wrong outputs. Currency math that's right for USD and broken for JPY. Date handling that's right in your timezone and wrong everywhere else. A regex that matches the strings you tested and matches half the URLs in production by accident. A query that returns the right rows in the right order because the database happens to have a particular index, and one day someone drops the index for an unrelated reason.
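The currency case can be sketched in a few lines. This is an illustrative example, not anyone's production code: the function names and the small exponent table are assumptions, though the exponents themselves (2 for USD, 0 for JPY, 3 for KWD) follow ISO 4217.

```python
from decimal import Decimal


def to_minor_units_naive(amount: Decimal) -> int:
    """Convert an amount to its smallest unit, assuming two decimal places.

    Right for USD (dollars to cents). Silently wrong for any currency
    whose minor unit is not 1/100 of the major unit.
    """
    return int(amount * 100)


# Minor-unit exponents per ISO 4217 for a few currencies.
# A hypothetical lookup table; a real system would cover every currency it accepts.
MINOR_UNIT_EXPONENT = {"USD": 2, "JPY": 0, "KWD": 3}


def to_minor_units(amount: Decimal, currency: str) -> int:
    """Convert using the currency's own exponent, and refuse to lose precision."""
    exponent = MINOR_UNIT_EXPONENT[currency]  # KeyError for unknown currencies
    scaled = amount * (10 ** exponent)
    if scaled != scaled.to_integral_value():
        raise ValueError(f"{amount} has more precision than {currency} allows")
    return int(scaled)


usd_naive = to_minor_units_naive(Decimal("19.99"))   # correct for USD
jpy_naive = to_minor_units_naive(Decimal("500"))     # 100x too large for JPY
jpy_explicit = to_minor_units(Decimal("500"), "JPY")
```

The naive version passes every test written by a team that only bills in dollars. The explicit version makes the assumption visible and fails loudly when the input doesn't fit.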

And then there are the timing cases. A race condition where the safe ordering almost always wins, until it doesn't. An eventual-consistency window that's almost always shorter than the next read, until traffic spikes and it isn't. A retry that almost always succeeds within three attempts, until the downstream service has a bad day.
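The retry case is the easiest of these to show deterministically. In the sketch below, `FlakyService` and `fetch_with_retry` are hypothetical names: a fake downstream dependency that fails a fixed number of times, and a retry helper with the three-attempt limit hard-coded as its default.

```python
class FlakyService:
    """Hypothetical stand-in for a downstream dependency.

    Fails a fixed number of times, then succeeds.
    """

    def __init__(self, failures_before_success: int) -> None:
        self.remaining_failures = failures_before_success
        self.calls = 0

    def fetch(self) -> str:
        self.calls += 1
        if self.remaining_failures > 0:
            self.remaining_failures -= 1
            raise ConnectionError("downstream unavailable")
        return "ok"


def fetch_with_retry(service: FlakyService, attempts: int = 3) -> str:
    """Call the service up to `attempts` times, then give up."""
    last_error = None
    for _ in range(attempts):
        try:
            return service.fetch()
        except ConnectionError as exc:
            last_error = exc
    raise RuntimeError(f"gave up after {attempts} attempts") from last_error


# A normal day: two transient failures, success on the third call.
normal_day = fetch_with_retry(FlakyService(failures_before_success=2))

# A bad day: one more failure than the retry budget allows.
try:
    fetch_with_retry(FlakyService(failures_before_success=3))
    bad_day_failed = False
except RuntimeError:
    bad_day_failed = True
```

Nothing about the helper is wrong as such. What's incidental is the belief that three attempts is enough, which holds only as long as the downstream service keeps behaving the way it did when someone picked the number.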

All of this is "it works." None of it is reliable.

The danger is what you build on top

A piece of incidentally-working code is a small problem. The bigger problem is what happens next, which is that someone builds on top of it. They don't know it's incidental. From their perspective it looks like a normal function returning a normal value. So they call it from another function, which calls it from another function, and now the assumption that the original code happened to satisfy is load-bearing for half the system.

The longer this goes on, the more expensive the eventual correction gets. By the time you discover that the foundation has a property nobody intended, you have to fix the foundation and update everything that came to depend on the accidental property. If your discovery happened because of a production incident, you're doing that work while customers are watching.

This is how codebases acquire that distinctive quality where nobody wants to touch certain modules. It isn't that the modules are complicated (well, they often are, but that's a symptom). It's that nobody is sure which parts are doing what they look like they're doing and which parts are doing something subtler that happens to come out right. Every change risks knocking over one of the invisible struts.

The fix is refactoring, and refactoring needs tests

The way out is the same way you'd handle any code you don't fully trust: get tests around it, then change it. Pin down what it currently does (characterization tests, if the current behavior is unverified) and then refactor toward something where the behavior is intentional rather than accidental. Replace the regex that happens to work with one that says what it means. Replace the timing assumption with explicit synchronization or an idempotency check. Replace the implicit dependency on a database index with an ORDER BY clause. Replace the JPY-breaking currency math with a money type that respects precision.
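As one small illustration of making an accidental property intentional, here is the ORDER BY replacement sketched with Python's built-in `sqlite3` module. The `events` table, its columns, and both function names are made up for the example.

```python
import sqlite3

# In-memory database with a hypothetical `events` table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (id INTEGER PRIMARY KEY, created_at TEXT, name TEXT)"
)
# Rows are deliberately inserted out of chronological order.
conn.executemany(
    "INSERT INTO events (created_at, name) VALUES (?, ?)",
    [("2024-03-02", "second"), ("2024-03-01", "first"), ("2024-03-03", "third")],
)


def event_names_implicit(connection: sqlite3.Connection) -> list:
    """No ORDER BY: SQL makes no promise about row order here.

    Whatever order comes back depends on storage and query-plan details.
    """
    return [row[0] for row in connection.execute("SELECT name FROM events")]


def event_names_explicit(connection: sqlite3.Connection) -> list:
    """ORDER BY states the intent; `id` breaks ties between equal timestamps."""
    query = "SELECT name FROM events ORDER BY created_at, id"
    return [row[0] for row in connection.execute(query)]


ordered = event_names_explicit(conn)
```

The explicit query costs one clause and turns "the rows happen to come back in order" into something a test can assert and a schema change can't quietly undo.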

The tests are what make this safe. Without them, you can't tell whether your refactor preserved the accidental property that some downstream code is quietly depending on. With them, you can change the code with confidence. Even better, when you discover that something downstream was depending on the accident, the failing test tells you exactly where, instead of a customer telling you on Twitter.
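Here is a rough sketch of what that looks like in practice, using plain Python rather than any particular test framework. `legacy_slug`, `refactored_slug`, and the recorded cases are all hypothetical. The expected values were captured from what the legacy code does today, not from what anyone thinks it should do.

```python
def legacy_slug(title: str) -> str:
    """Hypothetical legacy function whose exact behavior nobody documented."""
    return title.strip().lower().replace(" ", "-")


# Characterization cases: (input, output observed from the current code).
CHARACTERIZATION_CASES = [
    ("Hello World", "hello-world"),
    ("  Padded  ", "padded"),
    ("Two  Spaces", "two--spaces"),  # surprising, but something may depend on it
    ("", ""),
]


def check_characterization(func) -> list:
    """Run every recorded case against `func` and describe any mismatch."""
    failures = []
    for given, expected in CHARACTERIZATION_CASES:
        actual = func(given)
        if actual != expected:
            failures.append(f"{given!r}: expected {expected!r}, got {actual!r}")
    return failures


def refactored_slug(title: str) -> str:
    """A tidier rewrite that collapses runs of whitespace into one hyphen."""
    return "-".join(title.lower().split())


legacy_failures = check_characterization(legacy_slug)          # pins current behavior
refactor_failures = check_characterization(refactored_slug)    # flags the change

for failure in refactor_failures:
    print(failure)
```

The refactor is arguably better, and the characterization check still fails on the double-space case. That failure is the useful part: it turns a silent behavior change into a decision someone has to make on purpose, either by keeping the old output or by updating the callers and the recorded case together.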

The compound result, over time, is a codebase whose behavior is intentional. Things work because someone made them work, in a specific way, on purpose. Things continue to work because the tests catch you when you slip. That's the difference between a system you can confidently change and a system you tiptoe around.

When "it works" becomes the argument against fixing it

There's a flip side to all of this, which is when "it works" stops being an observation and starts being a defense. You raise a concern about a piece of code (the timing is fragile, the regex is doing something a regex shouldn't be relied on for, the currency math is going to break the day someone adds a non-USD customer) and the response comes back: it works. We have other priorities. If it ain't broke, don't fix it.

The phrase sounds like a cost-benefit analysis, but it isn't one. A real cost-benefit analysis would name what you're getting and what you're giving up. "It works" skips straight to the conclusion by treating "works" as a binary — either it does or it doesn't, and since it does, we're done. Everything in the preceding sections is the case for why that binary is the wrong frame.

What you're actually accepting when you deploy "it works" as a defense is a list of things, and they're worth saying out loud. You're accepting that the accidental property will continue to hold under conditions you can't enumerate, because you haven't enumerated them. You're accepting that when it does break, it will break at a time you didn't choose, usually the worst time, because the conditions that break it correlate with unusual load, unusual data, unusual everything. You're accepting that the fix will be more expensive later, because more code will have come to depend on the current behavior in the meantime. And you're accepting that the people who understand the system well enough to fix it cheaply today may not be on the team by the time the bill comes due.

None of that is automatically wrong as a tradeoff. Sometimes you genuinely don't have the cycles, and the expected cost of the eventual incident really is lower than the cost of fixing it now. That's a real call to make. But it's a call you can only make honestly if you've named the thing you're trading away. "We know this is fragile in these specific ways, and we're choosing to leave it because X" is an engineering decision. "It works" is the version of that sentence where everything after "works" has been quietly deleted — and what's left sounds like a reason but is actually a refusal to look.

So the next time you hear "it works"

Ask one more question. Does it work, or does it werk? Is the behavior a property you can rely on, or is it an observation you got lucky with? Are there tests that say it will keep working, or just a person who ran it once and didn't see it break?

And ask hardest when "it works" is being used to end the conversation rather than describe a state. The defensive "it works" is the one most likely to be covering something its speaker hasn't actually looked at.

Working software is good. It is genuinely better than software that doesn't work. But "it works" is the floor, not the ceiling, and a system built entirely out of code that satisfies the floor is a system that will surprise you. Push for the rest — for code whose correctness is intentional, whose behavior is pinned down, whose dependencies are explicit. That's when "it works" becomes the same word in writing as it is when you say it out loud.
