Alert fatigue isn't a wellness issue. It's a production risk. And it's cheaper to fix than most people think.
The math
An engineer getting 30 alerts a night sleeps badly. Bad sleep → next-day cognitive capacity drops, call it 20%. Less capacity → higher chance of introducing a bug. That bug becomes the next incident. The next incident is another 30 alerts.
You are in a loop.
How it actually happens
Nobody sets out to create alert fatigue. It happens one alert at a time.
- 'Let's add an alert for this' (reasonable)
- 'The threshold is too sensitive, let's keep it but note it' (rationalizing)
- 'It fires sometimes but we know what it means' (tolerating)
- 'We should probably look at that' (normalizing)
Two years later you have 400 alerts and nobody remembers why half of them exist.
The fix
- Delete every alert that hasn't caused action in 30 days. If nobody acted on it, it's not an alert.
- Raise the threshold on noisy alerts until they only fire for real problems. Your boss is not going to fire you for missing noise. (There's a rough sketch of the 30-day and noise checks after this list.)
- Group related alerts. One incident should page once, not 40 times.
- Set a hard cap on alerts per engineer per day. When you hit it, something gets cut. (The second sketch below combines this cap with grouping.)
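To make the first two rules concrete, here's a minimal sketch, assuming you can export per-alert firing and acknowledgement history from your alerting tool. The record fields, the 30-day window, and the 5% action-rate cutoff are all illustrative, not any specific vendor's API:

```python
from datetime import datetime, timedelta

# Hypothetical per-alert history records. Assumes your alerting tool can
# export something like this; the field names are illustrative only.
alerts = [
    {"name": "api-latency-p99",  "fired": 412, "acted_on": 0, "last_action": None},
    {"name": "db-disk-full",     "fired": 3,   "acted_on": 3, "last_action": datetime(2024, 5, 20)},
    {"name": "pod-restart-flap", "fired": 980, "acted_on": 2, "last_action": datetime(2024, 5, 25)},
]

now = datetime(2024, 6, 1)
window = timedelta(days=30)

delete_candidates = []     # rule 1: no action in 30 days -> delete it
threshold_candidates = []  # rule 2: fires constantly, almost never actioned -> raise threshold

for a in alerts:
    no_recent_action = a["last_action"] is None or now - a["last_action"] > window
    if no_recent_action:
        delete_candidates.append(a["name"])
    elif a["fired"] > 0 and a["acted_on"] / a["fired"] < 0.05:
        # actioned less than 5% of the times it fired: that's noise
        threshold_candidates.append(a["name"])

print("Delete:", delete_candidates)
print("Raise threshold:", threshold_candidates)
```

The exact cutoff doesn't matter much. What matters is that the pruning decision becomes mechanical instead of a debate.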
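And a similarly rough sketch of grouping plus a per-engineer daily cap, assuming each alert event can carry an incident_key that correlates alerts from the same underlying problem. The event shape and the cap of 3 are made up for illustration:

```python
from collections import defaultdict
from datetime import date

# Hypothetical alert events. "incident_key" is whatever correlates alerts
# from the same underlying problem (service + failure mode, deploy id, ...).
events = [
    {"incident_key": "checkout-db-failover",   "engineer": "sam", "day": date(2024, 6, 1)},
    {"incident_key": "checkout-db-failover",   "engineer": "sam", "day": date(2024, 6, 1)},
    {"incident_key": "checkout-db-failover",   "engineer": "sam", "day": date(2024, 6, 1)},
    {"incident_key": "cdn-cache-miss-spike",   "engineer": "sam", "day": date(2024, 6, 1)},
    {"incident_key": "api-latency-regression", "engineer": "sam", "day": date(2024, 6, 1)},
    {"incident_key": "staging-disk-warning",   "engineer": "sam", "day": date(2024, 6, 1)},
]

MAX_PAGES_PER_DAY = 3  # arbitrary; pick a number and defend it

already_paged = set()           # (incident_key, day) pairs that have paged
pages_today = defaultdict(int)  # pages sent per (engineer, day)

for e in events:
    incident = (e["incident_key"], e["day"])
    if incident in already_paged:
        continue  # same incident, same day: one page is enough
    if pages_today[(e["engineer"], e["day"])] >= MAX_PAGES_PER_DAY:
        print(f"cap hit for {e['engineer']}: '{e['incident_key']}' goes to a review queue")
        continue
    already_paged.add(incident)
    pages_today[(e["engineer"], e["day"])] += 1
    print(f"page {e['engineer']}: {e['incident_key']}")
```

In this toy run the repeated database alerts collapse into one page, and the staging disk warning hits the cap and lands in a review queue instead of someone's phone. That's the cap doing its job.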
Teams that aggressively prune alerts sleep better, ship more, and have fewer real incidents. It's not a trade-off. It's just good hygiene.
Written by Dr. Samson Tanimawo
BSc · MSc · MBA · PhD
Founder & CEO, Nova AI Ops. https://novaaiops.com