Alert fatigue isn't a wellness issue. It's a production risk. And it's cheaper to fix than most people think.
The math
An engineer getting 30 alerts a night sleeps badly. Bad sleep → next-day cognitive capacity drops, call it 20%. Less capacity → higher chance of introducing a bug. That bug becomes the next incident. The next incident is another 30 alerts.
You are in a loop.
How it actually happens
Nobody sets out to create alert fatigue. It happens one alert at a time.
- 'Let's add an alert for this' (reasonable)
- 'The threshold is too sensitive, let's keep it but note it' (rationalizing)
- 'It fires sometimes but we know what it means' (tolerating)
- 'We should probably look at that' (normalizing)
Two years later you have 400 alerts and nobody remembers why half of them exist.
The fix
- Delete every alert that hasn't caused action in 30 days. If nobody acted on it, it's not an alert.
- Raise the threshold on noisy alerts until they only fire for real problems. Your boss is not going to fire you for missing noise. (There's a rough sketch of the 30-day and noise checks after this list.)
- Group related alerts. One incident should page once, not 40 times.
- Set a hard cap on alerts per engineer per day. When you hit it, something gets cut. (The second sketch below combines this cap with grouping.)
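To make the first two rules concrete, here's a minimal sketch, assuming you can export per-alert firing and acknowledgement history from your alerting tool. The record fields, the 30-day window, and the 5% action-rate cutoff are all illustrative, not any specific vendor's API:

```python
from datetime import datetime, timedelta

# Hypothetical per-alert history records. Assumes your alerting tool can
# export something like this; the field names are illustrative only.
alerts = [
    {"name": "api-latency-p99",  "fired": 412, "acted_on": 0, "last_action": None},
    {"name": "db-disk-full",     "fired": 3,   "acted_on": 3, "last_action": datetime(2024, 5, 20)},
    {"name": "pod-restart-flap", "fired": 980, "acted_on": 2, "last_action": datetime(2024, 5, 25)},
]

now = datetime(2024, 6, 1)
window = timedelta(days=30)

delete_candidates = []     # rule 1: no action in 30 days -> delete it
threshold_candidates = []  # rule 2: fires constantly, almost never actioned -> raise threshold

for a in alerts:
    no_recent_action = a["last_action"] is None or now - a["last_action"] > window
    if no_recent_action:
        delete_candidates.append(a["name"])
    elif a["fired"] > 0 and a["acted_on"] / a["fired"] < 0.05:
        # actioned less than 5% of the times it fired: that's noise
        threshold_candidates.append(a["name"])

print("Delete:", delete_candidates)
print("Raise threshold:", threshold_candidates)
```

The exact cutoff doesn't matter much. What matters is that the pruning decision becomes mechanical instead of a debate.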
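And a similarly rough sketch of grouping plus a per-engineer daily cap, assuming each alert event can carry an incident_key that correlates alerts from the same underlying problem. The event shape and the cap of 3 are made up for illustration:

```python
from collections import defaultdict
from datetime import date

# Hypothetical alert events. "incident_key" is whatever correlates alerts
# from the same underlying problem (service + failure mode, deploy id, ...).
events = [
    {"incident_key": "checkout-db-failover",   "engineer": "sam", "day": date(2024, 6, 1)},
    {"incident_key": "checkout-db-failover",   "engineer": "sam", "day": date(2024, 6, 1)},
    {"incident_key": "checkout-db-failover",   "engineer": "sam", "day": date(2024, 6, 1)},
    {"incident_key": "cdn-cache-miss-spike",   "engineer": "sam", "day": date(2024, 6, 1)},
    {"incident_key": "api-latency-regression", "engineer": "sam", "day": date(2024, 6, 1)},
    {"incident_key": "staging-disk-warning",   "engineer": "sam", "day": date(2024, 6, 1)},
]

MAX_PAGES_PER_DAY = 3  # arbitrary; pick a number and defend it

already_paged = set()           # (incident_key, day) pairs that have paged
pages_today = defaultdict(int)  # pages sent per (engineer, day)

for e in events:
    incident = (e["incident_key"], e["day"])
    if incident in already_paged:
        continue  # same incident, same day: one page is enough
    if pages_today[(e["engineer"], e["day"])] >= MAX_PAGES_PER_DAY:
        print(f"cap hit for {e['engineer']}: '{e['incident_key']}' goes to a review queue")
        continue
    already_paged.add(incident)
    pages_today[(e["engineer"], e["day"])] += 1
    print(f"page {e['engineer']}: {e['incident_key']}")
```

In this toy run the repeated database alerts collapse into one page, and the staging disk warning hits the cap and lands in a review queue instead of someone's phone. That's the cap doing its job.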
Teams that aggressively prune alerts sleep better, ship more, and have fewer real incidents. It's not a trade-off. It's just good hygiene.
Written by Dr. Samson Tanimawo
BSc · MSc · MBA · PhD
Founder & CEO, Nova AI Ops. https://novaaiops.com