Krasimir Petkov

I built a small tool to notice when cron jobs fail silently

I found out one of my background jobs had stopped running only after the data looked wrong the next day.

There was no dramatic crash. No big incident. The job just quietly failed, and I only noticed because something downstream looked stale.

That is the annoying part about cron jobs and scheduled scripts. Most of the time they run in the background, write some logs, and nobody thinks about them until something is missing.

I have a few jobs like this:

  • data updates
  • cleanup scripts
  • small imports
  • external API calls
  • recurring background tasks

None of them are very exciting. But when one of them does not run, or starts and never finishes, it can create a surprisingly annoying problem.

That is the kind of failure I wanted to make more visible.

I also built a small V1 of this idea here:

https://missedrun.com

There is also a self-hosted version for people who prefer to run this kind of monitoring on their own infrastructure:

missedrun / missedrun-selfhosted

Self-hosted cron and scheduled job monitoring for detecting silent failures.

MissedRun Self-hosted

Hosted version: https://missedrun.com
Self-hosted version: https://github.com/missedrun/missedrun-selfhosted

MissedRun monitors recurring jobs such as cron scripts, backups, imports, ETL pipelines, billing syncs, cleanup tasks, and scheduled reports.

It works by giving each monitor a unique ping URL. Your job calls that URL when it runs. If the job does not check in within the expected interval plus grace period, MissedRun marks it as missing and can send an alert.

Why MissedRun?

Some production failures are not loud.

A job can stop running without throwing an exception:

  • cron did not run
  • a server was down
  • a Docker container stopped
  • credentials expired
  • a backup script never started
  • an ETL job stopped updating data
  • a scheduled report was not generated
  • a background worker silently stopped

MissedRun is built to detect this kind of silent failure.

Features

  • Create monitors for scheduled jobs
  • Generate unique…

This is not a big launch. I am mostly trying to understand whether this is a real enough problem for other developers who run cron jobs, ETL jobs, backups, imports, cleanup scripts, or other scheduled tasks.

The problem

Cron jobs are easy to forget about.

They usually do not have a UI. They run somewhere on a server, maybe write logs, and then disappear into the background.

A job can fail because:

  • an API token expired
  • an environment variable is missing
  • a database connection failed
  • the server restarted
  • the script crashed
  • the job started but never finished
  • the cron entry was changed or removed

Logs are useful, but only if you go and check them.

In practice, I usually only check logs after I already suspect something is broken.

For recurring jobs, I often want a much simpler answer:

  • did it start?
  • did it finish?
  • did it fail?
  • did it miss the expected time?

The ping approach

One simple way to monitor this is to make the job report its own status.

The basic pattern is:

  1. send a start ping when the job begins
  2. send a success ping when it finishes
  3. send a failure ping if it exits with an error
  4. mark it as late or missed if the expected ping does not arrive

It is not a complicated idea, but I have found it very useful in practice.

Instead of you checking logs manually, the job reports whether it is still alive.

For example:

  • if the start ping arrives, the job has started
  • if the success ping arrives, the job finished
  • if the fail ping arrives, the job exited with an error
  • if nothing arrives when expected, the job is late or missed

That last case is the important one for me.

A lot of failures are not loud. The job does not always send an error. Sometimes it just does not run.
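The "nothing arrives" case has to be decided on the monitoring side, since a job that never runs cannot report anything. A minimal sketch of that check, assuming the monitor stores the Unix timestamp of the last ping plus an expected interval and a grace period in seconds. The function name `is_late` and its arguments are hypothetical, and this is not necessarily how MissedRun implements it:

```shell
# Hypothetical helper: succeeds (exit 0) when a job is late or missed.
# Arguments are Unix timestamps / durations in seconds:
#   $1 last_ping  time of the most recent ping
#   $2 interval   expected time between runs
#   $3 grace      extra time allowed before alerting
#   $4 now        current time
is_late() {
  local last_ping="$1" interval="$2" grace="$3" now="$4"
  # The job is late once "now" passes last_ping + interval + grace.
  [ "$now" -gt $(( last_ping + interval + grace )) ]
}

# Example: an hourly job with 5 minutes of grace that last pinged 50 minutes ago.
now=$(date +%s)
last_ping=$(( now - 3000 ))
if is_late "$last_ping" 3600 300 "$now"; then
  echo "late or missed: send an alert"
else
  echo "on schedule"
fi
```

The grace period is there so that a job which normally takes a little longer on some days does not trigger an alert right at the interval boundary.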

Bash example

Here is a simple shell wrapper.

This uses placeholder URLs. In a real setup, these would be the ping URLs generated by your monitoring tool.


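```bash
#!/bin/bash

# Placeholder ping URLs; replace them with the ones your monitoring tool generates.
START_URL="https://example.com/ping/YOUR_TOKEN/start"
SUCCESS_URL="https://example.com/ping/YOUR_TOKEN"
FAIL_URL="https://example.com/ping/YOUR_TOKEN/fail"

# Report the start. "|| true" keeps a failed ping from affecting the job itself.
curl -fsS -X POST --max-time 5 "$START_URL" >/dev/null || true

# Run the real job and capture its exit code right away.
your-real-command-here
EXIT_CODE=$?

# Report success or failure based on the job's exit code.
if [ "$EXIT_CODE" -eq 0 ]; then
  curl -fsS -X POST --max-time 5 "$SUCCESS_URL" >/dev/null || true
else
  curl -fsS -X POST --max-time 5 "$FAIL_URL" >/dev/null || true
fi

# Pass the job's own exit code back to cron.
exit "$EXIT_CODE"
```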
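If several jobs need the same treatment, one option is to turn the wrapper into a reusable script that takes the real command as its arguments. This is only a sketch under assumptions: the script name `ping_wrap.sh`, the `PING_BASE` variable, and the function names are made up for illustration, and the URL layout simply follows the placeholder URLs above.

```shell
#!/bin/bash
# Hypothetical reusable wrapper (for example saved as ping_wrap.sh).
# PING_BASE is an assumed variable holding the monitor's base ping URL.
PING_BASE="${PING_BASE:-https://example.com/ping/YOUR_TOKEN}"

# Send one ping. A monitoring outage must never break the job itself,
# so errors are discarded and the function always succeeds.
ping_monitor() {
  curl -fsS -X POST --max-time 5 "$1" >/dev/null 2>&1 || true
}

# Run the given command between a start ping and a success or fail ping,
# then return the command's own exit code.
run_with_pings() {
  ping_monitor "$PING_BASE/start"
  "$@"
  local exit_code=$?
  if [ "$exit_code" -eq 0 ]; then
    ping_monitor "$PING_BASE"
  else
    ping_monitor "$PING_BASE/fail"
  fi
  return "$exit_code"
}

# When called with a command, wrap it: ping_wrap.sh /path/to/job.sh --flag
if [ "$#" -gt 0 ]; then
  run_with_pings "$@"
fi
```

A crontab entry could then look like `0 3 * * * PING_BASE=https://example.com/ping/YOUR_TOKEN /usr/local/bin/ping_wrap.sh /opt/jobs/import.sh`, where both paths are placeholders. Each job would get its own token, so a missed import and a missed cleanup show up as separate alerts.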

Top comments (1)

Krasimir Petkov • Edited

Curious how others think about this:

What’s worse in your setup: a cron job that fails loudly, or one that never runs at all?

For me, the missed case is usually worse because there’s no visible crash. I just notice later that some data is stale.