
Heartbeat monitoring: how to catch silent cron job failures

Cron jobs, backups and background workers fail silently — nothing requests them, so nothing notices when they stop. Heartbeat monitoring flips the logic around.

At 9:14 on a Monday morning, an engineer at a small SaaS company opened a ticket to restore a customer's database. Then she found out the nightly backup had been failing for nineteen days. Nobody had touched the server. Nothing had broken loudly. The job had simply stopped running — and in monitoring, silence looks exactly like success.

The job that fails without a sound

That story is unremarkable precisely because it is so common. Almost every team has a scheduled task quietly doing important work in the background: a database dump at 2am, a billing run on the first of the month, a queue worker chewing through jobs, an export that feeds a partner's system. They run unattended for months — until one day they don't.

When a background job stops, it rarely announces it. A container is rescheduled and never comes back. A deploy changes a path. A dependency throws an exception that gets swallowed by an empty `except`. The cron line is there, the server is up, the dashboard is green — and the work is simply not happening. You discover it at the worst possible moment: when you finally need the result.
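
Here is a minimal Python sketch of the swallowed-exception case. `export_to_partner` is a hypothetical stand-in for real work. The exception never escapes, so the process reports success and cron has nothing to complain about.

```python
def export_to_partner() -> None:
    # Hypothetical stand-in for real work; it fails the way a broken dependency might.
    raise ConnectionError("partner endpoint unreachable")


def main() -> int:
    try:
        export_to_partner()
    except Exception:
        pass  # the "empty except": the error is swallowed and never logged
    # Exit status 0 means "success" to cron, even though no export happened.
    return 0
```

If this is run from cron as `sys.exit(main())`, it exits with status 0 every night while doing nothing useful.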

Why ordinary monitoring can't see it

Most monitoring is built around things that answer when you knock. A website returns a status code, an API responds to a request, a port accepts a connection. You poll them on a schedule and act on what comes back.
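
For contrast, the pull model fits in a few lines. This is a minimal sketch that uses only the standard library. The health URL is hypothetical, and `opener` is a parameter added here only so the check can be exercised without a network.

```python
import urllib.request


def probe(url: str, timeout: float = 10.0, opener=urllib.request.urlopen) -> bool:
    """Pull model: knock on the service and judge it by the answer."""
    try:
        with opener(url, timeout=timeout) as resp:
            # urlopen follows redirects and raises HTTPError for 4xx/5xx,
            # so reaching this line normally means a 2xx answer.
            return 200 <= resp.status < 300
    except OSError:
        # URLError, HTTPError, timeouts and refused connections are all OSErrors.
        return False


# A scheduler would call this every minute or so, for example:
# up = probe("https://example.com/health")
```

This works only because the service has something to answer with. A cron job has no URL to pass to `probe`.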

A nightly backup answers to no one. It has no public URL, no port, nothing to poll. From the outside, you cannot tell whether it ran perfectly or never ran at all. That is the blind spot: exactly the kind of work whose failure stays invisible until it becomes expensive.

How heartbeat monitoring flips the logic

Heartbeat monitoring turns the relationship inside out. Instead of a monitor reaching out to your job, your job reaches out to the monitor. At the end of a successful run it sends a quick HTTP request — a “check-in” or “ping” — to a unique URL. The service learns to expect that ping on a schedule.
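
Here is a minimal Python sketch of the push side, assuming a hypothetical check-in URL and a hypothetical `run_backup()` job. The ping is an ordinary outbound GET. Because it sits on the last line, it is only reached if the work before it returned normally.

```python
import urllib.request

# Hypothetical check-in URL; a heartbeat service issues one unique URL per job.
CHECKIN_URL = "https://heartbeat.example.com/ping/3f9c2a"


def run_backup() -> None:
    # Hypothetical stand-in for the real work (dump, upload, verify...).
    pass


def send_checkin(url: str, opener=urllib.request.urlopen) -> None:
    # A plain outbound GET; nothing needs to reach into the job's server.
    with opener(url, timeout=10):
        pass


def main(opener=urllib.request.urlopen) -> None:
    run_backup()                       # if this raises, we never get to the ping
    send_checkin(CHECKIN_URL, opener)  # "I ran, and I finished"
```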

If the check-in arrives on time, all is well and you hear nothing. If it is late or never comes, the service alerts you. Engineers call this a “dead man's switch”: the absence of a signal is the signal. You are no longer trusting that a job ran — you are being told the moment it doesn't.
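
The monitor's side of the switch is just as small. This minimal sketch uses a hypothetical `DeadMansSwitch` class. Each ping moves the deadline forward. An alert fires once when the deadline passes with no ping, and it re-arms on the next check-in. Grace periods are left out here to keep the logic visible.

```python
from datetime import datetime, timedelta


class DeadMansSwitch:
    """Hypothetical monitor-side state for one heartbeat: silence is the signal."""

    def __init__(self, interval: timedelta, started: datetime) -> None:
        self.interval = interval
        self.deadline = started + interval  # first ping is due one interval in
        self.alerted = False

    def ping(self, at: datetime) -> None:
        # A check-in pushes the deadline forward and re-arms the alert.
        self.deadline = at + self.interval
        self.alerted = False

    def check(self, now: datetime) -> bool:
        """Return True exactly once when the expected ping has gone missing."""
        if now > self.deadline and not self.alerted:
            self.alerted = True
            return True
        return False


# Example: a nightly job, checked by the monitor every few minutes.
t0 = datetime(2024, 1, 1, 2, 0)
switch = DeadMansSwitch(timedelta(hours=24), started=t0)
switch.ping(t0)                          # tonight's run checked in
switch.check(t0 + timedelta(hours=23))   # -> False: next ping not due yet
switch.check(t0 + timedelta(hours=25))   # -> True: the ping never came, alert
switch.check(t0 + timedelta(hours=26))   # -> False: already alerted once
```

A real service would also persist this state and run `check` on a timer. Both are omitted here.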

Setting it up so it actually helps

A heartbeat is only as good as how you wire it in. Create one heartbeat per job rather than lumping several together, so an alert points straight at the thing that failed. Set the expected interval to match the schedule, and add a grace period so a run that is a few minutes slow doesn't page anyone at 3am.
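
One way to picture these settings is as a small table of heartbeats. Each entry has an interval that mirrors its job's schedule and its own grace period. This sketch uses hypothetical job names and values. A ping counts as missing only once `interval + grace` has elapsed since the last one.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Heartbeat:
    name: str
    interval: timedelta  # matches the job's schedule
    grace: timedelta     # slack for a run that is merely slow

    def is_missing(self, last_ping: datetime, now: datetime) -> bool:
        return now > last_ping + self.interval + self.grace


# One heartbeat per job, so an alert names exactly what failed (hypothetical values).
HEARTBEATS = [
    Heartbeat("nightly-db-backup", timedelta(days=1), timedelta(minutes=30)),
    Heartbeat("queue-worker", timedelta(minutes=5), timedelta(minutes=2)),
]
```

With these numbers, a backup that last checked in at 02:40 can arrive as late as 03:10 the next night before anyone is paged.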

The detail that matters most: put the check-in at the very end of the job, after the work has actually succeeded — not at the start. A ping fired before the real work runs will happily report “healthy” on a job that crashes halfway through. Send the ping only on success, and route the alert to the team that owns the job, not to a channel no one reads.
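
For a job that runs as a command, a thin wrapper can enforce the same rule by checking the exit status. This is a minimal sketch. `run_then_checkin` and the `ping` callback are hypothetical names. It assumes the job signals failure through a non-zero exit code, which only holds if errors are not swallowed along the way.

```python
import subprocess
from typing import Callable, Sequence


def run_then_checkin(cmd: Sequence[str], ping: Callable[[], None]) -> int:
    """Run the job; check in only if it exited cleanly. Returns the exit code."""
    result = subprocess.run(cmd, check=False)
    if result.returncode == 0:
        ping()  # reached only after the work has actually succeeded
    # On failure we stay silent on purpose: the missing ping is the alert.
    return result.returncode
```

If `ping()` were called before `subprocess.run`, you would have the anti-pattern described above: the check-in would land even when the command crashed.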

Catching the silence with WatchControl

This is exactly the gap WatchControl is built to close. You create a heartbeat monitor, copy its check-in URL, and add a single curl line to the end of your script or cron entry — no agent to install, no inbound access to open. It works from anywhere your job can make an HTTP request, including servers behind a firewall.
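
Whatever sends the check-in, whether a curl line or a few lines of code, it helps to bound how long the ping can take and to retry a transient failure. That way, a network blip neither hangs the job nor causes a false alarm. This is a sketch using Python's standard library. The attempt count, delay and timeout are illustrative assumptions, not WatchControl recommendations. The `opener` and `sleep` parameters exist only to make the function testable.

```python
import http.client
import time
import urllib.request


def checkin(url: str, attempts: int = 3, timeout: float = 10.0, delay: float = 5.0,
            opener=urllib.request.urlopen, sleep=time.sleep) -> bool:
    """Send one heartbeat ping over an outbound request.

    Transient network and HTTP errors are retried, never raised into the job.
    Returns True if the ping went through, False if every attempt failed.
    """
    for attempt in range(1, attempts + 1):
        try:
            with opener(url, timeout=timeout):
                return True
        except (OSError, http.client.HTTPException):
            # Timeouts, DNS failures, refused connections and HTTP error
            # statuses (urllib's HTTPError subclasses OSError) end up here.
            if attempt < attempts:
                sleep(delay)
    return False
```

The function returns False instead of raising, so a failed ping never turns a successful job into a failed one. If the ping truly cannot get through, the monitor sees the silence and alerts anyway, which is the safe direction to fail.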

Set the interval and grace period, and WatchControl watches for the silence on your behalf. The moment a check-in is missed, it alerts you by email, webhook or SMS — and because WatchControl is built in Denmark and hosted in the EU, that check-in data never leaves the EU. The backup that fails for nineteen days becomes the backup you hear about on day one.


Frequently asked questions

What is the difference between heartbeat monitoring and uptime monitoring?

Uptime monitoring checks things that respond to a request, like a website or API — the monitor reaches out and waits for an answer. Heartbeat monitoring is for jobs that have nothing to poll: the job reaches out to the monitor instead, and you are alerted when an expected check-in goes missing.

What is a dead man's switch in monitoring?

It's a check that triggers on the absence of a signal rather than its presence. Your job pings a URL when it succeeds; if that ping doesn't arrive on schedule, the alert fires. Silence is treated as failure, which is exactly what you want for unattended jobs.

What should I monitor with a heartbeat?

Anything that runs on a schedule with no one watching it: database and file backups, cron jobs, scheduled imports and exports, queue and worker processes, certificate renewals and data-pipeline steps. If its failure would stay invisible until you needed the result, give it a heartbeat.

Where should the check-in go in my script?

At the very end, after the work has succeeded. A ping at the start will report a job as healthy even when it crashes halfway through. Send the ping only on success so a half-finished run never reports green.
