This has two advantages. First, if cron jobs are not running at all for some reason, the missed heartbeat will hit the timeout and notify you.
Second, you can also manually trigger error conditions by pushing to Uptime Kuma with an error message, exactly as if you were using cron email notifications.
As long as the Uptime Kuma endpoint is pinged more frequently than the timeout you configured, no alerts are fired.
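For illustration, the push can be as simple as a curl at the end of the job. This is only a sketch: the host, token, and backup script path are placeholders, and the exact push URL is whatever Uptime Kuma shows you when you create a Push monitor.

    #!/bin/sh
    # Placeholder push URL from an Uptime Kuma "Push" monitor
    KUMA="https://uptime.example.com/api/push/XXXXXXXX"

    if /usr/local/bin/backup.sh; then
        # heartbeat: job ran and exited 0
        curl -fsS "${KUMA}?status=up&msg=OK" >/dev/null
    else
        # explicit error push, same effect as a failure email from cron
        curl -fsS "${KUMA}?status=down&msg=backup+failed" >/dev/null
    fi

If the script stops running entirely, no push arrives and the timeout fires on its own.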
1. Scripts should always return an error (>0) when things did not go as planned and 0 when they did. Always.
2. Scripts should always notify you when they return >0. Either in their own way or via emails sent by Cron.
3. Use chronic (from the Debian moreutils package) to ensure that cron jobs only email output when they end in error. That way you don't need to worry about things written to STDOUT spamming you. (See the sketch after this list.)
4. Create wrapper scripts for jobs that need extra functionality: notification, logging, or sanity checks. E.g., for rolling daily backups, something like
ls -l *.backup | mail -s "backup done" me@foo.dk someoneelse@bar.dk
even for successful cron jobs. That way you can check file sizes, timestamps, etc.
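To make points 3 and 4 concrete, a rough sketch of how the pieces fit together. The schedule, paths, and addresses are made up; the key behaviour is that chronic stays silent unless the wrapped command exits non-zero, so cron's MAILTO mail only goes out on failure, while the wrapper's explicit mail is the daily "all good" report.

    # crontab: chronic swallows output unless the wrapper fails
    MAILTO=me@foo.dk
    30 2 * * * chronic /usr/local/bin/backup-wrapper.sh

    # /usr/local/bin/backup-wrapper.sh (sketch; paths and addresses are placeholders)
    #!/bin/sh
    set -e                              # any failing step makes the wrapper exit >0
    /usr/local/bin/rolling-backup.sh    # the actual job
    # success report, sent every day so its absence is itself a signal
    ls -l /backups/*.backup | mail -s "backup done" me@foo.dk someoneelse@bar.dk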
That way I will notice if something is not working, even when email itself is broken or the server is down, because the expected message simply stops arriving. It does of course require that you actually read those emails. But at least if people have agreed to check them, they cannot complain. Well, they can of course, but then I can also blame them.
it is a natural progression to move on from cron and adopt an orchestrator tool (there are many options nowadays) when you need more insight into your jobs, or when you start finding yourself building custom features around cron.
i would do some research into orchestrators and see if there are any that meet your requirements. many have feature sets and integrations that solve some of the exact problems you’re describing
(as a data engineer my current favorite general purpose orchestrator is dagster. it’s lightweight yet flexible)
edit: as a basic example, most orchestrators have a first-class way to define data quality checks. if you have less data than expected, or erroneous data (based upon your expectations), you can define this as an automated check
you can then choose to fail the job, set a number of retries before failing, or send a notification to some destination of your choice (they have integrations with slack and many other alerting tools)
i like dagster because it is geared toward hooking into the data itself. you can use it to ‘run a job’ like any function, but it really shines when you use its ‘data asset’ features, which track the data itself over time and provide a nice UI to view and compare the output of each run. hook in alerting for anomalies and you’re good to go!
these tools have many more features depending on which one you pick, and some are more or less complicated to set up.