CronScheduler: a reliable Java scheduler for external interactions

A walk through inconsistencies between different notions of time and the related pitfalls.

ScheduledThreadPoolExecutor is prone to unbounded clock drift

It was this question on Stack Overflow that made me think about this problem: somebody observed ~15 minutes of drift per day when using a ScheduledThreadPoolExecutor.

java.util.Timer freezes periodic tasks or piles them up when system time is shifted

Although system time may also drift against Coordinated Universal Time (UTC), perhaps even faster than CPU time in some cases, we assume that the machine regularly synchronizes with an NTP server to correct the drift while it is small.

Timer, however, also has drawbacks.

First, Timer behaves unexpectedly when system time is shifted. When the system time is shifted backward, periodic tasks stop running for the period of the shift. When the system time is shifted forward, Timer attempts to catch up by firing many instances of the periodic tasks in quick succession, which may be undesirable.

ScheduledThreadPoolExecutor may also catch up periodic tasks, although this rarely manifests. Periodic tasks might pile up in ScheduledThreadPoolExecutor only if one of the tasks blocks the executor’s thread for a long time, or after a long GC pause. Events of both these types usually last only a few seconds, perhaps up to a minute, while users may shift system time manually by hours or even days and thus cause significant bursts of task runs scheduled on Timer.

When using Timer (but not ScheduledThreadPoolExecutor), it’s possible to circumvent this problem by manually checking the tardiness of TimerTask runs by comparing scheduledExecutionTime with the current time as proposed here. But, obviously, it’s not good to force users to write such nasty workarounds themselves.
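
A minimal sketch of such a tardiness check is shown below. The class name TardinessCheckingTask, the isTooLate helper, and the maxTardinessMillis parameter are hypothetical names chosen for illustration; only TimerTask.scheduledExecutionTime() and System.currentTimeMillis() are JDK APIs.

```java
import java.util.TimerTask;

/** Hypothetical wrapper that skips runs which start too long after their scheduled time. */
final class TardinessCheckingTask extends TimerTask {
    private final Runnable action;
    private final long maxTardinessMillis;

    TardinessCheckingTask(Runnable action, long maxTardinessMillis) {
        this.action = action;
        this.maxTardinessMillis = maxTardinessMillis;
    }

    /** Pure check, kept separate so that it can be tested without a Timer. */
    static boolean isTooLate(long scheduledTimeMillis, long nowMillis, long maxTardinessMillis) {
        return nowMillis - scheduledTimeMillis > maxTardinessMillis;
    }

    @Override
    public void run() {
        // scheduledExecutionTime() is the time at which this run was supposed to start.
        if (isTooLate(scheduledExecutionTime(), System.currentTimeMillis(), maxTardinessMillis)) {
            return; // a catch-up run after a forward time shift: skip it
        }
        action.run();
    }
}
```

Such a task could be passed to Timer.scheduleAtFixedRate like any other TimerTask. The burst of catch-up runs still happens after a forward time shift, but each late run returns immediately.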

In general, Timer has a somewhat outdated API. Tasks cannot be lambdas because they have to extend TimerTask which is an abstract class, not a functional interface. Timer’s schedule methods don’t return Future objects which could be used to obtain the result of the execution of one-shot tasks or to cancel tasks.
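
For contrast, here is a small sketch of what the ScheduledExecutorService API allows: tasks are plain lambdas, and the schedule methods return ScheduledFuture objects that can be used to read a result or to cancel a task. The class and method names (FutureFromScheduler, runOnce) are made up for this illustration.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

final class FutureFromScheduler {
    /** Schedules a one-shot lambda, cancels an unrelated periodic task, returns the one-shot result. */
    static String runOnce() {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        try {
            // A one-shot task as a lambda; its result is available through the future.
            ScheduledFuture<String> oneShot =
                    executor.schedule(() -> "done", 10, TimeUnit.MILLISECONDS);
            // A periodic task can be cancelled through the returned future.
            ScheduledFuture<?> periodic =
                    executor.scheduleAtFixedRate(() -> { }, 1, 1, TimeUnit.HOURS);
            periodic.cancel(false);
            return oneShot.get(5, TimeUnit.SECONDS);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            executor.shutdownNow();
        }
    }
}
```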

Neither ScheduledThreadPoolExecutor nor Timer takes machine suspension into account

For example, if a task is submitted for execution with a one-hour delay, and one minute later the user closes the laptop lid for an hour, then when the user resumes work the task won’t start for another 59 minutes. In some cases, executing the task immediately after the lid is opened would be more reasonable: think of notifications or checking for updates from web services.

Solution: CronScheduler

CronScheduler is named after the cron utility because it strives to match the scheduling precision and reliability of cron as closely as is possible within a Java process.

CronScheduler is similar to a single-threaded ScheduledThreadPoolExecutor which, like Timer, uses system time (via System.currentTimeMillis) as the time source instead of CPU time. If there is a more reliable time provider available, it could be configured for the CronScheduler instance as well.

To iron out the clock drift problem, as well as to combat the machine suspension problem described above, CronScheduler defines a so-called sync period that is a mandatory wake-up period for the CronScheduler’s thread. When CronScheduler wakes up to run some task, or because it has slept for a whole sync period, it checks the system time and adjusts the remaining waits for the scheduled tasks if needed. This way, CronScheduler effectively bounds the tardiness of periodic tasks after machine suspension episodes by its sync period.
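
The idea can be sketched as a small calculation: the scheduler thread never sleeps longer than the sync period, and it recomputes the remaining wait from system time on every wake-up. This is an illustration under simplifying assumptions, not CronScheduler's actual implementation; SyncPeriodWait and nextWaitMillis are hypothetical names.

```java
final class SyncPeriodWait {
    /**
     * Returns how long the scheduler thread should sleep before its next wake-up:
     * the time left until the next task according to system time, capped by the sync period.
     */
    static long nextWaitMillis(long nextTaskTimeMillis, long nowMillis, long syncPeriodMillis) {
        long untilTask = Math.max(0L, nextTaskTimeMillis - nowMillis);
        return Math.min(untilTask, syncPeriodMillis);
    }
}
```

With a task due in one hour and a 10-minute sync period, the first wait is 10 minutes rather than one hour. If the machine is suspended during that wait and system time has passed the trigger time at the next wake-up, the computed wait is zero and the task runs right away.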

The sync period must be chosen for each instance of CronScheduler individually, depending on how much clock drift is tolerable, whether or not machine suspension events and significant system time setbacks are expected (usually on consumer computers and devices, but not in the server environment), and what the maximum tolerable task delay is when these things happen.

If CronScheduler detects that at some point system time has been shifted backward, it also examines all scheduled periodic tasks to see if they now need to go off sooner than was expected before. This prevents periodic tasks from freezing in the face of system time setbacks (or at least from freezing for longer than the CronScheduler’s sync period).

Schedule periodic tasks at round wall clock times

On the other hand, CronScheduler provides additional scheduleAtRoundTimesInDay methods to schedule a periodic task at some round times within a day (for example, at the beginning of each 3-hour period: at 00:00, 03:00, 06:00, etc.) in the given time zone, handling the complexity of calculating the initial trigger time and taking into account daylight saving time changes.

Sticking to round wall clock times in the specified time zone, no matter what, in the presence of daylight saving changes (or permanent zone offset changes) means that the perfect periodicity of the task runs in terms of physical time or system time might be disturbed at the moments when the clocks are changed. Make sure to consider this tradeoff before using scheduleAtRoundTimesInDay methods.
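
To make the tradeoff concrete, here is a simplified sketch of computing the next round wall clock time with java.time. It is not the library's code: RoundTimes and nextRoundTime are hypothetical names, and the sketch assumes that the period divides a 24-hour day evenly and does not treat every edge case around clock changes.

```java
import java.time.Duration;
import java.time.LocalDate;
import java.time.LocalTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;

final class RoundTimes {
    private static final long SECONDS_PER_DAY = 86_400L;

    /** Returns the first round wall clock time strictly after {@code now} in the zone of {@code now}. */
    static ZonedDateTime nextRoundTime(ZonedDateTime now, Duration period) {
        long periodSeconds = period.getSeconds();
        if (periodSeconds <= 0 || SECONDS_PER_DAY % periodSeconds != 0) {
            throw new IllegalArgumentException("period must divide a day evenly: " + period);
        }
        LocalDate day = now.toLocalDate();
        ZoneId zone = now.getZone();
        // Walk through the round local times of the current day: 00:00, 00:00 + period, ...
        for (long offset = 0; offset < SECONDS_PER_DAY; offset += periodSeconds) {
            ZonedDateTime candidate =
                    ZonedDateTime.of(day, LocalTime.ofSecondOfDay(offset), zone);
            if (candidate.isAfter(now)) {
                return candidate;
            }
        }
        // No round time is left today: the next one is the start of the next day.
        return day.plusDays(1).atStartOfDay(zone);
    }
}
```

For example, with a 3-hour period in Europe/Berlin on 2021-03-28, when clocks jump from 02:00 to 03:00, the round time after 00:30 is still 03:00 on the wall clock, but it arrives after only 90 minutes of physical time instead of 150.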

Skip to latest periodic task runs

Recommendations: which scheduler to use when?

  1. Use ScheduledThreadPoolExecutor for one-shot timeout, expiration, eviction, delayed retry, cleanup, kill, notification, or any other similar action, within the machine or remote, as long as the delay is relatively short (say, less than a day) and the machine is not expected to go into suspend mode, i.e. on servers. Consider CronScheduler if either of these conditions is not met, that is, if the delay is counted in weeks (examples: auth token or cookie expiration), or the user’s computer or device may go to sleep.
  2. Use ScheduledThreadPoolExecutor for periodic cleanup, flush, refresh, configuration reload, dump, heartbeat, health check, status check, or any other similar action, within the machine or remote, as long as time is not semantically involved in the action and the action is idempotent.
  3. If the periodic action within the machine or in the distributed system has some connection to the concept of time, consider CronScheduler. One example is a Java process sending metrics to some external monitoring system once every minute. If using ScheduledThreadPoolExecutor, the process and the monitoring system must not simply assume that each report corresponds to the next minute: clock drift will eventually make the metrics dashboard misleading for correlating events on different nodes of the distributed system. Alternatively, you can attach the current system time truncated to the minute to each report, but then missing minutes or duplicate reports will be fairly common. Using CronScheduler would be simpler, more reliable, and produce smoother metrics. Other examples of periodic actions that may subtly involve time are backups, log rotation, replication, inter-node synchronization, and checkpoints.
  4. For generating passage-of-time events, scheduling data processing jobs, or periodic data retention rule enforcement (business rules, legal policies), within a machine or in a distributed system, use, in order of preference (if we consider only scheduling precision and reliability):
    — Scheduling facilities available from your cloud provider;
    — systemd or cron utility;
    — Scheduling facility available from your cluster management or execution framework, like Kubernetes or Mesos;
    — Scheduling by a program written in a language without GC or with a very low-pause GC, such as C++, Rust, or Go;
    — CronScheduler, preferably running in a JVM with a low-pause GC, such as Shenandoah GC or ZGC.
    These events are always defined in terms of either UTC, Unix time, or system time, so you should never use ScheduledThreadPoolExecutor for these purposes.
  5. For any interactions with humans, such as alarms, notifications, timers, or task management, and for interactions between the user’s computer or device and remote services, such as checking for new e-mails or messages, widget updates, or software updates:
    — On Android, use Android-specific APIs. Check out this post for more details.
    — CronScheduler, if you are writing a vanilla Java app.
  6. Never use Timer: all its valid use-cases are superseded by either ScheduledThreadPoolExecutor or CronScheduler.
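
The workaround mentioned in item 3, attaching system time truncated to the minute to each metrics report, can be sketched as follows. MinuteAlignment, minuteOf, and millisUntilNextMinute are hypothetical names used only for this illustration; the timestamps in the note below are invented to show how a small drift produces an absent minute.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.temporal.ChronoUnit;

final class MinuteAlignment {
    /** The minute that a report sent at {@code now} would be attributed to. */
    static Instant minuteOf(Instant now) {
        return now.truncatedTo(ChronoUnit.MINUTES);
    }

    /** Delay from {@code now} until the start of the next minute of system time. */
    static long millisUntilNextMinute(Instant now) {
        Instant nextMinute = minuteOf(now).plus(1, ChronoUnit.MINUTES);
        return Duration.between(now, nextMinute).toMillis();
    }
}
```

Suppose a fixed-rate task has drifted so that two consecutive runs happen at 12:00:59.990 and 12:02:00.010, only 60.02 seconds apart. The first report is attributed to minute 12:00 and the second to minute 12:02, so minute 12:01 never gets a data point. Recomputing the delay to the next minute boundary from system time before each run avoids accumulating such drift.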

In real code

Where do I get CronScheduler and how to get started?

Software engineer and designer, author. Working at Northvolt.