CronScheduler: a reliable Java scheduler for external interactions

A walk through inconsistencies between different notions of time and the related pitfalls.

Roman Leventov
Jan 29, 2020

ScheduledThreadPoolExecutor is prone to unbounded clock drift

Recently, I realized that ScheduledThreadPoolExecutor is prone to unbounded clock drift. It therefore should not be used to schedule tasks at specific timestamps or at specific rates in terms of UTC, Unix time, or system time (for example, once every hour) in processes that run for longer than a few days. Such long-running processes are the norm in backend software (unless you deliberately restart your server applications daily) as well as in desktop software.

It was this question on StackOverflow that made me think about the problem: somebody observed ~15 minutes of drift per day when using a ScheduledThreadPoolExecutor.
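
To get a feel for how such drift can be observed, here is a minimal sketch that compares the time elapsed according to System.nanoTime with the time elapsed according to System.currentTimeMillis. ClockDriftProbe and its methods are hypothetical names made up for this illustration; they are not part of the JDK or of CronScheduler.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: compares elapsed time according to two JDK clocks. */
final class ClockDriftProbe {
    private final long startNanos;  // System.nanoTime() reading at start
    private final long startMillis; // System.currentTimeMillis() reading at start

    ClockDriftProbe(long startNanos, long startMillis) {
        this.startNanos = startNanos;
        this.startMillis = startMillis;
    }

    static ClockDriftProbe startNow() {
        return new ClockDriftProbe(System.nanoTime(), System.currentTimeMillis());
    }

    /**
     * Positive result: system time advanced further than nanoTime did.
     * Negative result: system time advanced less than nanoTime did.
     */
    long driftMillis(long nowNanos, long nowMillis) {
        long elapsedByNanoTime = TimeUnit.NANOSECONDS.toMillis(nowNanos - startNanos);
        long elapsedBySystemTime = nowMillis - startMillis;
        return elapsedBySystemTime - elapsedByNanoTime;
    }

    long driftMillisNow() {
        return driftMillis(System.nanoTime(), System.currentTimeMillis());
    }
}
```

Creating a probe with ClockDriftProbe.startNow() when the process starts and logging driftMillisNow() once an hour should show how far the two clocks have diverged on a particular machine. NTP corrections and manual clock changes also show up in this number, and a scheduler driven purely by nanoTime does not see them.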

java.util.Timer freezes periodic tasks or piles them up when system time is shifted

There is also a dusty Timer class in the JDK which is partially immune to the clock drift problem (for periodic tasks, but not for one-shot tasks scheduled far in the future). This is thanks to the fact that Timer uses System.currentTimeMillis (system time) as the source of time, whereas ScheduledThreadPoolExecutor uses System.nanoTime (CPU time).

Although system time may also drift against Coordinated Universal Time (UTC), perhaps even faster than CPU time in some cases, we assume that the machine regularly synchronizes with an NTP server to correct the drift while it is small.

Timer, however, also has drawbacks.

First, Timer behaves unexpectedly when system time is shifted. When the system time is shifted backward, periodic tasks stop running for the period of the shift. When the system time is shifted forward, Timer attempts to catch up by firing many instances of the periodic tasks in quick succession, which may be undesirable.

ScheduledThreadPoolExecutor may also catch up periodic tasks, although this is rarely manifested. Periodic tasks might pile up in ScheduledThreadPoolExecutor only if one of the tasks blocks the executor’s thread for a long time or due to a long GC pause. Events of both these types usually last only for a few seconds, maybe for up to a minute, while users may shift system time manually by hours or even days and thus cause significant bursts of task runs scheduled on Timer.

When using Timer (but not ScheduledThreadPoolExecutor), it’s possible to circumvent this problem by manually checking the tardiness of TimerTask runs by comparing scheduledExecutionTime with the current time as proposed here. But, obviously, it’s not good to force users to write such nasty workarounds themselves.
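
A minimal sketch of such a workaround might look like the following. TimerTask.scheduledExecutionTime() is a real JDK method; the class name TardinessCheckingTask, the isTooLate helper, and the maxTardinessMillis parameter are assumptions introduced for this illustration.

```java
import java.util.TimerTask;

/** Hypothetical wrapper that skips runs which are already too late. */
final class TardinessCheckingTask extends TimerTask {
    private final Runnable action;
    private final long maxTardinessMillis;

    TardinessCheckingTask(Runnable action, long maxTardinessMillis) {
        this.action = action;
        this.maxTardinessMillis = maxTardinessMillis;
    }

    /** True if the run is later than the tolerated tardiness. */
    static boolean isTooLate(long scheduledMillis, long nowMillis, long maxTardinessMillis) {
        return nowMillis - scheduledMillis > maxTardinessMillis;
    }

    @Override
    public void run() {
        // scheduledExecutionTime() is the time at which this run was supposed to happen.
        if (isTooLate(scheduledExecutionTime(), System.currentTimeMillis(), maxTardinessMillis)) {
            return; // a catch-up run after a forward time shift: skip it
        }
        action.run();
    }
}
```

It could be scheduled with something like new Timer("metrics", true).scheduleAtFixedRate(new TardinessCheckingTask(action, 5_000), 0, 60_000). After a forward shift of system time, Timer still fires the backlog of runs, but all of them except those within the tolerance return immediately.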

In general, Timer has a somewhat outdated API. Tasks cannot be lambdas because they have to extend TimerTask, which is an abstract class, not a functional interface. Timer’s schedule methods don’t return Future objects, which could be used to obtain the result of the execution of one-shot tasks or to cancel tasks.

Neither ScheduledThreadPoolExecutor nor Timer takes machine suspension into account

Another little-appreciated problem with ScheduledThreadPoolExecutor (as well as Timer) arises when scheduling in terms of UTC or wall clock time rather than in terms of the computer’s abstractions of time (system time or CPU time): neither class accounts for the time that a PC, laptop, or tablet spends in suspend mode, such as sleep or hibernation.

For example, suppose a task is submitted for execution with a one-hour delay, and one minute later the user closes the laptop lid for an hour. When the user resumes working with the laptop, the task won’t start for another 59 minutes. In some cases, executing the task immediately after the lid is opened would be more reasonable: think about notifications or checking for updates from web services.

Solution: CronScheduler

If you haven’t dwelled on the topic of time before, your head might spin between Coordinated Universal Time, wall clock time (aka ZonedDateTime in Java), Unix time, system time, and CPU time at this point. The good news is that CronScheduler is now here to handle some of this complexity for you.

CronScheduler is named after the cron utility because it strives to match the scheduling precision and reliability of cron as closely as possible within a Java process.

CronScheduler is similar to a single-threaded ScheduledThreadPoolExecutor which, like Timer, uses system time (via System.currentTimeMillis) as the time source instead of CPU time. If a more reliable time provider is available, it can be configured for the CronScheduler instance as well.

To iron out the clock drift problem, as well as to combat the machine suspension problem described above, CronScheduler defines a so-called sync period that is a mandatory wake-up period for the CronScheduler’s thread. When CronScheduler wakes up to run some task, or because it has slept for a whole sync period, it checks the system time and adjusts the remaining waits for the scheduled tasks if needed. This way, CronScheduler effectively bounds the tardiness of periodic tasks after machine suspension episodes by its sync period.
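
The idea of a sync period can be sketched in a few lines. This is not CronScheduler’s actual code, only an illustration of the principle under simplifying assumptions (a single task, no concurrency); the class SyncPeriodWait and both of its methods are hypothetical.

```java
/** Hypothetical illustration of waiting with a mandatory wake-up period. */
final class SyncPeriodWait {

    /**
     * How long the scheduler thread may sleep before it must re-read system time:
     * never longer than the sync period, and zero if the task is already due.
     */
    static long nextWaitMillis(long triggerTimeMillis, long nowMillis, long syncPeriodMillis) {
        long remaining = triggerTimeMillis - nowMillis;
        if (remaining <= 0) {
            return 0;
        }
        return Math.min(remaining, syncPeriodMillis);
    }

    /** Blocks until system time reaches the trigger time, re-checking every sync period. */
    static void awaitTrigger(long triggerTimeMillis, long syncPeriodMillis)
            throws InterruptedException {
        long wait;
        while ((wait = nextWaitMillis(
                triggerTimeMillis, System.currentTimeMillis(), syncPeriodMillis)) > 0) {
            Thread.sleep(wait);
        }
    }
}
```

Because the remaining wait is recomputed from system time after every wake-up, an error accumulated during a single sleep (from clock drift, a time shift, or machine suspension) cannot grow beyond what can happen within one sync period.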

The sync period must be chosen for each instance of CronScheduler individually, depending on how much clock drift is tolerable, whether or not machine suspension events and significant system time setbacks are expected (usually on consumer computers and devices, but not in the server environment), and what the maximum tolerable task delay is when these things happen.

If CronScheduler detects that at some point system time has been shifted backward, it also examines all scheduled periodic tasks to see if they now need to go off sooner than was expected before. This prevents periodic tasks from freezing in the face of system time setbacks (or at least from freezing for longer than the CronScheduler’s sync period).

Schedule periodic tasks at round wall clock times

CronScheduler has equivalents for all methods of ScheduledExecutorService except scheduleWithFixedDelay.

On the other hand, CronScheduler provides additional scheduleAtRoundTimesInDay methods to schedule a periodic task at some round times within a day (for example, at the beginning of each 3-hour period: at 00:00, 03:00, 06:00, etc.) in the given time zone, handling the complexity of calculating the initial trigger time and taking into account daylight saving time changes.

Sticking to round wall clock times in the specified time zone, no matter what, in the presence of daylight saving changes (or permanent zone offset changes) means that the perfect periodicity of the task runs in terms of physical time or system time might be disturbed at the moments when the clocks are changed. Make sure to consider this tradeoff before using scheduleAtRoundTimesInDay methods.
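
To illustrate what calculating a round trigger time involves, here is a rough sketch based on java.time. It is not how CronScheduler implements scheduleAtRoundTimesInDay; the class RoundTimes, the method nextRoundTime, and the requirement that the period divides 24 hours evenly are assumptions of this example.

```java
import java.time.Duration;
import java.time.LocalDate;
import java.time.LocalTime;
import java.time.ZonedDateTime;

/** Hypothetical helper that finds the next round wall clock time in a day. */
final class RoundTimes {
    private static final long SECONDS_IN_DAY = 24 * 60 * 60;

    /** Returns the first round time strictly after now, in now's time zone. */
    static ZonedDateTime nextRoundTime(ZonedDateTime now, Duration period) {
        long periodSeconds = period.getSeconds();
        if (periodSeconds <= 0 || SECONDS_IN_DAY % periodSeconds != 0) {
            throw new IllegalArgumentException("period must evenly divide 24 hours");
        }
        LocalDate today = now.toLocalDate();
        long secondOfDay = now.toLocalTime().toSecondOfDay();
        long nextSecondOfDay = (secondOfDay / periodSeconds + 1) * periodSeconds;
        if (nextSecondOfDay >= SECONDS_IN_DAY) {
            // The next round time is the start of the next day.
            return today.plusDays(1).atStartOfDay(now.getZone());
        }
        // If the local time falls into a daylight saving gap, ZonedDateTime.of
        // moves it later by the length of the gap.
        return ZonedDateTime.of(today, LocalTime.ofSecondOfDay(nextSecondOfDay), now.getZone());
    }
}
```

With a 3-hour period, 04:15 maps to 06:00 and 22:30 maps to 00:00 of the next day. On the night of a spring-forward change the gap between two consecutive trigger times is not equal to the period in physical time, which is the tradeoff described above.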

Skip to latest periodic task runs

CronScheduler also provides equivalents of the scheduleAtFixedRate and scheduleAtRoundTimesInDay methods that take into account that system time may be shifted forward and skip all but the latest run time. This solves the “task run bursts” problem in the face of forward time shifts, which Timer is prone to.
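
The arithmetic behind skipping to the latest run can be sketched as follows. This is an illustration of the idea rather than CronScheduler’s implementation; SkipToLatest and its methods are hypothetical names.

```java
/** Hypothetical illustration of skipping missed periodic runs. */
final class SkipToLatest {

    /** Number of run times after dueMillis that are also already due at nowMillis. */
    static long skippedRuns(long dueMillis, long periodMillis, long nowMillis) {
        if (nowMillis < dueMillis) {
            return 0; // nothing is overdue yet
        }
        return (nowMillis - dueMillis) / periodMillis;
    }

    /**
     * The run time to execute now: the latest scheduled time that is not after
     * nowMillis. All earlier overdue run times are skipped instead of being
     * fired in a burst.
     */
    static long latestRunTime(long dueMillis, long periodMillis, long nowMillis) {
        return dueMillis + skippedRuns(dueMillis, periodMillis, nowMillis) * periodMillis;
    }
}
```

For example, if a once-a-minute task was due at time 0 and system time is then shifted forward by three hours and thirty seconds, latestRunTime returns the three-hour mark and 180 intermediate runs are skipped, whereas Timer would fire all of them in quick succession.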

Recommendations: which scheduler to use when?

  1. Use ScheduledThreadPoolExecutor for anything concerning the internal business of the Java process only. However, watch carefully that the intra-process interaction is not connected semantically to some external interaction, and that it doesn’t affect the dynamics of the higher-level system (the machine or the cluster) in some subtle way. For example, if there is a query and the user specifies a 5-second timeout, scheduling coordinated interruption within the process is not actually a purely internal concern. (That is not to say that ScheduledThreadPoolExecutor shouldn’t be used in this case — it is still covered by the next item in this list.)
  2. Use ScheduledThreadPoolExecutor for one-shot timeout, expiration, eviction, delayed retry, cleanup, kill, notification, or any other similar action, within the machine or remote, as long as the delay is relatively short (say, less than a day) and the machine is not expected to go into suspend mode, i.e. on servers. Consider CronScheduler if either of these conditions is not met, that is, if the delay is counted in weeks (examples: auth token or cookie expiration), or the user’s computer or device may go to sleep.
  3. Use ScheduledThreadPoolExecutor for periodic cleanup, flush, refresh, configuration reload, dump, heartbeat, health check, status check, or any other similar action, within the machine or remote, as long as time is not semantically involved in the action and the action is idempotent.
  4. If the periodic action within the machine or in the distributed system has some connection to the concept of time, consider CronScheduler. One example is a Java process sending metrics to some external monitoring system once every minute. If using ScheduledThreadPoolExecutor, the process and the monitoring system must not simply assume that each sending corresponds to the next minute: clock drift will eventually make the metrics dashboard misleading for correlating events on different nodes of the distributed system. Alternatively, you can attach the current system time truncated to the minute to each sending, but then absent minutes or double sending will be fairly common. Using CronScheduler would be simpler, more reliable, and produce smoother metrics. Other examples of periodic actions that may subtly entangle the time component are backups, log rotation, replication, inter-node synchronization, and checkpoints.
  5. For generating passage-of-time events, scheduling data processing jobs, or periodic data retention rule enforcement (business rules, legal policies), within a machine or in a distributed system, in the order of preference (if we consider only scheduling precision and reliability), use:
    — Scheduling facilities available from your cloud provider;
    — systemd or cron utility;
    — Scheduling facility available from your cluster management or execution framework, like Kubernetes or Mesos;
    — Scheduling by a program written in a language without GC or with a very low-pause GC, such as C++, Rust, or Go;
    — CronScheduler, preferably running in a JVM with a low-pause GC, such as Shenandoah GC or ZGC.
    These events are always defined in terms of either UTC, Unix time, or system time, so you should never use ScheduledThreadPoolExecutor for these purposes.
  6. For any interactions with humans, such as alarms, notifications, timers, or task management, and for interactions between the user’s computer or device and remote services, such as checking for new e-mails or messages, widget updates, or software updates:
    — On Android, use Android-specific APIs. Check out this post for more details.
    — CronScheduler, if you are writing a vanilla Java app.
  7. Never use Timer: all its valid use-cases are superseded by either ScheduledThreadPoolExecutor or CronScheduler.
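
The metrics example from the 4th recommendation can be made concrete with a small simulation. It assumes a sender whose actual period, measured in system time, deviates slightly from one minute because of drift, and counts how many minute labels end up absent or duplicated when each sending is stamped with the system time truncated to the minute. MinuteBuckets and its method are hypothetical names used only for this sketch.

```java
/** Hypothetical simulation of minute-truncated timestamps under clock drift. */
final class MinuteBuckets {
    private static final long MINUTE_MILLIS = 60_000L;

    /**
     * Simulates the given number of sendings spaced actualPeriodMillis apart in
     * system time and returns {missingMinutes, duplicateMinutes}.
     */
    static int[] countGapsAndDuplicates(long firstSendMillis, long actualPeriodMillis, int sends) {
        int missing = 0;
        int duplicates = 0;
        long previousMinute = Math.floorDiv(firstSendMillis, MINUTE_MILLIS);
        for (int k = 1; k < sends; k++) {
            long sendMillis = firstSendMillis + k * actualPeriodMillis;
            long minute = Math.floorDiv(sendMillis, MINUTE_MILLIS);
            long step = minute - previousMinute;
            if (step == 0) {
                duplicates++; // two sendings carry the same minute label
            } else if (step > 1) {
                missing += step - 1; // one or more minute labels were never sent
            }
            previousMinute = minute;
        }
        return new int[] {missing, duplicates};
    }
}
```

In this model, a sender that fires every 60.1 seconds of system time leaves one minute without data within its first 1,000 sendings, and one that fires every 59.9 seconds reports two minutes twice.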

In real code

Here are a couple of concrete examples from the production codebase of Apache Druid where ScheduledThreadPoolExecutor should have been replaced with CronScheduler (both are covered by the 4th recommendation):

Where do I get CronScheduler and how to get started?

See the README on GitHub.
