CronScheduler: a reliable Java scheduler for external interactions
A walk through inconsistencies between different notions of time and the related pitfalls.
ScheduledThreadPoolExecutor is prone to unbounded clock drift
Recently, I’ve realized that ScheduledThreadPoolExecutor is prone to unbounded clock drift and therefore should not be used to schedule tasks at specific timestamps or at specific rates in terms of UTC, Unix time, or system time (for example, once every hour) when the process runs for longer than a few days. Long-running processes are the norm both in the backend (unless you routinely restart your server applications every day) and in desktop software.
It was this question on StackOverflow that made me think about the problem: somebody observed ~15 minutes of drift per day when using a
java.util.Timer freezes periodic tasks or piles them up when system time is shifted
There is also the dusty Timer class in the JDK, which is partially immune to the clock drift problem (for periodic tasks, but not for one-shot tasks scheduled far in the future). This is because Timer uses System.currentTimeMillis (system time) as the source of time, whereas ScheduledThreadPoolExecutor uses System.nanoTime (CPU time).
Although system time may also drift against Coordinated Universal Time (UTC), perhaps even faster than CPU time in some cases, we assume that the machine regularly synchronizes with an NTP server to correct the drift while it is small.
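One way to see the drift on a particular machine is to sample both clocks at the same moment and later compare how far each has advanced. The sketch below is only an illustration: ClockDriftProbe and its methods are hypothetical names invented for this example, not part of the JDK or of CronScheduler.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper comparing the progress of System.nanoTime and System.currentTimeMillis. */
final class ClockDriftProbe {
    private final long startNanos;
    private final long startMillis;

    ClockDriftProbe(long startNanos, long startMillis) {
        this.startNanos = startNanos;
        this.startMillis = startMillis;
    }

    /** Captures both clocks at (almost) the same moment. */
    static ClockDriftProbe start() {
        return new ClockDriftProbe(System.nanoTime(), System.currentTimeMillis());
    }

    /**
     * Returns the drift in milliseconds between the two clocks since the probe was created.
     * Positive values mean that system time has advanced more than nanoTime.
     */
    long driftMillis(long nowNanos, long nowMillis) {
        long elapsedByNanoTime = TimeUnit.NANOSECONDS.toMillis(nowNanos - startNanos);
        long elapsedBySystemTime = nowMillis - startMillis;
        return elapsedBySystemTime - elapsedByNanoTime;
    }

    /** Convenience overload that reads both clocks right now. */
    long driftMillisNow() {
        return driftMillis(System.nanoTime(), System.currentTimeMillis());
    }
}
```

Creating a probe with ClockDriftProbe.start() at process startup and logging driftMillisNow() once an hour should be enough to tell whether drift matters in a given environment.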
Timer, however, also has drawbacks.
First, Timer behaves unexpectedly when system time is shifted. When system time is shifted backward, periodic tasks stop running for the duration of the shift. When system time is shifted forward, Timer attempts to catch up by firing many instances of the periodic tasks in quick succession, which may be undesirable.
ScheduledThreadPoolExecutor may also catch up on periodic tasks, although this rarely manifests. Periodic tasks might pile up in ScheduledThreadPoolExecutor only if one of the tasks blocks the executor’s thread for a long time, or because of a long GC pause. Events of both types usually last only a few seconds, perhaps up to a minute, whereas users may shift system time manually by hours or even days and thus cause significant bursts of task runs scheduled on Timer. With Timer (but not ScheduledThreadPoolExecutor), it’s possible to circumvent this problem by manually checking the tardiness of TimerTask runs, comparing scheduledExecutionTime with the current time as proposed here. But, obviously, it’s not good to force users to write such nasty workarounds themselves.
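The tardiness check mentioned above could look roughly like this. TimerTask.scheduledExecutionTime() is a real JDK method; the wrapper class, its name, and the maxTardinessMillis parameter are assumptions made for the sake of the example.

```java
import java.util.TimerTask;

/** Hypothetical TimerTask wrapper that skips runs which are already too late. */
final class TardinessCheckingTask extends TimerTask {
    private final Runnable action;
    private final long maxTardinessMillis;

    TardinessCheckingTask(Runnable action, long maxTardinessMillis) {
        this.action = action;
        this.maxTardinessMillis = maxTardinessMillis;
    }

    /** True if a run scheduled for scheduledTimeMillis is too late at nowMillis. */
    static boolean isTooLate(long scheduledTimeMillis, long nowMillis, long maxTardinessMillis) {
        return nowMillis - scheduledTimeMillis >= maxTardinessMillis;
    }

    @Override
    public void run() {
        // scheduledExecutionTime() is the time at which this run was supposed to happen.
        if (isTooLate(scheduledExecutionTime(), System.currentTimeMillis(), maxTardinessMillis)) {
            return; // Most likely a catch-up run after a forward time shift: skip it.
        }
        action.run();
    }
}
```

Such a task would be scheduled as usual, for example with new Timer().scheduleAtFixedRate(new TardinessCheckingTask(action, 5_000), 0, 60_000), where action is any Runnable.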
Second, Timer has a somewhat outdated API. Tasks cannot be lambdas because they have to extend TimerTask, which is an abstract class, not a functional interface. Timer’s schedule methods don’t return Future objects, which could be used to obtain the result of the execution of one-shot tasks or to cancel tasks.
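To make the API difference concrete, here is a small comparison using only standard JDK classes. The class TimerApiComparison and its method names are made up for this illustration.

```java
import java.util.TimerTask;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

final class TimerApiComparison {
    /** Adapter needed with Timer: TimerTask is an abstract class, so a lambda must be wrapped. */
    static TimerTask asTimerTask(Runnable action) {
        return new TimerTask() {
            @Override
            public void run() {
                action.run();
            }
        };
    }

    /** With a scheduled executor, a one-shot task can be a lambda and its result comes back via a future. */
    static int computeWithExecutor() {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        try {
            ScheduledFuture<Integer> future = executor.schedule(() -> 6 * 7, 10, TimeUnit.MILLISECONDS);
            return future.get(); // Blocks until the delayed task has run.
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            executor.shutdown();
        }
    }
}
```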
Neither ScheduledThreadPoolExecutor nor Timer takes machine suspension into account
Another little-appreciated problem with ScheduledThreadPoolExecutor (as well as Timer) when scheduling in terms of UTC or wall clock time, rather than in terms of the computer’s abstractions of time (system time or CPU time), is that ScheduledThreadPoolExecutor doesn’t account for the time spent by the PC, laptop, or tablet in suspend mode, such as sleep or hibernation.
For example, suppose a task is submitted for execution with a one-hour delay, and one minute later the user closes the laptop lid for an hour. When the user resumes working with the laptop, the task won’t start for another 59 minutes, although in some cases executing the task immediately after the lid is opened would be more reasonable: think about notifications or checking for updates from some web services.
If you haven’t dwelled on the topic of time before, your head might be spinning between Coordinated Universal Time, wall clock time (aka ZonedDateTime in Java), Unix time, system time, and CPU time at this point. The good news is that CronScheduler is now here to handle some of this complexity for you.
CronScheduler is similar to a single-threaded ScheduledThreadPoolExecutor which, like Timer, uses system time (via System.currentTimeMillis) as the time source instead of CPU time. If a more reliable time provider is available, it can be configured for the CronScheduler instance as well.
To iron out the clock drift problem, as well as to combat the machine suspension problem described above, CronScheduler defines a so-called sync period that is a mandatory wake-up period for the CronScheduler’s thread. When CronScheduler wakes up to run some task, or because it has slept for a whole sync period, it checks the system time and adjusts the remaining waits for the scheduled tasks if needed. This way, CronScheduler effectively bounds the tardiness of periodic tasks after machine suspension episodes by its sync period.
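The idea of a mandatory wake-up period can be sketched as a wait loop that never sleeps longer than the sync period and re-reads system time after every wake-up. This is a simplified illustration under my own assumptions, not CronScheduler’s actual implementation; SyncPeriodWait and its method names are hypothetical.

```java
/** Hypothetical sketch of waiting for a trigger time with a bounded sleep between clock checks. */
final class SyncPeriodWait {
    /**
     * How long the scheduler thread may sleep before it must re-check system time:
     * never longer than the sync period and never negative.
     */
    static long nextWaitMillis(long triggerTimeMillis, long nowMillis, long syncPeriodMillis) {
        long remaining = triggerTimeMillis - nowMillis;
        return Math.max(0L, Math.min(remaining, syncPeriodMillis));
    }

    /** Blocks until system time reaches triggerTimeMillis, waking up at least once per sync period. */
    static void awaitTrigger(long triggerTimeMillis, long syncPeriodMillis) throws InterruptedException {
        long wait;
        // System time is re-read on every iteration, so clock drift, time shifts,
        // and machine suspension are noticed after at most one sync period.
        while ((wait = nextWaitMillis(triggerTimeMillis, System.currentTimeMillis(), syncPeriodMillis)) > 0) {
            Thread.sleep(wait);
        }
    }
}
```

With a one-hour delay and a five-minute sync period, the thread wakes up at least every five minutes. If the machine was suspended past the trigger time, the next check after it resumes returns zero and the task can run right away.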
The sync period must be chosen for each instance of CronScheduler individually, depending on how much clock drift is tolerable, whether machine suspension events and significant system time setbacks are expected (usually on consumer computers and devices, but not in the server environment), and what the maximum tolerable task delay is when these things happen.
If CronScheduler detects that at some point system time has been shifted backward, it also examines all scheduled periodic tasks to see if they now need to go off sooner than was expected before. This prevents periodic tasks from freezing in the face of system time setbacks (at least, for longer than the CronScheduler’s sync period).
Schedule periodic tasks at round wall clock times
CronScheduler has equivalents for all methods of
On the other hand, CronScheduler provides additional scheduleAtRoundTimesInDay methods to schedule a periodic task at some round times within a day (for example, at the beginning of each 3-hour period: at 00:00, 03:00, 06:00, etc.) in the given time zone, handling the complexity of calculating the initial trigger time and taking into account daylight saving time changes.
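To give a feel for the calculation involved, here is one way to find the next round wall clock time with java.time. It is a sketch under my own assumptions, not the code behind scheduleAtRoundTimesInDay; RoundTimes and nextRoundTime are hypothetical names, and the period is assumed to divide 24 hours evenly.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.ZonedDateTime;

/** Hypothetical helper computing round wall clock times within a day. */
final class RoundTimes {
    /** Returns the first round time strictly after now, in the time zone of now. */
    static ZonedDateTime nextRoundTime(ZonedDateTime now, Duration period) {
        long periodSeconds = period.getSeconds();
        if (periodSeconds <= 0 || 86_400 % periodSeconds != 0) {
            throw new IllegalArgumentException("period must evenly divide 24 hours: " + period);
        }
        // Work in local (wall clock) time so that results stay at round times
        // regardless of the zone offset in effect.
        long secondsIntoDay = now.toLocalTime().toSecondOfDay();
        long nextRoundSeconds = (secondsIntoDay / periodSeconds + 1) * periodSeconds;
        LocalDateTime localNext = now.toLocalDate().atStartOfDay().plusSeconds(nextRoundSeconds);
        // atZone() resolves local times that fall into a daylight saving gap
        // by moving them later by the length of the gap.
        return localNext.atZone(now.getZone());
    }
}
```

For a 3-hour period and a current time of 04:20 in Europe/Berlin, this yields 06:00 on the same day. A production implementation would need to treat the repeated hour at the end of daylight saving time more carefully than this sketch does.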
Sticking to round wall clock times in the specified time zone, no matter what, in the presence of daylight saving changes (or permanent zone offset changes) means that the perfect periodicity of the task runs in terms of physical time or system time might be disturbed at the moments when the clocks are changed. Make sure to consider this tradeoff before using
Skip to latest periodic task runs
CronScheduler also provides equivalents of scheduleAtRoundTimesInDay methods that take into account that system time may be shifted forward and skip all but the latest run times, solving the “task run bursts” problem in the face of forward time shifts, which Timer is prone to.
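The arithmetic behind skipping to the latest run is simple. The following sketch is my own illustration rather than CronScheduler’s code, and SkipToLatest and latestDueRunMillis are hypothetical names. It computes the single run time that should fire after a forward time shift instead of replaying every missed run.

```java
/** Hypothetical helper that collapses a backlog of missed periodic runs into the latest one. */
final class SkipToLatest {
    /**
     * Given the time of the first missed run, returns the time of the latest run
     * that is already due at nowMillis. All earlier missed runs are skipped.
     */
    static long latestDueRunMillis(long firstMissedRunMillis, long periodMillis, long nowMillis) {
        if (periodMillis <= 0) {
            throw new IllegalArgumentException("period must be positive");
        }
        if (nowMillis < firstMissedRunMillis) {
            throw new IllegalArgumentException("no run is due yet");
        }
        long missedPeriods = (nowMillis - firstMissedRunMillis) / periodMillis;
        return firstMissedRunMillis + missedPeriods * periodMillis;
    }
}
```

For a task that runs once a minute, a three-hour forward shift then results in one run for the most recent minute rather than a burst of 180 runs.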
Recommendations: which scheduler to use when?
- Use ScheduledThreadPoolExecutor for anything concerning the internal business of the Java process only. However, watch carefully that the intra-process interaction is not connected semantically to some external interaction, and that it doesn’t affect the dynamics of the higher-level system (the machine or the cluster) in some subtle way. For example, if there is a query and the user specifies a 5-second timeout, scheduling coordinated interruption within the process is not actually a purely internal concern. (That is not to say that ScheduledThreadPoolExecutor shouldn’t be used in this case: it is still covered by the next item in this list.)
- Use ScheduledThreadPoolExecutor for one-shot timeout, expiration, eviction, delayed retry, cleanup, kill, notification, or any other similar action, within the machine or remote, as long as the delay is relatively short (say, less than a day) and the machine is not expected to go into suspend mode, i.e. on servers. Consider CronScheduler if either of these conditions is not met, that is, if the delay is counted in weeks (examples: auth token or cookie expiration) or the user’s computer or device may go to sleep.
- Use ScheduledThreadPoolExecutor for periodic cleanup, flush, refresh, configuration reload, dump, heartbeat, health check, status check, or any other similar action, within the machine or remote, as long as time is not semantically involved in the action and the action is idempotent.
- If the periodic action within the machine or in the distributed system has some connection to the concept of time, consider CronScheduler. One example is a Java process sending metrics to some external monitoring system once every minute. If using ScheduledThreadPoolExecutor, the process and the monitoring system must not simply assume that each sending corresponds to the next minute: clock drift will eventually make the metrics dashboard misleading for correlating events on different nodes of the distributed system. Alternatively, you can attach the current system time truncated to the minute to each sending, but then absent minutes or double sending will be fairly common. Using CronScheduler would be simpler, more reliable, and produce smoother metrics. Other examples of periodic actions that may subtly entangle the time component are backups, log rotation, replication, inter-node synchronization, and checkpoints.
- For generating passage-of-time events, scheduling data processing jobs, or periodic data retention rule enforcement (business rules, legal policies), within a machine or in a distributed system, in the order of preference (if we consider only scheduling precision and reliability), use:
— Scheduling facilities available from your cloud provider;
— Scheduling facility available from your cluster management or execution framework, like Kubernetes or Mesos;
— Scheduling by a program written in a language without GC or with a very low-pause GC, such as C++, Rust, or Go;
— CronScheduler, preferably running in a JVM with a low-pause GC, such as Shenandoah GC or ZGC.
These events are always defined in terms of either UTC, Unix time, or system time, so you should never use ScheduledThreadPoolExecutor for these purposes.
- For any interactions with humans, such as alarms, notifications, timers, or task management, and for interactions between user’s computer or device and remote services, such as checking for new e-mails or messages, widget updates, or software updates:
— On Android, use Android-specific APIs. Check out this post for more details.
— CronScheduler, if you are writing a vanilla Java app.
- Never use Timer: all its valid use-cases are superseded by either
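The metrics recommendation above mentions attaching the current system time truncated to the minute to each sending. A minimal sketch of that labeling shows why absent and doubled minutes appear once the actual sending interval deviates slightly from 60 seconds. MinuteLabels and minuteLabel are hypothetical names used only for this example.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

/** Hypothetical helper that labels a metrics sending with the minute it falls into. */
final class MinuteLabels {
    static Instant minuteLabel(Instant sendTime) {
        return sendTime.truncatedTo(ChronoUnit.MINUTES);
    }
}
```

Two sendings at 12:00:00.050 and 12:00:59.950 (59.9 seconds apart) both get the label 12:00, so that minute is reported twice. If the following sending happens 60.1 seconds later, at 12:02:00.050, it gets the label 12:02, and the minute 12:01 is absent.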
In real code
Here are a couple of concrete examples from the production codebase of Apache Druid where ScheduledThreadPoolExecutor should have been replaced with CronScheduler (both are covered by the 4th recommendation):
Where do I get CronScheduler and how to get started?
See the README on GitHub.