Leap seconds (considered harmful)

Posted on January 15, 2016

If you have a good enough idea of what leap seconds, UTC, and TAI are, feel free to skip down to the “A damaging compromise” subheading below.

what’s a timescale?

Measuring, synchronizing, and transferring time all need a concept of a timescale – a standard that defines both the rate of advance of time and one (or more) reference points. For example, I could define a timescale t (with arbitrary units) that has t=0 at the moment I published this post, and that increments by 1 unit every time a pendulum I am holding completes a full cycle. Note that multiple timescales can coexist, just as multiple units of temperature can coexist – if you know the zero points and the rates of advance, you can translate from one to another rather easily. Usually, zero points can be chosen arbitrarily (such as the famous Unix epoch), because as long as two timescales tick at the same rate, converting between them when they have different epochs / zero points is trivial for computers and doesn’t add confusion or ambiguity (unless you want to deal with times before the epoch, which might involve negative numbers).
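To make the epoch-shift point concrete, here’s a minimal sketch in Python (mine, not from any library). It assumes, hypothetically, that the pendulum swings at exactly 1 Hz and that the post went up at 2016-01-15 00:00:00 UTC, which is Unix time 1452816000.

    # A toy timescale "t": t = 0 at the (assumed) moment of publication,
    # ticking once per (assumed) 1 Hz pendulum cycle.
    POST_EPOCH_UNIX = 1_452_816_000   # 2016-01-15 00:00:00 UTC as Unix time

    def unix_to_t(unix_seconds: float) -> float:
        # Same tick rate, different zero point: just subtract the offset.
        return unix_seconds - POST_EPOCH_UNIX

    def t_to_unix(t: float) -> float:
        return t + POST_EPOCH_UNIX

    # Times before the epoch simply come out negative:
    assert unix_to_t(1_452_815_999) == -1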

atoms for time:

The rate of advance of a timescale is a trickier thing. If you’re going around defining timescales, you want it to depend on an easily and accurately reproducible physical phenomenon, so anyone can set up their own standard and keep/transfer time on their own. Nowadays we have atomic frequency standards, which use the properties of a sample of atoms (usually cesium or rubidium) to generate an extremely stable frequency that (with careful equipment design) doesn’t depend on any external factors. The atoms have a bunch of energy levels and can absorb/release energy (in the form of electromagnetic radiation) to change energy levels – but here we’re only interested in one specific transition: for cesium it’s a hyperfine transition of the ground state that releases/absorbs radiation at a frequency of 9,192,631,770 Hz. This number is a function of the cesium atom alone – as long as the experimental setup and equipment don’t introduce inaccuracies, anyone can induce that transition (by exciting the cesium atoms with the right frequency of microwaves) and everyone will measure the same frequency.

What the atomic frequency standard does is conceptually simple – it adjusts the frequency of the oscillator that is coupled into the sample cell and (with methods that aren’t really relevant here) measures how many of the atoms inside the sample cell undergo the specific energy level transition we want to induce. If our electronic oscillator is “out of tune” with our sample cell atoms, very few will undergo transition – however, if we’re making microwaves at the same frequency the atoms accept, many will undergo transition. The electronics implement a feedback loop that works to maximise the number of atoms that undergo transition.

If everything works right, this physics experiment has done something quite remarkable – if we tap off some of the variable frequency oscillator’s output, we have a frequency generator that generates a frequency not determined by anything specific to our experiment or any of the electronic components. As long as our microwave oscillator is in tune with the atoms in the sample cell, the frequency it generates depends only on the fundamental properties of cesium atoms. Cesium atoms don’t fall apart or go out of tune or undergo frequency drift, and if you purify cesium and I purify cesium we’ll get atoms that are identical.

The convenience of cesium atoms for doing this sort of thing has led to the second being defined in terms of them – you build that cesium frequency standard, count off 9,192,631,770 cycles on the oscillator that’s in tune with the cesium atoms, and how long that takes is defined to be one SI second. The cesium standard will always tick at 9,192,631,770 pulses per second, but you can use electronics that count the pulses it generates and only output a pulse when 9,192,631,770 cesium pulses have gone by – that gives you a pulse-per-second output derived from the cesium standard. You can use this frequency division method to generate any frequency of pulse you want from the raw, 9.192 GHz cesium standard output.
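To make the frequency-division step concrete, here’s a minimal sketch in Python (my illustration, not part of any clock’s firmware) of the divider logic: count raw cesium cycles and emit one tick per 9,192,631,770 of them, or per whatever ratio yields the output frequency you want.

    CESIUM_HZ = 9_192_631_770  # cycles of the cesium transition per SI second

    def pps_ticks(total_cycles: int) -> int:
        # 1 pulse-per-second output: one tick per 9,192,631,770 raw cycles.
        return total_cycles // CESIUM_HZ

    def ticks_at(total_cycles: int, out_hz: int) -> int:
        # Any derived frequency: one tick per (CESIUM_HZ / out_hz) raw cycles.
        return total_cycles * out_hz // CESIUM_HZ

    assert pps_ticks(3 * CESIUM_HZ) == 3                   # 3 seconds elapsed
    assert ticks_at(CESIUM_HZ, 10_000_000) == 10_000_000   # a 10 MHz output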

Earth time vs atom time:

Cesium frequency standards are wonderful and good and keep excellent time, and everyone uses them – the GPS system depends on them, and all timekeeping that involves computers and networks depends on them, either via computers attached to atomic clocks or via computers attached to GPS receivers that get time from the atomic clocks aboard the GPS satellites. There’s one issue here: before we had cesium clocks, we had already defined the second. This isn’t inherently a problem – we could just have defined the “atomic” second so that it matched up with the old, non-atomic definition, and gone on with life, except with better time/frequency-keeping. However, there’s a massive problem – the previous definition of the second was based on the rotation angle of the Earth (that thing that generates days and nights rather periodically) and the Earth’s rotation is slowing down. Oh no :(. Earth time and atom time are fundamentally different timescales, and not merely because their tick rates differ by some constant (that could be calibrated away) – it’s because atomic frequency standards have a tick rate that doesn’t change as they get older, while the Earth has a mean solar day that gets a tiny bit longer as it gets older.

A damaging compromise:

There’s an awkward problem here, and the standards bodies behind timekeeping have dealt with it in an unclean and damaging way. The current state of affairs is this: just about every time you see or work with is on a timescale called UTC (or converted to local time with a constant hour:minute offset from UTC). The UTC timescale is a botched-up hybrid of two timescales: UT1 (Universal Time 1), derived entirely from (smoothed-out) Earth rotation, and TAI (International Atomic Time), derived entirely from atomic clocks. UTC (Coordinated Universal Time) normally ticks off each second in sync with TAI, which makes it seem deceptively easy to handle. However, UTC is defined such that when the offset between UTC and UT1 gets too large, a standards bureau (the IERS) decides that UTC needs to take a time-out (known as a “leap second”) and let UT1 (the Earth’s rotation) catch up. During the leap second, UTC has a mildly nonsensical value – rather than going from 23:59:59 to 00:00:00 as usual, UTC on the affected day goes from 23:59:59 to 23:59:60 to 00:00:00. While this fulfills the goals of UTC – use constant second lengths that match the cesium-derived SI second, but regularly insert leap seconds to keep UTC in sync with the Earth’s rotation – leap seconds cause significant problems for computers, networks, navigational systems, financial systems, air traffic control systems, and allegedly even US nuclear weapons systems. Note that leap seconds aren’t even deterministic – they depend on measurements of the Earth’s rotation. Software not only chokes on the leap second, but can’t even know more than a few months in advance whether there’ll be one.
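To see how awkward 23:59:60 is for real software, here’s a small Python illustration (mine, not from any standard): the standard library’s datetime type can’t even represent the leap-second reading, and the POSIX encoding of time_t papers over it instead.

    from datetime import datetime

    # The leap-second reading is unrepresentable in datetime:
    try:
        datetime(2015, 6, 30, 23, 59, 60)
    except ValueError as err:
        print(err)  # "second must be in 0..59"

    # Under the POSIX encoding of time_t, 23:59:60 and the next midnight
    # both map to the same count, so the 2015-06-30 leap second made the
    # Unix timestamp 1435708800 ambiguous.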

The problem with leap seconds is not due to shoddy coding. There are irreconcilable differences between the semantics of time that most software depends on and the semantics of UTC around leap seconds. Aside from conveniently scheduling maintenance/downtime periods to coincide with leap second days, there are two fundamental schools of thought about how to deal with leap seconds. The first is to make all software (all application software, libraries, operating systems, NTP software) aware of leap seconds and operate correctly around them. This is a nontrivial task – every calculation of times, dates, and time intervals is affected, as is storage and transmission of time/date values. It is a deeply fragile and devastatingly difficult solution – forget to take leap seconds into account once, and everything in your software can get messed up. This is not an approach to dealing with complexity that has a good record in software engineering (ask a C programmer about memory safety). The second school of thought – keeping leap seconds out of the timescale software uses in the first place – is the subject of the next section.
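Here’s a sketch (with an illustrative one-entry leap-second table I wrote for this example) of what “every interval calculation is affected” means in practice: the true number of SI seconds between two UTC timestamps is not just the difference of their epoch counts.

    from datetime import datetime, timezone

    # Illustrative, truncated table: Unix timestamps of the midnights
    # that immediately follow an inserted leap second.
    LEAP_SECONDS = [1435708800]  # after 2015-06-30 23:59:60 UTC

    def elapsed_si_seconds(t0: datetime, t1: datetime) -> int:
        # Naive difference, plus one second per leap second in between.
        u0, u1 = int(t0.timestamp()), int(t1.timestamp())
        leaps = sum(1 for ls in LEAP_SECONDS if u0 < ls <= u1)
        return (u1 - u0) + leaps

    a = datetime(2015, 6, 30, 12, 0, tzinfo=timezone.utc)
    b = datetime(2015, 7, 1, 12, 0, tzinfo=timezone.utc)
    print((b - a).total_seconds())   # 86400.0 – the naive answer
    print(elapsed_si_seconds(a, b))  # 86401 – the physically elapsed time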

Bad standards (UTC) and a proven alternative (TAI + the timezone database)

Using UTC with leap seconds for computers does not seem to be a good idea. The traditional response might be to blame programmers for failing to be sufficiently careful and detail-oriented, but that’s neither a useful nor a correct response. A standard that gives weird and non-deterministic semantics to the thing we expect to have the simplest and most deterministic semantics (outside of relativistic effects) – time itself – is a bad standard. djb’s view on cryptographic standards is applicable: if a standard leaves easily avoidable ways for its implementers to mess up, it’s a bad standard. Instead of having to redesign every bit of every system that manipulates/stores/transmits/computes with time to account for a nondeterministic leap second (which almost certainly doesn’t even have a good representation in whatever time formats are in use), we would like to get rid of UTC and use something like TAI (or anything with a constant offset from TAI) to eliminate all leap second issues in one fell swoop.

We can’t completely relegate UTC and leap seconds to the dustbin of history, because there are applications that need time that lines up nicely with days and nights: civil time, for example, will remain defined by UTC. This isn’t a problem though – we already have a time-tested mechanism for converting a universal standard into a bunch of local time standards, one that already accounts for the kind of non-deterministic rule updates leap seconds would require: the timezone database. Rather than injecting leap seconds directly into an otherwise unsullied atomic timescale, using TAI for time transfer (NTP) and internal software use, and applying leap seconds only where needed via the timezone mechanism, is a much cleaner separation of concerns. This is no untested proposal: it’s what the GPS system (which depends on precise timing) uses. GPS time is TAI (with an offset of 19 seconds – but the offset never changes, so it’s irrelevant), and GPS navigational messages contain the information needed to relate GPS time to UTC (how many leap seconds have been added to UTC so far).
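As a closing sketch (my code; the table values are the real TAI-UTC offsets around the 2015 leap second, but the table is truncated and the function names are made up): once software keeps TAI internally, a UTC conversion is just a table lookup at the edges, exactly like a timezone conversion, and GPS time is a constant subtraction.

    GPS_TAI_OFFSET = 19  # GPS time = TAI - 19 s; fixed, never changes

    # (TAI timestamp on the Unix epoch at which the offset takes effect,
    #  TAI - UTC in seconds). Truncated for illustration.
    TAI_UTC_TABLE = [
        (0,          35),  # placeholder entry; the real table is longer
        (1435708836, 36),  # became 36 s after the 2015-06-30 leap second
    ]

    def tai_to_gps(tai_seconds: int) -> int:
        return tai_seconds - GPS_TAI_OFFSET  # constant offset, forever

    def tai_to_utc_unix(tai_seconds: int) -> int:
        # Leap seconds live in the conversion table, not in the timescale.
        offset = max(off for start, off in TAI_UTC_TABLE
                     if tai_seconds >= start)
        return tai_seconds - offset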