Happy new year, and from a technology perspective, also happy leap second.
[image: timekeeping for the leap second, from FSM Labs]
As always, when there’s a leap second, people get a chance to debug their timekeeping code. I guess if they happen frequently enough the code will get better exercise and some bugs will be shaken out. Every time some government messes with timekeeping, a small army of programmers gets a few hours of work to adjust their systems, with some occasional unplanned downtime.
FSMLabs watched the time change closely, and their image above shows one peculiarity of reported time: Google’s NTP servers returned a time that was up to half a second off, slowly creeping out of sync with real time and then back into sync. They write in a recap, Leapsecond 2016 complete:
You can see that Google starts its 20 hour leap-second “skew” 10 hours before the event and then skews back to the real time. Everyone else stays correct and then has a short error as they “leap” the second and then come back into correct time.
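The shape of that skew is easy to reproduce. The following is a minimal sketch, assuming the smear is linear across the 20 hour window and centered on the leap second; the function name and the sign convention (smeared clock minus a clock that steps at the leap) are my own, not anything from Google or FSMLabs.

```python
SMEAR_WINDOW = 20 * 3600          # 20 hours, per the FSMLabs description
HALF_WINDOW = SMEAR_WINDOW / 2    # smear starts 10 hours before the leap


def smear_offset(t: float) -> float:
    """Smeared clock minus stepping clock, in seconds.

    t is real elapsed time in seconds relative to the leap instant
    (negative before the leap, positive after).
    """
    # Fraction of the extra second the smeared clock has absorbed so far,
    # assuming a linear smear (an assumption, not confirmed by the source).
    absorbed = min(max((t + HALF_WINDOW) / SMEAR_WINDOW, 0.0), 1.0)
    # A stepping clock absorbs the whole second at once at t = 0.
    stepped = 1.0 if t >= 0 else 0.0
    return stepped - absorbed


# Print the offset every five hours across the window.
for hours in (-10, -5, 0, 5, 10):
    print(f"{hours:+3d} h: {smear_offset(hours * 3600):+.2f} s")
```

Under these assumptions the offset never exceeds half a second in either direction, which matches the "up to half a second off" that shows in the image.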
FSMLabs sells precision timekeeping equipment aimed at the financial industry.
Leap second bugs
The biggest issue identified was at Cloudflare, a distributed system for web caching. The incident report, Some DNS lookups causing 5xx errors due to leap second bug, says:
Starting at 00:00 UTC on January 1, 2017, CNAME resolutions on some machines stopped working due to a bug triggered by the universal addition of one leap second, which affected both some authoritative DNS and origin DNS lookups, causing 5xx errors.
In the real world, the most suspicious reported issue came from the BBC in London ambulance service hit by new year fault, an account of computer system crashes at London Ambulance Service (LAS) that were peculiarly synchronized with the appearance of the leap second:
It is understood the computer system crashed, so calls had to be recorded by pen and paper for nearly five hours on one of the busiest nights of the year.
Systemd is eating the world
Finally, in timekeeping news and notes, the CoreOS help pages for timekeeping illustrate the importance of keeping good time within a computing cluster.
CoreOS clusters use NTP to synchronize the clocks of member nodes, and all machines start an NTP client at boot. CoreOS versions later than 681.0.0 use systemd-timesyncd(8) as the default NTP client. Earlier versions used ntpd(8). Use systemctl to check which service is running:
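The quoted passage trails off before its command. As a rough sketch of the same check from a script, the following asks systemd which of the two services named above is active. The unit names are taken from the quote; the function name and the injectable `run` parameter are hypothetical conveniences of mine, not part of the CoreOS documentation.

```python
import subprocess

# Unit names taken from the two services named in the CoreOS text.
UNITS = ("systemd-timesyncd", "ntpd")


def active_time_services(run=subprocess.run):
    """Return the subset of UNITS that systemd reports as active."""
    active = []
    for unit in UNITS:
        # `systemctl is-active` exits with status 0 only when the unit
        # is active; --quiet suppresses the printed state.
        result = run(["systemctl", "is-active", "--quiet", unit])
        if result.returncode == 0:
            active.append(unit)
    return active
```

Calling `active_time_services()` on a systemd host returns a list of whichever units are running. The `run` parameter exists only so the function can be exercised without a real systemd.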
“Systemd is eating the world”, and one of the things it has eaten is timekeeping. One of the old CoreOS bugs, systemd-timesyncd not as precise as ntpd, reports that because of systemd-timesyncd’s less precise timekeeping there are problems with Deis and Ceph. The bug report is from 2015 and was closed due to inactivity, but if you have weird problems with CoreOS, Deis, and Ceph, check your clocks!
Notes and references
The future timekeeping bug that’s most likely to create worldwide angst on the scale of Y2K is the Year 2038 problem, which I’m counting on helping fix to fund my retirement.
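For anyone who has not seen the arithmetic, here is a small sketch of where a signed 32-bit count of seconds since the Unix epoch runs out, and where it lands after wrapping. The helper name is mine, and the wraparound is simulated with modular arithmetic rather than taken from any particular system.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
INT32_MAX = 2**31 - 1


def from_int32_seconds(seconds: int) -> datetime:
    """Interpret seconds as a signed 32-bit count since the Unix epoch."""
    # Wrap into the signed 32-bit range the way two's-complement
    # overflow would.
    wrapped = (seconds + 2**31) % 2**32 - 2**31
    return EPOCH + timedelta(seconds=wrapped)


last_good = from_int32_seconds(INT32_MAX)
after_overflow = from_int32_seconds(INT32_MAX + 1)
print(last_good.isoformat())       # 2038-01-19T03:14:07+00:00
print(after_overflow.isoformat())  # 1901-12-13T20:45:52+00:00
```

One second past the limit, a 32-bit clock believes it is December 1901.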
TimeAndDate News has a time zone news feed, which is useful to track worldwide adjustments to and from Daylight Saving Time.
NTP Pool leap second 2017 status reported that as of a few hours before the leap second, 3463 servers in the pool system were announcing the leap second and 933 were not. Ask Bjørn Hansen has maintained the pool since 2005.
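"Announcing" here refers to the leap indicator, the top two bits of the first byte of every NTP packet. A minimal sketch of reading it follows; the sample packet is fabricated for illustration (only its first byte matters), and the function name is my own.

```python
# Leap indicator values as defined for NTP.
LEAP_MEANINGS = {
    0: "no warning",
    1: "last minute of the day has 61 seconds",
    2: "last minute of the day has 59 seconds",
    3: "unknown (clock unsynchronized)",
}


def leap_indicator(packet: bytes) -> int:
    """Return the 2-bit leap indicator from the first byte of an NTP packet."""
    if not packet:
        raise ValueError("empty packet")
    # The first byte is laid out as LI (2 bits), version (3 bits), mode (3 bits).
    return packet[0] >> 6


# Hypothetical 48-byte packet: LI=1, version 4, mode 4 (server), rest zeroed.
sample = bytes([0b01100100]) + bytes(47)
print(LEAP_MEANINGS[leap_indicator(sample)])
```

A server that was announcing the leap second would have returned 1 here in the hours before midnight UTC; the 933 that were not would have returned 0.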
Hail truechimers, exit falsetickers!