5.2. The Kernel Discipline

In addition to the NTP protocol specification there also exists a description for a kernel clock model ([RFC 1589]) that is discussed here.

1. Basic Functionality
5.2.1.1. What is special about the Kernel Clock?
5.2.1.2. Does my Operating System have the Kernel Discipline?
5.2.1.3. How can I verify the Kernel Discipline?
2. Alternatives
5.2.2.1. Should the Kernel Discipline be used?
3. Monitoring
5.2.3.1. What are the individual monitoring values about?
4. PPS Processing
5.2.4.1. What is PPS Processing?
5.2.4.2. How is PPS Processing related to the Kernel Discipline?
5.2.4.3. What does hardpps() do?

1. Basic Functionality

5.2.1.1. What is special about the Kernel Clock?

NTP keeps precision time by applying small adjustments to to system clock periodically (See also Q: 5.1.6.1.). However, some clock implementations do not allow small corrections to be applied to the system clock, and there is no standard interface to monitor the system clock's quality.

Therefore a new clock model is suggested that has the following features (See also [RFC 1589]):

  • Two new system calls to query and control the clock: ntp_gettime() and ntp_adjtime()

  • The clock keeps time with a precision of one microsecond.[1] In real life operating systems there are clocks that are much worse.

  • Time can be corrected in quantities of one microsecond, and repetitive corrections accumulate.[1] The UNIX system call adjtime() does not accumulate successive corrections.

  • The clock model maintains additional parameters that can be queried or controlled. Among these are:

    • A clock synchronization status that shows the state of the clock machinery (e.g. TIME_OK).
    • Several clock control and status bits that control and show the state of the machinery (e.g. STA_PLL). This includes automatic handling of leap seconds (when announced).
    • Correction values for clock offset and frequency that are automatically applied.
    • Other control and monitoring values like precision, estimated error and frequency tolerance.
  • Corrections to the clock can be automatically maintained and applied.

Applying corrections automatically within the operating system kernel does no longer require periodic corrections through an application program. Unfortunately there exist several revisions of the clock model that are partly incompatible. See Section 6.4.1.

5.2.1.2. Does my Operating System have the Kernel Discipline?

If you can find an include file named timex.h that contains a structure named timex and constants like STA_PLL and STA_UNSYNC, you probably have the kernel discipline implemented. To make sure, try using the ntp_gettime() system call.

5.2.1.3. How can I verify the Kernel Discipline?

The following guidelines were presented by Professor David L. Mills:

Feedback loops and in particular phase-lock loops and I go way, way back since the first time I built one as part of a frequency synthesizer project as a grad student in 1959 no less. All the theory I could dredge up then convinced me they were evil, nonlinear things and tamed only by experiment, breadboard and cut-and-try. Not so now, of course, but the cut-and-try still lives on. The essential lessons I learned back then and have forgotten and relearned every ten years or so are:

  1. Carefully calibrate the frequency to the control voltage and never forget it.

  2. Don't try to improve performance by cranking up the gain beyond the phase crossover.

  3. Keep the loop delay much smaller than the time constant.

  4. For the first couple of decade re-learns, the critters were analog and with short time constants so I could watch it with a scope. The last couple of re-learns were digital with time constants of days. So, another lesson:

    There is nothing in an analog loop that can't be done in a digital loop except debug it with a pair of headphones and a good test oscillator. Yes, I did say headphones.

So, this nonsense leads me to a couple of simple experiments:

  1. First, open the loop (kill ntpd). Using ntptime,; zero the frequency and offset. Measure the frequency offset, which could take a day.

  2. Then, do the same thing with a known offset via ntptime of say 50 PPM. You now have really and truly calibrated the VFO gain.

  3. Next, close the loop after forcing the local clock maybe 100 ms offset. Watch the offset-time characteristic. Make sure it crosses zero in about 3000 s and overshoots about 5 percent. That with a time constant of 6 in the current nanokernel.

In very simple words, step 1 means that you measure the error of your clock without any correction. You should see a linear increase for the offset. step 2 says you should then try a correction with a fixed offset. Finally, step 3 applies corrections using varying frequency corrections.

2. Alternatives

5.2.2.1. Should the Kernel Discipline be used?

Despite of the features described in Q: 5.2.1.1. there are reasons to disable the use of the kernel discipline. Especially for very long polling intervals (see also Q: 5.1.5.1.) there are disadvantages with the kernel discipline designed for NTP version 3. Professor David L. Mills said:

The key to the daemon loop performance is the use of the Allan intercept to weight the PLL/FLL contributions. The result is to weight the FLL contributions more heavily with the longer poll intervals. However, the effects are noticeable mostly in the transition region between 256 s and the Allan intercept, which is dynamically estimated as a function of phase noise. All this could be implemented in the kernel discipline, but it doesn't seem worthwhile in view of the very mintor performance that could be achieved. The correct advice in these cases is to avoid the kernel loop entirely if you expect to allow intervals much over 1024 s. (...)

Basically it means that ntpd performs more complex computations than the kernel clock does. Floating point operations are generally avoided in operating system kernels. As mentioned in Q: 5.1.5.2., there's a polling interval where the total error is minimal. This is what is called Allan intercept above.

In NTP version 3 that point was hardcoded as 1024 seconds. For shorter polling intervals PLL mode was used, while for longer intervals FLL mode was used. NTP version 4 has a mixed model where PLL and FLL both contribute to the estimated correction value. However, this does not mean that the older kernel code fails; I successfully ran a standard Linux kernel with maxpoll 17, and the polling interval actually reached 36 hours.[2]

3. Monitoring

5.2.3.1. What are the individual monitoring values about?

Most of the values are described in Q: 6.2.4.2.1.. The remaining values of interest are:

time

The current time.

maxerror

The maximum error (set by an application program, increases automatically).

esterror

The estimated error (set by an application program like ntpd).

offset

The additional remaining correction to the system clock.

freq

The automatic periodic correction to the system clock. Positive values make the clock go faster while negative values slow it down.

constant

Stiffness of the control loop. This value controls how a correction to the system clock is weighted. Large values cause only small corrections to be made.

status

The set of control bits in effect. Some bits can only be read, while others can be also set by a privileged application. The most important bits are:

STA_PLL

The PLL (Phase Locked Loop) is enabled. Automatic corrections are applied only if this flag is set.

STA_FLL

The FLL (Frequency Locked Loop) is enabled. This flag is set when the time offset is not believed to be good. Usually this is the case for long sampling intervals or after a bad sample has been detected by xntpd.

STA_UNSYNC

The system time is not synchronized. This flag is usually controlled by an application program, but the operating system may also set it.

STA_FREQHOLD

This flag disables updates to the freq component. The flag is usually set during initial synchronization.

4. PPS Processing

5.2.4.1. What is PPS Processing?

During normal time synchronization, the time stamps of some server are compared about every 20 minutes to compute the required corrections for frequency and offset. With PPS processing, a similar thing is done every second. Therefore it's just time synchronization on a smaller scale. The idea is to keep the system clock tightly coupled with the external reference clock providing the PPS signal.

5.2.4.2. How is PPS Processing related to the Kernel Discipline?

PPS processing can be done in application programs (see also Q: 6.2.4.5.1.), but it makes much more sense when done in the operating system kernel. When polling a time source every 20 minutes, an offset of 5ms is rather small, but when polling a signal every second, an offset of 5ms is very high. Therefore a high accuracy is required for PPS processing. Application programs usually can't fulfil these demands.

The kernel clock model described before also includes algorithms to discipline the clock through an external pulse, the PPS. The additional requirements consist of two mechanisms: Capturing an external event with high accuracy, and applying that event to the clock model. The first is nowadays solved by using the PPS API (Q: 6.2.4.5.1.), while the second is implemented mostly in a routine named hardpps(). The latter routine is called every time when an external PPS event has been detected.

5.2.4.3. What does hardpps() do?

hardpps() is called with two parameters, the absolute time of the event, and the time relative to the last pulse. Both times are measured by the system clock.

The first value is used to minimize the difference between the system clock's start of a second and the external event, while the second value is used to minimize the difference in clock frequency. Normally hardpps() just monitors (e.g. STA_PPSSIGNAL, PPS frequency, stability and jitter) the external events, but does not apply corrections to the system clock.

Figure 4. PPS Synchronization

hardpps() can minimize the differences of both, frequency and offset between the system clock and an external reference.

Flag STA_PPSFREQ enables periodic updates to the clock's frequency correction. Stable clocks require only small and infrequent updates while bad clocks require frequent and large updates. The value passed as parameter is reduced to be a small value around zero, and then it is added to an accumulated value. After a specific amount of values has been added (at the end of a calibration interval), the total amount is divided by the length of the calibration interval, giving a new frequency correction.

When flag STA_PPSTIME is set, the start of a second is moved towards the PPS event, reducing the needed offset correction. The time offset given as argument to the routine will be put into a three-stage median filter to reduce spikes and to compute the jitter. Then an averaged value is applied as offset correction.

In addition to these direct manipulations, hardpps() also detects, signals, and filters various error conditions. The length of the calibration interval is also adjusted automatically. As the limit for a bad calibration is ridiculously high (about 500 PPM per calibration), the calibration interval normally is always at its configured maximum.

Notes

[1]

The latest proposal, known as nanokernel, keeps time using even fractional nanoseconds.

[2]

I used Linux-2.2.13. Unfortunately the maxerror reached 16 seconds, and the implementation turned on STA_UNSYNC (The reference implementation grows maxerror very fast, but it does not set STA_UNSYNC when reaching some limit). IMHO if the maximum error is that high, ntpd should lower the polling interval.