[pkg-ntp-maintainers] Bug#559634: Bug#559634: no kernel sync and "ntpdc -c kerninfo" reports incorrect estimates of time on Xen dom0

Sun Dec 6 01:00:23 UTC 2009

Kurt Roeckx wrote:
> On Sat, Dec 05, 2009 at 05:12:13PM -0500, Michael Bilow wrote:
>    
>> This is taking place on Xen dom0, not domU. (The name of the server --
>> the Xen host -- is "virtual1".) As far as I understand, ntp should work
>> normally on dom0.
>>      
> I don't know much about xen, but I think that's what I understood
> too.
>    
I'm pretty sure this is the case. A web search shows a lot of people 
having trouble with NTP under domU, but that's not the issue here at all.

> What I see in your bug report is:
> - It's synching to navobs1.gatech.edu
> - It has an offset of about 35 ms to the system peer
> - It claims to run at stratum 2
> - The kernel reports back abnormal values, like 0 pll offset
>    and 16 s for est and max error and that it's unsynced,
>    and a suspiciously low pll frequency.
>
> I'm guessing that you see a lot of "time reset" messages in
> /var/log/daemon.log and that if you look at the output
> of ntpq -p you see the offset slowly go up until just
> after such a time reset message?
>    
It's possible, but "time reset" messages are infrequent enough that it 
is impractical to catch the "ntpq -p" output just after one occurs.

> How long is ntpd running?  Does the "pll frequency" change?
> (It should change very slowly over time).
>    
It's a new server. I put it into service on 30th Nov in Providence and 
left it running until 4th Dec, when it was moved to Boston for 
production. The "daemon.log" file shows between 2 and 5 "time reset" 
messages per day during the interval in Providence where ntpd was simply 
left running. After the move to Boston, the server was restarted around 
1900 EST on 4th Dec. About 8 hours later, I decided to add my usual 
suite of Stratum 1 time servers that has worked stably on other machines 
for years, and restarted ntpd with the new configuration at 0519 EST on 
5th Dec. The information submitted as part of the bug report was collected 
around 1430 EST, about 9 hours later. There were three "time reset" 
messages in the log during the first 8 hours in Boston before the daemon 
was restarted, but there have been no "time reset" messages at all in 
the log since the restart of the daemon at 0519 EST and it is now a 
little more than 12 hours later.

I checked when I started this reply, and I was seeing 116.892 ppm from 
"ntpdc -c kerninfo". I just checked it again a few minutes later, and it 
is 57.596 ppm. This is not good: the pll frequency should only change 
very slowly, not by roughly half within a few minutes.
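
For scale, here is a rough sketch of what those two readings would mean 
if the clock were left uncorrected. The function name is made up for 
illustration, and the only inputs are the two ppm values quoted above; 
it is plain arithmetic (1 ppm is one microsecond of drift per second), 
not anything ntpd itself computes this way.

```python
SECONDS_PER_DAY = 86_400


def ppm_to_seconds_per_day(ppm: float) -> float:
    """Drift in seconds per day implied by a frequency offset in ppm."""
    return ppm * 1e-6 * SECONDS_PER_DAY


# The two "pll frequency" readings quoted in this message.
readings = [116.892, 57.596]

for ppm in readings:
    print(f"{ppm:8.3f} ppm -> {ppm_to_seconds_per_day(ppm):.2f} s/day")

print(f"change between readings: {readings[0] - readings[1]:.3f} ppm")
```

A frequency estimate that had settled would be expected to move by a 
small fraction of a ppm between two checks this close together.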

There does not seem to be much consistency in the "time reset" 
corrections, some positive and some negative but never more than 250 ms; 
here are all of them logged for 4th and 5th Dec:

(Providence)
Dec  4 00:39:55 virtual1 ntpd[2722]: time reset -0.189828 s
Dec  4 02:06:25 virtual1 ntpd[2722]: time reset +0.171192 s
Dec  4 05:21:01 virtual1 ntpd[2722]: time reset -0.157675 s
Dec  4 08:40:53 virtual1 ntpd[2722]: time reset +0.128741 s

(Boston)
Dec  5 00:07:29 virtual1 ntpd[3462]: time reset -0.210643 s
Dec  5 00:41:51 virtual1 ntpd[3462]: time reset -0.160555 s
Dec  5 03:58:00 virtual1 ntpd[3462]: time reset +0.145374 s
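
The entries above can be summarized per day with a short script. This is 
only a sketch: the regular expression assumes the syslog line layout 
shown in this excerpt, the sample text is copied from the lines above 
rather than read from /var/log/daemon.log, and the helper names 
(parse_resets, summarize) are hypothetical.

```python
import re
from collections import defaultdict

# Sample lines copied from the log excerpt above.
LOG = """\
Dec  4 00:39:55 virtual1 ntpd[2722]: time reset -0.189828 s
Dec  4 02:06:25 virtual1 ntpd[2722]: time reset +0.171192 s
Dec  4 05:21:01 virtual1 ntpd[2722]: time reset -0.157675 s
Dec  4 08:40:53 virtual1 ntpd[2722]: time reset +0.128741 s
Dec  5 00:07:29 virtual1 ntpd[3462]: time reset -0.210643 s
Dec  5 00:41:51 virtual1 ntpd[3462]: time reset -0.160555 s
Dec  5 03:58:00 virtual1 ntpd[3462]: time reset +0.145374 s
"""

# Assumed layout: "Mon DD HH:MM:SS host ntpd[pid]: time reset +/-N.NNNNNN s"
RESET_RE = re.compile(
    r"^(\w{3})\s+(\d+) \d\d:\d\d:\d\d \S+ ntpd\[\d+\]: "
    r"time reset ([+-]\d+\.\d+) s$"
)


def parse_resets(text):
    """Return a list of (day, step_seconds) for each 'time reset' line."""
    resets = []
    for line in text.splitlines():
        match = RESET_RE.match(line.strip())
        if match:
            day = f"{match.group(1)} {int(match.group(2))}"
            resets.append((day, float(match.group(3))))
    return resets


def summarize(resets):
    """Group steps by day and report count, net step and largest magnitude."""
    per_day = defaultdict(list)
    for day, step in resets:
        per_day[day].append(step)
    return {
        day: {
            "count": len(steps),
            "net": sum(steps),
            "max_abs": max(abs(s) for s in steps),
        }
        for day, steps in per_day.items()
    }


for day, stats in summarize(parse_resets(LOG)).items():
    print(
        f"{day}: {stats['count']} resets, "
        f"net {stats['net']:+.6f} s, "
        f"largest {stats['max_abs']:.6f} s"
    )
```

To run it against a real log, the LOG string could be replaced with the 
contents of the file; lines that do not match the pattern are skipped.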

> Note that the first time ntp is started it needs to adjust
> the frequency.  This process usually takes about 24 hours
> to find the right frequency.  It works best if there is
> no ntp.drift file when ntp is started, else it's assuming
> the value is about right and will take a lot longer.  This
> also means a reboot after ntp was run for the first time
> but didn't get the frequency yet will have a negative impact.
>    
I didn't create any drift file when the server was installed, but 
allowed it to be created automatically. It's possible that the actual 
drift is so different in Providence and Boston that this is causing a 
problem, but both facilities are temperature-controlled environments 
where I would not expect significant issues. The contents of the drift 
file are 133.299 (ppm). I would be open to deleting the drift file and 
restarting the daemon if you think that would be a worthwhile test.

-- Mike