[ClusterLabs] boothd-site/boothd-arbitrator: WARN: packet timestamp older than previous one

Tue Nov 7 16:26:40 CET 2017

Le mardi 07 novembre 2017 à 13:41 +0100, Dejan Muhamedagic a écrit :
> Hi,
> 
> On Mon, Nov 06, 2017 at 10:52:12AM +0100, Nicolas Huillard wrote:
> > Hello,
> > 
> > I have many of those above syslog messages from boothd (counting
> > all servers, that's nearly 1 hundred per day).
> > All sites are synchronized using NTP, but according to source
> > (https://github.com/ClusterLabs/booth/blob/master/src/transport.c),
> > that specific messages isn't event tied to maxtimeskew (which I
> > forced to 120s, because I wondered it it wrongly defaulted to 0s).
> > This message is output unless the new timestamp is larger (>) that
> > the previous one from the same site).
> 
> Right.
> 
> > Adding debug on the arbitrator, it appears that every exchange
> > between the servers is made of at least 2 messages in each
> > direction. Could it be that two consecutive messages have the exact
> > same timestamp, and thus trigger the warning?
> 
> The time resolution used should be sufficiently fine (i.e.
> microseconds) so that the timestamps of two consecutive packets
> are not the same. At least I'd expect that to be so. Now, what is
> the actual time resolution on your platform? Maybe you should
> check the clocksource? Still, if these are not some special kind
> of computing platform, I suppose that there shouldn't be any
> surprises.

My servers are regular Intel platforms ("Intel(R) Atom(TM) CPU C2750 @
2.40GHz" and "Intel(R) Xeon(R) CPU D-1541 @ 2.10GHz"), using Debian
stretch with kernel 4.9.51.

According to boot-time logs, TSC is selected as the clocksource:
kernel: [    0.000000] clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645519600211568 ns
kernel: [    0.000000] clocksource: hpet: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 133484882848 ns
kernel: [    1.512907] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645041785100000 ns
kernel: [    1.905780] clocksource: Switched to clocksource hpet
kernel: [    1.932222] clocksource: acpi_pm: mask: 0xffffff max_cycles: 0xffffff, max_idle_ns: 2085701024 ns
kernel: [    3.861842] tsc: Refined TSC clocksource calibration: 2099.998 MHz
kernel: [    3.861859] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x1e452ea631d, max_idle_ns: 440795244572 ns
kernel: [    5.345808] clocksource: Switched to clocksource tsc

https://superuser.com/questions/393969/what-does-clocksource-tsc-unstab
le-mean#393978
"In short, on modern systems, the TSC sucks for measuring time
accurately" and "There is no promise that the timestamp counters of
multiple CPUs on a single motherboard will be synchronized"
Oops: this means that the timestamps sent may not be increasing,
depending on which core boothd runs...
The CPUs are currently underutilised, which can lead to increased
discrepancy between cores TSCs.
I can either:
* switch to another clocksource (I don't yet know how to do that)
* lock boothd on a specific core (I don't know if I can do that)
* ignore these messages altogether (the next one re. "timestamp older
than skew" will still happen)

> How often are these messages logged, compared to the expire (or
> renewal_freq) time?

3 nodes, 5 min expire, 2 tickets = 1728 messages per day, vs. 108
messages on all host in the last 24h = 6% fail.
Apparently, the Xeon D fail more often than the Atom C.
I wonder why this problem is not more widely experienced (not that it's
a big problem).

> Thanks,
> 
> Dejan

Thanks for your work!

-- 
Nicolas Huillard