[ClusterLabs] boothd-site/boothd-arbitrator: WARN: packet timestamp older than previous one

Sat Nov 11 04:46:08 EST 2017

On Tue, Nov 07, 2017 at 04:26:40PM +0100, Nicolas Huillard wrote:
> Le mardi 07 novembre 2017 à 13:41 +0100, Dejan Muhamedagic a écrit :
> > Hi,
> > 
> > On Mon, Nov 06, 2017 at 10:52:12AM +0100, Nicolas Huillard wrote:
> > > Hello,
> > > 
> > > I have many of those above syslog messages from boothd (counting
> > > all servers, that's nearly 1 hundred per day).
> > > All sites are synchronized using NTP, but according to source
> > > (https://github.com/ClusterLabs/booth/blob/master/src/transport.c),
> > > that specific messages isn't event tied to maxtimeskew (which I
> > > forced to 120s, because I wondered it it wrongly defaulted to 0s).
> > > This message is output unless the new timestamp is larger (>) that
> > > the previous one from the same site).
> > 
> > Right.
> > 
> > > Adding debug on the arbitrator, it appears that every exchange
> > > between the servers is made of at least 2 messages in each
> > > direction. Could it be that two consecutive messages have the exact
> > > same timestamp, and thus trigger the warning?
> > 
> > The time resolution used should be sufficiently fine (i.e.
> > microseconds) so that the timestamps of two consecutive packets
> > are not the same. At least I'd expect that to be so. Now, what is
> > the actual time resolution on your platform? Maybe you should
> > check the clocksource? Still, if these are not some special kind
> > of computing platform, I suppose that there shouldn't be any
> > surprises.
> 
> My servers are regular Intel platforms ("Intel(R) Atom(TM) CPU C2750 @
> 2.40GHz" and "Intel(R) Xeon(R) CPU D-1541 @ 2.10GHz"), using Debian
> stretch with kernel 4.9.51.
> 
> According to boot-time logs, TSC is selected as the clocksource:
> kernel: [    0.000000] clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645519600211568 ns
> kernel: [    0.000000] clocksource: hpet: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 133484882848 ns
> kernel: [    1.512907] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645041785100000 ns
> kernel: [    1.905780] clocksource: Switched to clocksource hpet
> kernel: [    1.932222] clocksource: acpi_pm: mask: 0xffffff max_cycles: 0xffffff, max_idle_ns: 2085701024 ns
> kernel: [    3.861842] tsc: Refined TSC clocksource calibration: 2099.998 MHz
> kernel: [    3.861859] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x1e452ea631d, max_idle_ns: 440795244572 ns
> kernel: [    5.345808] clocksource: Switched to clocksource tsc
> 
> https://superuser.com/questions/393969/what-does-clocksource-tsc-unstab
> le-mean#393978
> "In short, on modern systems, the TSC sucks for measuring time
> accurately" and "There is no promise that the timestamp counters of
> multiple CPUs on a single motherboard will be synchronized"
> Oops: this means that the timestamps sent may not be increasing,
> depending on which core boothd runs...

Hmm, there is supposed to be a guarantee that the clock is
monotonic (we use CLOCK_MONOTONIC). The clock_getres(3) does say
though that it "is affected by the incremental adjustments
performed by adjtime(3) and NTP." But normally those
adjustments shouldn't exceed certain threshold per second.
Anyway, maybe you can try with hpet as a clock source.

> The CPUs are currently underutilised, which can lead to increased
> discrepancy between cores TSCs.
> I can either:
> * switch to another clocksource (I don't yet know how to do that)

You can do that with sysfs: in
/sys/devices/system/clocksource/clocksource0/ see
available_clocksource and current_clocksource. To modify it on
boot there are certainly some kernel parameters.

> * lock boothd on a specific core (I don't know if I can do that)
> * ignore these messages altogether (the next one re. "timestamp older
> than skew" will still happen)
> 
> > How often are these messages logged, compared to the expire (or
> > renewal_freq) time?
> 
> 3 nodes, 5 min expire, 2 tickets = 1728 messages per day, vs. 108
> messages on all host in the last 24h = 6% fail.
> Apparently, the Xeon D fail more often than the Atom C.
> I wonder why this problem is not more widely experienced (not that it's
> a big problem).

No idea. If you feel like experimenting, you could add a bit more
information into the debug messages (i.e. time difference) and
log all time diffs. Otherwise, though it doesn't look so to me,
maybe we do have some issue in the code, one never knows ;-)

Thanks,

Dejan

> > Thanks,
> > 
> > Dejan
> 
> Thanks for your work!
> 
> -- 
> Nicolas Huillard
> 
> _______________________________________________
> Users mailing list: Users at clusterlabs.org
> http://lists.clusterlabs.org/mailman/listinfo/users
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org