[ClusterLabs] SBD msgwait & partner reboot time
Jorge Fábregas
jorge.fabregas at gmail.com
Tue Sep 8 15:45:32 UTC 2015
Hi,
I've read about how important the relationship is between the different
parameters of the SBD device (msgwait & watchdog timeout) & Pacemaker's
stonith timeout. However, I've just run into something I had never
considered: how the time a fenced node takes to come fully back up
compares with msgwait.
Two nodes: sles11a & sles11b. I fenced sles11a (via Hawk's interface,
which triggers the sbd resource agent) and carefully watched
/var/log/messages on sles11b:
Sept 8 11:27:00 sles11b sbd: Writing reset to node slot sles11a
Sept 8 11:27:00 sles11b sbd: Messaging delay: 40
[sles11a is rebooting and it comes up in about 12 seconds]
[see a bunch of messages joining the cluster]
[finally node sles11a is online at about 11:27:25]
Sept 8 11:27:40 sles11b sbd: Message successfully delivered
[sles11a is put offline!]
Sept 8 11:27:41 pengine[4358]: warning: custom_action: Action
p_stonith-sdb_monitor_0 on sles11a
is unrunnable (pending)
I've done it about 5 times and it happens every time.
My values are: 20 (watchdog timeout) & 40 (msgwait). I know, I know...
it's too much for my lab environment, but I'm just curious whether
something is wrong or whether msgwait really NEEDS to ALWAYS be less
than the reboot time.
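
The timeline in the log excerpt above can be checked with simple
arithmetic. The following is only a minimal Python sketch: the timestamps
are the ones from the log (the 11:27:25 rejoin time is approximate), and
the names used (MSGWAIT_S, rejoined_before_delivery) are made up for
illustration and are not part of sbd or Pacemaker.

```python
from datetime import datetime, timedelta

# Values taken from the post; names are illustrative only.
MSGWAIT_S = 40  # msgwait, in seconds

# "Writing reset to node slot sles11a"
reset_written = datetime(2015, 9, 8, 11, 27, 0)
# sles11a back online (approximate, per the post)
node_online = datetime(2015, 9, 8, 11, 27, 25)
# Expected time of "Message successfully delivered": reset + msgwait
delivered = reset_written + timedelta(seconds=MSGWAIT_S)


def rejoined_before_delivery(online: datetime, delivery: datetime) -> bool:
    """True if the fenced node was back online before msgwait elapsed."""
    return online < delivery


# Seconds between the node rejoining and the delivery message
gap_s = (delivered - node_online).total_seconds()

print("delivery expected at:", delivered.time())
print("rejoined before delivery:",
      rejoined_before_delivery(node_online, delivered))
print("seconds between rejoin and delivery:", gap_s)
```

With these numbers the computed delivery time matches the 11:27:40 log
entry, and the node was already back in the cluster before that point.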
Thanks,
Jorge