[ClusterLabs] Antw: Re: pacemaker with sbd fails to start if node reboots too fast.

Ulrich Windl Ulrich.Windl at rz.uni-regensburg.de
Tue Dec 5 13:58:09 UTC 2017



>>> Dejan Muhamedagic <dejanmm at fastmail.fm> wrote on 05.12.2017 at 08:57 in
message <20171205075703.dif52t5ncchdgvi6 at tuttle.homenet>:
> On Mon, Dec 04, 2017 at 09:55:46PM +0300, Andrei Borzenkov wrote:
>> 04.12.2017 14:48, Gao,Yan wrote:
>> > On 12/02/2017 07:19 PM, Andrei Borzenkov wrote:
>> >> 30.11.2017 13:48, Gao,Yan wrote:
>> >>> On 11/22/2017 08:01 PM, Andrei Borzenkov wrote:
>> >>>> SLES12 SP2 with pacemaker 1.1.15-21.1-e174ec8; two node cluster with
>> >>>> VM on VSphere using shared VMDK as SBD. During basic tests by killing
>> >>>> corosync and forcing STONITH pacemaker was not started after reboot.
>> >>>> In logs I see during boot
>> >>>>
>> >>>> Nov 22 16:04:56 sapprod01s crmd[3151]:     crit: We were allegedly
>> >>>> just fenced by sapprod01p for sapprod01p
>> >>>> Nov 22 16:04:56 sapprod01s pacemakerd[3137]:  warning: The crmd
>> >>>> process (3151) can no longer be respawned,
>> >>>> Nov 22 16:04:56 sapprod01s pacemakerd[3137]:   notice: Shutting down
>> >>>> Pacemaker
>> >>>>
>> >>>> SBD timeouts are 60s for watchdog and 120s for msgwait. It seems that
>> >>>> stonith with SBD always takes msgwait (at least, visually host is not
>> >>>> declared as OFFLINE until 120s have passed). But the VM reboots lightning fast
>> >>>> and is up and running long before timeout expires.
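
For reference, the watchdog and msgwait values stored on the SBD device can be
checked with sbd's dump command; a minimal sketch, with a placeholder device
path:

    # Device path is a placeholder; use the shared disk from SBD_DEVICE.
    sbd -d /dev/disk/by-id/example-shared-vmdk dump
    # The output includes, among other fields:
    #   Timeout (watchdog) : 60
    #   Timeout (msgwait)  : 120
    # Changing them means re-initialising the device, which wipes the slots,
    # so only do it with the whole cluster stack stopped, e.g.:
    # sbd -d /dev/disk/by-id/example-shared-vmdk -1 30 -4 60 create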
>> >>>>
>> >>>> I think I have seen similar report already. Is it something that can
>> >>>> be fixed by SBD/pacemaker tuning?
>> >>> SBD_DELAY_START=yes in /etc/sysconfig/sbd is the solution.
>> >>>
>> >>
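A minimal /etc/sysconfig/sbd along those lines might look like the sketch
below; the device and watchdog paths are placeholders:

    # /etc/sysconfig/sbd (sketch; adjust paths to the actual setup)
    SBD_DEVICE="/dev/disk/by-id/example-shared-vmdk"
    SBD_WATCHDOG_DEV="/dev/watchdog"
    # Delay sbd startup after boot so a node that was just fenced does not
    # rejoin before the fencing operation is considered complete.
    SBD_DELAY_START="yes"
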
>> >> I tried it (on openSUSE Tumbleweed, which is what I have at hand; it has
>> >> SBD 1.3.0), and with SBD_DELAY_START=yes sbd does not appear to watch the
>> >> disk at all.
>> > It simply waits that long on startup before starting the rest of the
>> > cluster stack, to make sure that any fencing action which targeted it has
>> > completed. It intentionally doesn't watch anything during this period.
>> > 
>> 
>> Unfortunately it waits too long.
>> 
>> ha1:~ # systemctl status sbd.service
>> ● sbd.service - Shared-storage based fencing daemon
>>    Loaded: loaded (/usr/lib/systemd/system/sbd.service; enabled; vendor
>> preset: disabled)
>>    Active: failed (Result: timeout) since Mon 2017-12-04 21:47:03 MSK;
>> 4min 16s ago
>>   Process: 1861 ExecStop=/usr/bin/kill -TERM $MAINPID (code=exited,
>> status=0/SUCCESS)
>>   Process: 2058 ExecStart=/usr/sbin/sbd $SBD_OPTS -p /var/run/sbd.pid
>> watch (code=killed, signa
>>  Main PID: 1792 (code=exited, status=0/SUCCESS)
>> 
>> дек 04 21:45:32 ha1 systemd[1]: Starting Shared-storage based fencing
>> daemon...
>> дек 04 21:47:02 ha1 systemd[1]: sbd.service: Start operation timed out.
>> Terminating.
>> дек 04 21:47:03 ha1 systemd[1]: Failed to start Shared-storage based
>> fencing daemon.
>> дек 04 21:47:03 ha1 systemd[1]: sbd.service: Unit entered failed state.
>> дек 04 21:47:03 ha1 systemd[1]: sbd.service: Failed with result 'timeout'.
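
The failure above looks like systemd's default start timeout (90 s) expiring
before the delayed sbd startup completes. One possible workaround is a drop-in
that raises TimeoutStartSec above the expected delay (the msgwait mentioned
earlier in the thread was 120 s); a sketch, not verified against this
particular sbd version:

    # systemctl edit sbd.service
    # (creates /etc/systemd/system/sbd.service.d/override.conf)
    [Service]
    # msgwait is 120 s in that setup, so allow comfortably more than that.
    TimeoutStartSec=180

If the drop-in is created by hand instead of via "systemctl edit", run
"systemctl daemon-reload" afterwards.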
>> 
>> But the real problem is: despite SBD having failed to start, the whole
>> cluster stack continues to run; and because SBD blindly trusts nodes to
>> behave well, fencing appears to succeed after the timeout ... without
>> anyone taking any action on the poison pill ...
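
One way to check whether a node's sbd daemon is actually watching its slot,
and would therefore act on a poison pill, is to send it a harmless test
message and look for it in that node's log; a sketch with a placeholder
device path:

    # Run from any node that can see the shared disk.
    sbd -d /dev/disk/by-id/example-shared-vmdk list
    # Write a test message into ha1's slot; a running sbd watcher on ha1
    # should log that it received the test message.
    sbd -d /dev/disk/by-id/example-shared-vmdk message ha1 test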

In SLES at least, corosync doesn't start up if SBD was configured but failed
to start.
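
Whether such a dependency is in effect on a given installation can be checked
from the unit files themselves; a sketch:

    # Show the unit definitions plus any distribution drop-ins:
    systemctl cat sbd.service corosync.service
    # List the units that pull in or depend on sbd:
    systemctl list-dependencies --reverse sbd.service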

> 
> That's something I always wondered about: if a node is capable of
> reading a poison pill then it could before shutdown also write an
> "I'm leaving" message into its slot. Wouldn't that make sbd more
> reliable? Any reason not to implement that?

It would be like "trigger the watchdog to kill us, then write the exit
message, hoping it is written before the watchdog kills us". If done the other
way around, a remote node could assume fencing succeeded when in fact it had
not completed yet: a race condition.

> 
> Thanks,
> 
> Dejan
> 



