[ClusterLabs] pacemaker with sbd fails to start if node reboots too fast.

Thu Nov 30 05:48:57 EST 2017

On 11/22/2017 08:01 PM, Andrei Borzenkov wrote:
> SLES12 SP2 with pacemaker 1.1.15-21.1-e174ec8; two-node cluster of
> VMs on vSphere using a shared VMDK as the SBD device. During basic tests
> (killing corosync to force STONITH), pacemaker did not start after the
> reboot. During boot the logs show:
> 
> Nov 22 16:04:56 sapprod01s crmd[3151]:     crit: We were allegedly
> just fenced by sapprod01p for sapprod01p
> Nov 22 16:04:56 sapprod01s pacemakerd[3137]:  warning: The crmd
> process (3151) can no longer be respawned,
> Nov 22 16:04:56 sapprod01s pacemakerd[3137]:   notice: Shutting down Pacemaker
> 
> SBD timeouts are 60s for watchdog and 120s for msgwait. It seems that
> stonith with SBD always takes the full msgwait (at least, the host is not
> shown as OFFLINE until 120s have passed). But the VM reboots very quickly
> and is up and running long before the timeout expires.
> 
> I think I have seen a similar report already. Can this be fixed
> by SBD/pacemaker tuning?
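Setting SBD_DELAY_START=yes in /etc/sysconfig/sbd is the solution. With it,
sbd waits msgwait seconds after starting at boot, so a node that reboots
faster than msgwait does not rejoin while the other node is still waiting
for the fencing to be acknowledged.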
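For reference, the relevant line in /etc/sysconfig/sbd would look roughly
like this. This is only an excerpt; the other SBD_* settings already in the
file stay as they are:

```shell
# /etc/sysconfig/sbd (excerpt)
# Delay sbd's start at boot by msgwait seconds, so a node that comes back
# faster than msgwait does not rejoin while its peer is still waiting for
# the fence to complete.
SBD_DELAY_START=yes
```

The msgwait value that determines the length of the delay can be read back
from the device header with `sbd -d <device> dump`.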

Regards,
   Yan

> 
> I can provide full logs tomorrow if needed.
> 
> TIA
> 
> -andrei