[ClusterLabs] pacemaker with sbd fails to start if node reboots too fast.
Gao,Yan
ygao at suse.com
Thu Nov 30 05:48:57 EST 2017
On 11/22/2017 08:01 PM, Andrei Borzenkov wrote:
> SLES12 SP2 with pacemaker 1.1.15-21.1-e174ec8; two-node cluster of
> VMs on vSphere using a shared VMDK as the SBD device. During basic tests
> (killing corosync to force STONITH), pacemaker was not started after
> reboot. The logs from boot show:
>
> Nov 22 16:04:56 sapprod01s crmd[3151]: crit: We were allegedly
> just fenced by sapprod01p for sapprod01p
> Nov 22 16:04:56 sapprod01s pacemakerd[3137]: warning: The crmd
> process (3151) can no longer be respawned,
> Nov 22 16:04:56 sapprod01s pacemakerd[3137]: notice: Shutting down Pacemaker
>
> The SBD timeouts are 60s for watchdog and 120s for msgwait. STONITH
> via SBD always seems to take the full msgwait (at least, the host is not
> shown as OFFLINE until 120s have passed). But the VM reboots very fast
> and is up and running long before that timeout expires.
>
> I think I have seen a similar report already. Can this be fixed by
> SBD/pacemaker tuning?
Setting SBD_DELAY_START=yes in /etc/sysconfig/sbd is the solution.
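A minimal sketch of that change, assuming the usual SLES layout where /etc/sysconfig/sbd holds plain shell-style VAR=value assignments. With SBD_DELAY_START=yes, sbd delays its own startup at boot by the msgwait timeout (120s in the setup above), which keeps a node that reboots faster than msgwait from rejoining while the surviving node is still waiting for the fencing operation to be acknowledged. Only the one line is shown; the other settings already in the file (for example SBD_DEVICE) stay as they are.

```shell
# /etc/sysconfig/sbd (excerpt; leave the other existing settings unchanged)
# Delay sbd startup on boot by msgwait seconds, so that a node which
# reboots faster than msgwait does not rejoin the cluster while the other
# node is still waiting for the fencing operation to be acknowledged.
SBD_DELAY_START=yes
```

The msgwait value that applies is the one written to the SBD device header, which `sbd -d <device> dump` displays. The trade-off is that the delay applies on every boot, not only after a fence, so a normal cluster start is slowed by the same amount.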
Regards,
Yan
>
> I can provide full logs tomorrow if needed.
>
> TIA
>
> -andrei
>
> _______________________________________________
> Users mailing list: Users at clusterlabs.org
> http://lists.clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>
>