[ClusterLabs] Questions about SBD behavior

井上 和徳 inouekazu at intellilink.co.jp
Wed Jun 13 04:58:46 EDT 2018


Thanks for the response.

I now understand that, as of SBD v1.3.1 and later, real quorum is required.
I also read this:
https://wiki.clusterlabs.org/wiki/Using_SBD_with_Pacemaker#Watchdog-based_self-fencing_with_resource_recovery
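
(For reference, our test setup is diskless SBD, configured roughly as
below; the timeout values are only examples and pcs is just one way to
set the cluster property:

  # /etc/sysconfig/sbd -- no SBD_DEVICE, i.e. watchdog-only mode
  SBD_PACEMAKER=yes
  SBD_WATCHDOG_DEV=/dev/watchdog
  SBD_WATCHDOG_TIMEOUT=5

  # cluster side: treat unseen nodes as self-fenced after this time
  pcs property set stonith-watchdog-timeout=10
)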

Related to this behavior, we are verifying the following known issue
before moving to pacemaker-2.0.

* When SIGSTOP is sent to a pacemaker daemon, resource failures are no
  longer detected.
  https://lists.clusterlabs.org/pipermail/users/2016-September/011146.html
  https://lists.clusterlabs.org/pipermail/users/2016-October/011429.html

  I expected SBD to handle this, but nothing detected that any of the
  following daemons had been frozen, so resource failures also went
  undetected:
  - pacemaker-based
  - pacemaker-execd
  - pacemaker-attrd
  - pacemaker-schedulerd
  - pacemaker-controld

  I checked the slides below, but could not tell whether this case is
  (or will be) covered. A rough sketch of the test we ran follows the
  link.
  https://wiki.clusterlabs.org/w/images/1/1a/Recent_Work_and_Future_Plans_for_SBD_1.1.pdf
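
  Roughly (the daemon chosen and the way the failure is injected here
  are only examples):

    # freeze one pacemaker daemon on a node that is running a resource
    kill -STOP $(pidof pacemaker-execd)

    # then make the resource fail, e.g. by killing its process by hand;
    # with pacemaker-execd frozen the failed monitor is never reported,
    # so neither pacemaker nor sbd reacts to the failure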

After discussing this internally, we would like SBD to detect such a
freeze and reset the machine.

Also, for users who have neither a shared disk nor a qdevice, we need
an option that works even without real quorum.
(Fence races can be avoided with the delay attribute:
 https://access.redhat.com/solutions/91653
 https://access.redhat.com/solutions/1293523)
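
For example (the device name is only a placeholder and the exact
syntax depends on the fence agent and on pcs/crmsh):

  # give one node's fence device a head start so that both nodes do
  # not shoot each other at the same time after a split
  pcs stonith update fence-node1 delay=10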

Best Regards,
Kazunori INOUE

> -----Original Message-----
> From: Users [mailto:users-bounces at clusterlabs.org] On Behalf Of Klaus Wenninger
> Sent: Friday, May 25, 2018 4:08 PM
> To: users at clusterlabs.org
> Subject: Re: [ClusterLabs] Questions about SBD behavior
> 
> On 05/25/2018 07:31 AM, 井上 和徳 wrote:
> > Hi,
> >
> > I am checking the watchdog function of SBD (without shared block-device).
> > In a two-node cluster, if one node is stopped, the watchdog is
> > triggered on the remaining node.
> > Is this the designed behavior?
> 
> SBD without a shared block-device doesn't really make sense on
> a two-node cluster.
> The basic idea is - e.g. in a case of a networking problem -
> that a cluster splits up in a quorate and a non-quorate partition.
> The quorate partition stays up while SBD guarantees a
> reliable watchdog-based self-fencing of the non-quorate partition
> within a defined timeout.
> This idea of course doesn't work with just 2 nodes.
> Taking quorum info from the 2-node feature of corosync (automatically
> switching on wait-for-all) doesn't help in this case but instead
> would lead to split-brain.
> What you can do - and what e.g. pcs does automatically - is enable
> the auto-tie-breaker instead of two-node in corosync. But that
> still doesn't give you a higher availability than the one of the
> winner of auto-tie-breaker. (Maybe interesting if you are going
> for a load-balancing-scenario that doesn't affect availability or
> for a transient state while setting up a cluster node-by-node ...)
> What you can do though is to use qdevice to still have 'real-quorum'
> info with just 2 full cluster-nodes.
> 
> There was quite a lot of discussion round this topic on this
> thread previously if you search the history.
> 
> Regards,
> Klaus
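
(For reference, the corosync-side settings mentioned above look roughly
like this; the qnetd host name below is only a placeholder:

  # corosync.conf: auto_tie_breaker instead of two_node
  quorum {
      provider: corosync_votequorum
      expected_votes: 2
      auto_tie_breaker: 1
  }

  # or keep real quorum with only two full nodes by adding a qdevice
  pcs quorum device add model net host=qnetd-host algorithm=ffsplit
)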

