[ClusterLabs] Questions about SBD behavior

Andrei Borzenkov arvidjaar at gmail.com
Sat May 26 05:23:24 UTC 2018


25.05.2018 14:44, Klaus Wenninger wrote:
> On 05/25/2018 12:44 PM, Andrei Borzenkov wrote:
>> On Fri, May 25, 2018 at 10:08 AM, Klaus Wenninger <kwenning at redhat.com> wrote:
>>> On 05/25/2018 07:31 AM, 井上 和徳 wrote:
>>>> Hi,
>>>>
>>>> I am checking the watchdog function of SBD (without shared block-device).
>>>> In a two-node cluster, if one node is stopped, the watchdog is triggered on the remaining node.
>>>> Is this the designed behavior?
>>> SBD without a shared block-device doesn't really make sense on
>>> a two-node cluster.
>>> The basic idea is that, e.g. in case of a networking problem,
>>> a cluster splits into a quorate and a non-quorate partition.
>>> The quorate partition stays up while SBD guarantees a
>>> reliable watchdog-based self-fencing of the non-quorate partition
>>> within a defined timeout.
>> Does it require no-quorum-policy=suicide or does it decide completely
>> independently? I.e. would it also fire with no-quorum-policy=ignore?
> 
> It will eventually fire in any case, but no-quorum-policy decides how
> long this takes. In case of suicide the inquisitor will immediately
> stop tickling the watchdog. In all other cases the pacemaker-servant
> will stop pinging the inquisitor, which makes the servant
> time out after a default of 4 seconds, and then the inquisitor will
> stop tickling the watchdog.
> But that is only relevant if Corosync doesn't have 2-node enabled.
> See the comment below for that case.
> 
>>
>>> This idea of course doesn't work with just 2 nodes.
>>> Taking quorum info from the 2-node feature of Corosync (which automatically
>>> switches on wait-for-all) doesn't help in this case but would instead
>>> lead to split-brain.
>> So what you are saying is that SBD ignores quorum information from
>> Corosync and makes its own decisions based purely on the node count. Do
>> I understand that correctly?
> 
> Yes, but that is only true for this case where Corosync has 2-node
> enabled.
> In all other cases (be it clusters with more than 2 nodes
> or clusters with just 2 nodes but without 2-node enabled in
> Corosync) the pacemaker-servant takes quorum info from
> pacemaker, which nowadays will probably come directly from
> Corosync.
> But as said, if 2-node is configured in Corosync everything
> is different: the node counting is then actually done
> by the cluster-servant, and it will stop pinging the
> inquisitor (instead of the pacemaker-servant) if it doesn't
> count more than 1 node.
> 
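
To make the paths described above easier to compare, here is a rough Python model of them. It is a hypothetical sketch based only on what is said in this thread, not SBD source code: the function and field names are invented, the 4-second servant timeout is the default quoted above, applying that same timeout to the cluster-servant path is an assumption, and the watchdog timeout is left as a parameter.

```python
from dataclasses import dataclass
from typing import Optional

# Default servant timeout quoted in this thread (about 4 seconds).
SERVANT_TIMEOUT_S = 4.0


@dataclass
class NodeState:
    two_node: bool         # Corosync votequorum two_node enabled?
    quorate: bool          # quorum as reported via pacemaker
    visible_nodes: int     # nodes the cluster-servant can count
    no_quorum_policy: str  # pacemaker property, e.g. "suicide", "stop", "ignore"


def self_fence_delay(state: NodeState, watchdog_timeout_s: float) -> Optional[float]:
    """Approximate seconds until the watchdog resets this node, or None if it keeps being fed.

    Hypothetical model of the behaviour described in this thread, not SBD code.
    """
    if state.two_node:
        # With two_node set, the cluster-servant counts nodes itself and the
        # quorate flag is not consulted at all.
        if state.visible_nodes > 1:
            return None
        # It stops pinging the inquisitor; assumption: the same servant timeout applies.
        return SERVANT_TIMEOUT_S + watchdog_timeout_s
    if state.quorate:
        return None
    if state.no_quorum_policy == "suicide":
        # The inquisitor stops tickling the watchdog immediately.
        return watchdog_timeout_s
    # Any other policy: the pacemaker-servant stops pinging, the inquisitor
    # waits out the servant timeout and only then stops tickling the watchdog.
    return SERVANT_TIMEOUT_S + watchdog_timeout_s


# The original report: two_node enabled, peer stopped, no shared disk.
print(self_fence_delay(NodeState(True, True, 1, "stop"), watchdog_timeout_s=5.0))
```

In this model the remaining node of the original report is reset even though Corosync still reports it as quorate, which is exactly the behaviour that was observed.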

Is it conditional on having no shared device, or does it just check the
two_node value? If it always behaves this way, even with a real shared
device present, it means sbd is fundamentally incompatible with two_node,
and that had better be mentioned in the documentation.
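
Why two nodes are a special case can be seen from the majority arithmetic. The sketch below assumes one vote per node and the simple-majority threshold votequorum uses (expected_votes / 2 + 1, integer division). The function names are hypothetical. Without two_node a lone survivor of two is never quorate. With two_node it is, but then both halves of a network split can consider themselves quorate, so "the quorate partition survives" no longer picks a single winner without a disk.

```python
def quorum_threshold(expected_votes: int, two_node: bool = False) -> int:
    """Votes needed for quorum, assuming one vote per node (hypothetical helper)."""
    if two_node and expected_votes == 2:
        # two_node mode lets a single remaining node stay quorate.
        return 1
    # Simple majority: more than half of the expected votes.
    return expected_votes // 2 + 1


def is_quorate(visible_votes: int, expected_votes: int, two_node: bool = False) -> bool:
    return visible_votes >= quorum_threshold(expected_votes, two_node)


# Two-node split without two_node: neither side is quorate, so both would self-fence.
print(is_quorate(1, 2))
# With two_node: both sides are quorate, so watchdog-only SBD cannot pick a winner.
print(is_quorate(1, 2, two_node=True))
```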

> That all said, I've just realized that setting 2-node in Corosync
> shouldn't really be dangerous anymore, although it doesn't make
> the cluster especially useful either in the case of SBD without disk(s).
> 
> Regards,
> Klaus

