[ClusterLabs] Questions about SBD behavior

Klaus Wenninger kwenning at redhat.com
Fri May 25 11:44:56 UTC 2018


On 05/25/2018 12:44 PM, Andrei Borzenkov wrote:
> On Fri, May 25, 2018 at 10:08 AM, Klaus Wenninger <kwenning at redhat.com> wrote:
>> On 05/25/2018 07:31 AM, 井上 和徳 wrote:
>>> Hi,
>>>
>>> I am checking the watchdog function of SBD (without shared block-device).
>>> In a two-node cluster, if the cluster is stopped on one node, the watchdog is triggered on the remaining node.
>>> Is this the designed behavior?
>> SBD without a shared block-device doesn't really make sense on
>> a two-node cluster.
>> The basic idea is - e.g. in a case of a networking problem -
>> that a cluster splits up in a quorate and a non-quorate partition.
>> The quorate partition stays over while SBD guarantees a
>> reliable watchdog-based self-fencing of the non-quorate partition
>> within a defined timeout.
> Does it require no-quorum-policy=suicide or it decides completely
> independently? I.e. would it fire also with no-quorum-policy=ignore?

Eventually it will fire in any case, but no-quorum-policy decides how
long that takes. In case of suicide the inquisitor will immediately
stop tickling the watchdog. In all other cases the pacemaker-servant
will stop pinging the inquisitor, which makes the servant time out
after a default of 4 seconds, and only then will the inquisitor
stop tickling the watchdog.
But that is only relevant if Corosync doesn't have 2-node enabled.
See the comment below for that case.
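
As a minimal sketch of the knobs involved (the prompt is just taken
from the example below, and the 10s value only follows the usual rule
of thumb of twice the SBD_WATCHDOG_TIMEOUT=5 shown further down, so
treat both as examples rather than a recommendation):

[vmrh75b]# pcs property set no-quorum-policy=suicide
[vmrh75b]# pcs property set stonith-watchdog-timeout=10s
[vmrh75b]# pcs property show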

>
>> This idea of course doesn't work with just 2 nodes.
>> Taking quorum info from the 2-node feature of corosync (automatically
>> switching on wait-for-all) doesn't help in this case but instead
>> would lead to split-brain.
> So what you are saying is that SBD ignores quorum information from
> corosync and takes its own decisions based on pure count of nodes. Do
> I understand it correctly?

Yes, but that is only true for this case where Corosync has 2-node
enabled.

In all other cases (whether clusters with more than 2 nodes
or clusters with just 2 nodes but without 2-node enabled in
Corosync) the pacemaker-servant takes the quorum info from
Pacemaker, which nowadays will probably come directly from Corosync.
But as said, if 2-node is configured in Corosync everything
is different: the node counting is then actually done
by the cluster-servant, and it is that servant (instead of the
pacemaker-servant) which stops pinging the inquisitor if it
doesn't count more than one node.
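
Just to make the two cases concrete: the difference is only the quorum
section of corosync.conf (a sketch based on the config quoted below,
everything else unchanged):

quorum {
    provider: corosync_votequorum
    two_node: 1    # cluster-servant counts the nodes itself
}

versus

quorum {
    provider: corosync_votequorum
    # no two_node: the pacemaker-servant follows the quorum info
    # it gets from pacemaker / corosync's votequorum
}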

That all said, I've just realized that setting 2-node in Corosync
shouldn't really be dangerous anymore, although it doesn't make
the cluster especially useful either in the case of SBD without disk(s).
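
For the auto-tie-breaker and qdevice alternatives mentioned in the
quoted text below, a rough sketch (the qnetd host name and the
ffsplit algorithm are just examples, not a recommendation):

# auto-tie-breaker instead of two_node
# (by default the partition containing the lowest node-id survives a 50:50 split)
quorum {
    provider: corosync_votequorum
    auto_tie_breaker: 1
}

# or qdevice, to keep 'real' quorum with 2 full nodes plus a small arbitrator:
[qnetd-host]# pcs qdevice setup model net --enable --start
[vmrh75b]# pcs quorum device add model net host=qnetd-host algorithm=ffsplit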

Regards,
Klaus
>
>> What you can do - and what e.g. pcs does automatically - is enable
>> the auto-tie-breaker instead of two-node in corosync. But that
>> still doesn't give you a higher availability than that of the
>> winner of auto-tie-breaker. (Maybe interesting if you are going
>> for a load-balancing scenario that doesn't affect availability, or
>> for a transient state while setting up a cluster node-by-node ...)
>> What you can do though is use qdevice to still have 'real-quorum'
>> info with just 2 full cluster-nodes.
>>
>> There was quite a lot of discussion round this topic on this
>> thread previously if you search the history.
>>
>> Regards,
>> Klaus
>>
>>> [vmrh75b]# cat /etc/corosync/corosync.conf
>>> (snip)
>>> quorum {
>>>     provider: corosync_votequorum
>>>     two_node: 1
>>> }
>>>
>>> [vmrh75b]# cat /etc/sysconfig/sbd
>>> # This file has been generated by pcs.
>>> SBD_DELAY_START=no
>>> ## SBD_DEVICE="/dev/vdb1"
>>> SBD_OPTS="-vvv"
>>> SBD_PACEMAKER=yes
>>> SBD_STARTMODE=always
>>> SBD_WATCHDOG_DEV=/dev/watchdog
>>> SBD_WATCHDOG_TIMEOUT=5
>>>
>>> [vmrh75b]# crm_mon -r1
>>> Stack: corosync
>>> Current DC: vmrh75a (version 2.0.0-0.1.rc4.el7-2.0.0-rc4) - partition with quorum
>>> Last updated: Fri May 25 13:36:07 2018
>>> Last change: Fri May 25 13:35:22 2018 by root via cibadmin on vmrh75a
>>>
>>> 2 nodes configured
>>> 0 resources configured
>>>
>>> Online: [ vmrh75a vmrh75b ]
>>>
>>> No resources
>>>
>>> [vmrh75b]# pcs property show
>>> Cluster Properties:
>>>  cluster-infrastructure: corosync
>>>  cluster-name: my_cluster
>>>  dc-version: 2.0.0-0.1.rc4.el7-2.0.0-rc4
>>>  have-watchdog: true
>>>  stonith-enabled: false
>>>
>>> [vmrh75b]# ps -ef | egrep "sbd|coro|pace"
>>> root      2169     1  0 13:34 ?        00:00:00 sbd: inquisitor
>>> root      2170  2169  0 13:34 ?        00:00:00 sbd: watcher: Pacemaker
>>> root      2171  2169  0 13:34 ?        00:00:00 sbd: watcher: Cluster
>>> root      2172     1  0 13:34 ?        00:00:00 corosync
>>> root      2179     1  0 13:34 ?        00:00:00 /usr/sbin/pacemakerd -f
>>> haclust+  2180  2179  0 13:34 ?        00:00:00 /usr/libexec/pacemaker/pacemaker-based
>>> root      2181  2179  0 13:34 ?        00:00:00 /usr/libexec/pacemaker/pacemaker-fenced
>>> root      2182  2179  0 13:34 ?        00:00:00 /usr/libexec/pacemaker/pacemaker-execd
>>> haclust+  2183  2179  0 13:34 ?        00:00:00 /usr/libexec/pacemaker/pacemaker-attrd
>>> haclust+  2184  2179  0 13:34 ?        00:00:00 /usr/libexec/pacemaker/pacemaker-schedulerd
>>> haclust+  2185  2179  0 13:34 ?        00:00:00 /usr/libexec/pacemaker/pacemaker-controld
>>>
>>> [vmrh75b]# pcs cluster stop vmrh75a
>>> vmrh75a: Stopping Cluster (pacemaker)...
>>> vmrh75a: Stopping Cluster (corosync)...
>>>
>>> [vmrh75b]# tail -F /var/log/messages
>>> May 25 13:37:00 vmrh75b pacemaker-controld[2185]: notice: Our peer on the DC (vmrh75a) is dead
>>> May 25 13:37:00 vmrh75b pacemaker-controld[2185]: notice: State transition S_NOT_DC -> S_ELECTION
>>> May 25 13:37:00 vmrh75b pacemaker-controld[2185]: notice: State transition S_ELECTION -> S_INTEGRATION
>>> May 25 13:37:00 vmrh75b pacemaker-attrd[2183]: notice: Node vmrh75a state is now lost
>>> May 25 13:37:00 vmrh75b pacemaker-attrd[2183]: notice: Removing all vmrh75a attributes for peer loss
>>> May 25 13:37:00 vmrh75b pacemaker-attrd[2183]: notice: Lost attribute writer vmrh75a
>>> May 25 13:37:00 vmrh75b pacemaker-attrd[2183]: notice: Purged 1 peer with id=1 and/or uname=vmrh75a from the membership cache
>>> May 25 13:37:00 vmrh75b pacemaker-fenced[2181]: notice: Node vmrh75a state is now lost
>>> May 25 13:37:00 vmrh75b pacemaker-fenced[2181]: notice: Purged 1 peer with id=1 and/or uname=vmrh75a from the membership cache
>>> May 25 13:37:00 vmrh75b pacemaker-based[2180]: notice: Node vmrh75a state is now lost
>>> May 25 13:37:00 vmrh75b pacemaker-based[2180]: notice: Purged 1 peer with id=1 and/or uname=vmrh75a from the membership cache
>>> May 25 13:37:00 vmrh75b pacemaker-controld[2185]: warning: Input I_ELECTION_DC received in state S_INTEGRATION from do_election_check
>>> May 25 13:37:01 vmrh75b sbd[2171]:   cluster:  warning: set_servant_health: Connected to corosync but requires both nodes present
>>> May 25 13:37:01 vmrh75b sbd[2171]:   cluster:  warning: notify_parent: Notifying parent: UNHEALTHY (6)
>>> May 25 13:37:01 vmrh75b sbd[2169]: warning: inquisitor_child: cluster health check: UNHEALTHY
>>> May 25 13:37:01 vmrh75b sbd[2169]: warning: inquisitor_child: Servant cluster is outdated (age: 226)
>>> May 25 13:37:01 vmrh75b sbd[2170]:      pcmk:   notice: unpack_config: Watchdog will be used via SBD if fencing is required
>>> May 25 13:37:01 vmrh75b sbd[2170]:      pcmk:     info: determine_online_status: Node vmrh75b is online
>>> May 25 13:37:01 vmrh75b sbd[2170]:      pcmk:     info: unpack_node_loop: Node 2 is already processed
>>> May 25 13:37:01 vmrh75b sbd[2170]:      pcmk:     info: unpack_node_loop: Node 2 is already processed
>>> May 25 13:37:01 vmrh75b sbd[2171]:   cluster:  warning: notify_parent: Notifying parent: UNHEALTHY (6)
>>> May 25 13:37:01 vmrh75b corosync[2172]: [TOTEM ] A new membership (192.168.28.132:5712) was formed. Members left: 1
>>> May 25 13:37:01 vmrh75b corosync[2172]: [QUORUM] Members[1]: 2
>>> May 25 13:37:01 vmrh75b corosync[2172]: [MAIN  ] Completed service synchronization, ready to provide service.
>>> May 25 13:37:01 vmrh75b pacemakerd[2179]: notice: Node vmrh75a state is now lost
>>> May 25 13:37:01 vmrh75b pacemaker-controld[2185]: notice: Node vmrh75a state is now lost
>>> May 25 13:37:01 vmrh75b pacemaker-controld[2185]: warning: Stonith/shutdown of node vmrh75a was not expected
>>> May 25 13:37:02 vmrh75b sbd[2171]:   cluster:  warning: notify_parent: Notifying parent: UNHEALTHY (6)
>>> May 25 13:37:02 vmrh75b pacemaker-schedulerd[2184]: notice: Watchdog will be used via SBD if fencing is required
>>> May 25 13:37:02 vmrh75b pacemaker-schedulerd[2184]: warning: Blind faith: not fencing unseen nodes
>>> May 25 13:37:02 vmrh75b pacemaker-schedulerd[2184]: notice: Delaying fencing operations until there are resources to manage
>>> May 25 13:37:02 vmrh75b pacemaker-schedulerd[2184]: notice: Calculated transition 0, saving inputs in /var/lib/pacemaker/pengine/pe-input-1410.bz2
>>> May 25 13:37:02 vmrh75b pacemaker-controld[2185]: notice: Transition 0 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-1410.bz2): Complete
>>> May 25 13:37:02 vmrh75b pacemaker-controld[2185]: notice: State transition S_TRANSITION_ENGINE -> S_IDLE
>>> May 25 13:37:03 vmrh75b sbd[2171]:   cluster:  warning: notify_parent: Notifying parent: UNHEALTHY (6)
>>> May 25 13:37:03 vmrh75b sbd[2170]:      pcmk:   notice: unpack_config: Watchdog will be used via SBD if fencing is required
>>> May 25 13:37:03 vmrh75b sbd[2170]:      pcmk:     info: determine_online_status: Node vmrh75b is online
>>> May 25 13:37:03 vmrh75b sbd[2170]:      pcmk:     info: unpack_node_loop: Node 2 is already processed
>>> May 25 13:37:03 vmrh75b sbd[2170]:      pcmk:     info: unpack_node_loop: Node 2 is already processed
>>> May 25 13:37:04 vmrh75b sbd[2171]:   cluster:  warning: notify_parent: Notifying parent: UNHEALTHY (6)
>>> May 25 13:37:05 vmrh75b sbd[2169]: warning: inquisitor_child: Latency: No liveness for 4 s exceeds threshold of 3 s (healthy servants: 0)
>>> May 25 13:37:05 vmrh75b sbd[2171]:   cluster:  warning: notify_parent: Notifying parent: UNHEALTHY (6)
>>> May 25 13:37:05 vmrh75b sbd[2169]: warning: inquisitor_child: Latency: No liveness for 4 s exceeds threshold of 3 s (healthy servants: 0)
>>>
>>> Best Regards,
>>> Kazunori INOUE
>>> _______________________________________________
>>> Users mailing list: Users at clusterlabs.org
>>> https://lists.clusterlabs.org/mailman/listinfo/users
>>>
>>> Project Home: http://www.clusterlabs.org
>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> Bugs: http://bugs.clusterlabs.org


