[ClusterLabs] Questions about SBD behavior
井上 和徳
inouekazu at intellilink.co.jp
Fri May 25 01:31:32 EDT 2018
Hi,
I am testing the watchdog function of SBD (without a shared block device).
In a two-node cluster, if the cluster stack is stopped on one node, the watchdog is triggered on the remaining node.
Is this the intended behavior?
[vmrh75b]# cat /etc/corosync/corosync.conf
(snip)
quorum {
provider: corosync_votequorum
two_node: 1
}
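For reference, `two_node: 1` is what lets a lone surviving node stay quorate. The votequorum(5) man page says that with this option quorum is set artificially to 1 instead of the simple majority of 2, and that it also enables `wait_for_all`. Below is a minimal Python sketch of that rule only. It ignores `wait_for_all`, `last_man_standing` and the other votequorum options, and the function names are hypothetical.

```python
def quorum_threshold(expected_votes: int, two_node: bool) -> int:
    """Votes needed for quorum under a simplified votequorum model."""
    if two_node and expected_votes == 2:
        # two_node: 1 sets quorum artificially to 1 vote
        return 1
    # Default: simple majority of the expected votes (50% + 1)
    return expected_votes // 2 + 1


def is_quorate(total_votes: int, expected_votes: int, two_node: bool) -> bool:
    """True if the current partition holds enough votes."""
    return total_votes >= quorum_threshold(expected_votes, two_node)


# One node left out of two: quorate only with two_node enabled
print(is_quorate(1, 2, two_node=True))
print(is_quorate(1, 2, two_node=False))
```

This matches what the transcript shows: after vmrh75a leaves, corosync reports `Members[1]: 2` and the remaining node keeps quorum.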
[vmrh75b]# cat /etc/sysconfig/sbd
# This file has been generated by pcs.
SBD_DELAY_START=no
## SBD_DEVICE="/dev/vdb1"
SBD_OPTS="-vvv"
SBD_PACEMAKER=yes
SBD_STARTMODE=always
SBD_WATCHDOG_DEV=/dev/watchdog
SBD_WATCHDOG_TIMEOUT=5
[vmrh75b]# crm_mon -r1
Stack: corosync
Current DC: vmrh75a (version 2.0.0-0.1.rc4.el7-2.0.0-rc4) - partition with quorum
Last updated: Fri May 25 13:36:07 2018
Last change: Fri May 25 13:35:22 2018 by root via cibadmin on vmrh75a
2 nodes configured
0 resources configured
Online: [ vmrh75a vmrh75b ]
No resources
[vmrh75b]# pcs property show
Cluster Properties:
cluster-infrastructure: corosync
cluster-name: my_cluster
dc-version: 2.0.0-0.1.rc4.el7-2.0.0-rc4
have-watchdog: true
stonith-enabled: false
[vmrh75b]# ps -ef | egrep "sbd|coro|pace"
root 2169 1 0 13:34 ? 00:00:00 sbd: inquisitor
root 2170 2169 0 13:34 ? 00:00:00 sbd: watcher: Pacemaker
root 2171 2169 0 13:34 ? 00:00:00 sbd: watcher: Cluster
root 2172 1 0 13:34 ? 00:00:00 corosync
root 2179 1 0 13:34 ? 00:00:00 /usr/sbin/pacemakerd -f
haclust+ 2180 2179 0 13:34 ? 00:00:00 /usr/libexec/pacemaker/pacemaker-based
root 2181 2179 0 13:34 ? 00:00:00 /usr/libexec/pacemaker/pacemaker-fenced
root 2182 2179 0 13:34 ? 00:00:00 /usr/libexec/pacemaker/pacemaker-execd
haclust+ 2183 2179 0 13:34 ? 00:00:00 /usr/libexec/pacemaker/pacemaker-attrd
haclust+ 2184 2179 0 13:34 ? 00:00:00 /usr/libexec/pacemaker/pacemaker-schedulerd
haclust+ 2185 2179 0 13:34 ? 00:00:00 /usr/libexec/pacemaker/pacemaker-controld
[vmrh75b]# pcs cluster stop vmrh75a
vmrh75a: Stopping Cluster (pacemaker)...
vmrh75a: Stopping Cluster (corosync)...
[vmrh75b]# tail -F /var/log/messages
May 25 13:37:00 vmrh75b pacemaker-controld[2185]: notice: Our peer on the DC (vmrh75a) is dead
May 25 13:37:00 vmrh75b pacemaker-controld[2185]: notice: State transition S_NOT_DC -> S_ELECTION
May 25 13:37:00 vmrh75b pacemaker-controld[2185]: notice: State transition S_ELECTION -> S_INTEGRATION
May 25 13:37:00 vmrh75b pacemaker-attrd[2183]: notice: Node vmrh75a state is now lost
May 25 13:37:00 vmrh75b pacemaker-attrd[2183]: notice: Removing all vmrh75a attributes for peer loss
May 25 13:37:00 vmrh75b pacemaker-attrd[2183]: notice: Lost attribute writer vmrh75a
May 25 13:37:00 vmrh75b pacemaker-attrd[2183]: notice: Purged 1 peer with id=1 and/or uname=vmrh75a from the membership cache
May 25 13:37:00 vmrh75b pacemaker-fenced[2181]: notice: Node vmrh75a state is now lost
May 25 13:37:00 vmrh75b pacemaker-fenced[2181]: notice: Purged 1 peer with id=1 and/or uname=vmrh75a from the membership cache
May 25 13:37:00 vmrh75b pacemaker-based[2180]: notice: Node vmrh75a state is now lost
May 25 13:37:00 vmrh75b pacemaker-based[2180]: notice: Purged 1 peer with id=1 and/or uname=vmrh75a from the membership cache
May 25 13:37:00 vmrh75b pacemaker-controld[2185]: warning: Input I_ELECTION_DC received in state S_INTEGRATION from do_election_check
May 25 13:37:01 vmrh75b sbd[2171]: cluster: warning: set_servant_health: Connected to corosync but requires both nodes present
May 25 13:37:01 vmrh75b sbd[2171]: cluster: warning: notify_parent: Notifying parent: UNHEALTHY (6)
May 25 13:37:01 vmrh75b sbd[2169]: warning: inquisitor_child: cluster health check: UNHEALTHY
May 25 13:37:01 vmrh75b sbd[2169]: warning: inquisitor_child: Servant cluster is outdated (age: 226)
May 25 13:37:01 vmrh75b sbd[2170]: pcmk: notice: unpack_config: Watchdog will be used via SBD if fencing is required
May 25 13:37:01 vmrh75b sbd[2170]: pcmk: info: determine_online_status: Node vmrh75b is online
May 25 13:37:01 vmrh75b sbd[2170]: pcmk: info: unpack_node_loop: Node 2 is already processed
May 25 13:37:01 vmrh75b sbd[2170]: pcmk: info: unpack_node_loop: Node 2 is already processed
May 25 13:37:01 vmrh75b sbd[2171]: cluster: warning: notify_parent: Notifying parent: UNHEALTHY (6)
May 25 13:37:01 vmrh75b corosync[2172]: [TOTEM ] A new membership (192.168.28.132:5712) was formed. Members left: 1
May 25 13:37:01 vmrh75b corosync[2172]: [QUORUM] Members[1]: 2
May 25 13:37:01 vmrh75b corosync[2172]: [MAIN ] Completed service synchronization, ready to provide service.
May 25 13:37:01 vmrh75b pacemakerd[2179]: notice: Node vmrh75a state is now lost
May 25 13:37:01 vmrh75b pacemaker-controld[2185]: notice: Node vmrh75a state is now lost
May 25 13:37:01 vmrh75b pacemaker-controld[2185]: warning: Stonith/shutdown of node vmrh75a was not expected
May 25 13:37:02 vmrh75b sbd[2171]: cluster: warning: notify_parent: Notifying parent: UNHEALTHY (6)
May 25 13:37:02 vmrh75b pacemaker-schedulerd[2184]: notice: Watchdog will be used via SBD if fencing is required
May 25 13:37:02 vmrh75b pacemaker-schedulerd[2184]: warning: Blind faith: not fencing unseen nodes
May 25 13:37:02 vmrh75b pacemaker-schedulerd[2184]: notice: Delaying fencing operations until there are resources to manage
May 25 13:37:02 vmrh75b pacemaker-schedulerd[2184]: notice: Calculated transition 0, saving inputs in /var/lib/pacemaker/pengine/pe-input-1410.bz2
May 25 13:37:02 vmrh75b pacemaker-controld[2185]: notice: Transition 0 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-1410.bz2): Complete
May 25 13:37:02 vmrh75b pacemaker-controld[2185]: notice: State transition S_TRANSITION_ENGINE -> S_IDLE
May 25 13:37:03 vmrh75b sbd[2171]: cluster: warning: notify_parent: Notifying parent: UNHEALTHY (6)
May 25 13:37:03 vmrh75b sbd[2170]: pcmk: notice: unpack_config: Watchdog will be used via SBD if fencing is required
May 25 13:37:03 vmrh75b sbd[2170]: pcmk: info: determine_online_status: Node vmrh75b is online
May 25 13:37:03 vmrh75b sbd[2170]: pcmk: info: unpack_node_loop: Node 2 is already processed
May 25 13:37:03 vmrh75b sbd[2170]: pcmk: info: unpack_node_loop: Node 2 is already processed
May 25 13:37:04 vmrh75b sbd[2171]: cluster: warning: notify_parent: Notifying parent: UNHEALTHY (6)
May 25 13:37:05 vmrh75b sbd[2169]: warning: inquisitor_child: Latency: No liveness for 4 s exceeds threshold of 3 s (healthy servants: 0)
May 25 13:37:05 vmrh75b sbd[2171]: cluster: warning: notify_parent: Notifying parent: UNHEALTHY (6)
May 25 13:37:05 vmrh75b sbd[2169]: warning: inquisitor_child: Latency: No liveness for 4 s exceeds threshold of 3 s (healthy servants: 0)
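The log sequence above can be read as a simple chain: the cluster servant reports UNHEALTHY, the inquisitor stops feeding the watchdog, and the watchdog expires. The following toy Python model illustrates that chain. It is not SBD's implementation, and all names are hypothetical. It makes three assumptions. First, the cluster servant is unhealthy whenever `two_node` is set and only one member remains, which is inferred from the `requires both nodes present` line. Second, the watchdog is fed only while both the Pacemaker and the cluster servants are healthy. Third, the latency warning threshold is 3/5 of `SBD_WATCHDOG_TIMEOUT`, which matches `threshold of 3 s` for a timeout of 5 s but is not confirmed here.

```python
from dataclasses import dataclass


@dataclass
class ClusterView:
    two_node: bool  # corosync quorum { two_node: 1 }
    members: int    # current corosync membership size


def cluster_servant_healthy(view: ClusterView) -> bool:
    # Assumption from "Connected to corosync but requires both nodes present":
    # in two_node mode a single remaining member is reported as UNHEALTHY.
    return not (view.two_node and view.members < 2)


def warn_threshold(watchdog_timeout: int) -> int:
    # Assumption: 3/5 of the watchdog timeout (5 s -> 3 s, as in the log).
    return watchdog_timeout * 3 // 5


def simulate(view: ClusterView, pacemaker_ok: bool,
             watchdog_timeout: int, seconds: int) -> list:
    """Per-second events; the watchdog is fed only while all servants are healthy."""
    events = []
    last_fed = 0
    for t in range(1, seconds + 1):
        if pacemaker_ok and cluster_servant_healthy(view):
            last_fed = t
            events.append((t, "fed"))
            continue
        starved = t - last_fed  # seconds without feeding the watchdog
        if starved >= watchdog_timeout:
            events.append((t, "reset"))  # hardware watchdog fires
            break
        if starved > warn_threshold(watchdog_timeout):
            events.append((t, "latency warning"))
        else:
            events.append((t, "unhealthy"))
    return events


print(simulate(ClusterView(two_node=True, members=1), True, 5, 10))
```

With one member left and a 5 s timeout, the model emits a latency warning at 4 s and a reset at 5 s. This mirrors the `No liveness for 4 s exceeds threshold of 3 s` lines followed by the watchdog reset. With both members present, every tick is `fed`.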
Best Regards,
Kazunori INOUE