[ClusterLabs] Antw: Growing a cluster from 1 node without fencing

Mon Aug 14 13:12:25 UTC 2017

On 14/08/17 13:46, Klaus Wenninger wrote:
 > What does your /etc/sysconfig/sbd look like?
 > With just that pcs command you get a default config with
 > watchdog-only support.

It currently looks like this:

SBD_DELAY_START=no
SBD_OPTS="-n cluster1"
SBD_PACEMAKER=yes
SBD_STARTMODE=always
SBD_WATCHDOG_DEV=/dev/watchdog
SBD_WATCHDOG_TIMEOUT=5

 > Unless the cluster property stonith-watchdog-timeout is set to a
 > value matching the watchdog timeout configured in
 > /etc/sysconfig/sbd (default = 5s; twice that is a good choice), a
 > node will never assume that the unseen partner has been fenced.
 > Anyway, watchdog-only sbd is of very limited use in 2-node
 > scenarios: it more or less limits availability to that of the node
 > that would win the tie-breaker game. It might still be useful
 > in certain scenarios, of course (load sharing, for example).

Good point.
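
As a rough illustration of the sizing rule quoted above (stonith-watchdog-timeout at twice the sbd watchdog timeout), here is a minimal Python sketch. It reads the sysconfig values shown earlier from an embedded string and only prints the pcs command rather than running it. The helper names parse_sysconfig and suggested_property_command are hypothetical, and the factor of 2 is just the "good choice" mentioned in the quote, not a hard requirement.

```python
import shlex

# Copy of the /etc/sysconfig/sbd contents posted above.
SBD_SYSCONFIG = """\
SBD_DELAY_START=no
SBD_OPTS="-n cluster1"
SBD_PACEMAKER=yes
SBD_STARTMODE=always
SBD_WATCHDOG_DEV=/dev/watchdog
SBD_WATCHDOG_TIMEOUT=5
"""


def parse_sysconfig(text):
    """Parse simple KEY=value lines (hypothetical helper, not part of sbd)."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, raw = line.partition("=")
        parts = shlex.split(raw)  # strips shell-style quotes
        settings[key.strip()] = parts[0] if parts else ""
    return settings


def suggested_property_command(text, factor=2, default_timeout=5):
    """Build a pcs command setting stonith-watchdog-timeout.

    Assumption: the property is sized as factor * SBD_WATCHDOG_TIMEOUT,
    falling back to the 5s default mentioned in the thread.
    """
    settings = parse_sysconfig(text)
    watchdog = int(settings.get("SBD_WATCHDOG_TIMEOUT", default_timeout))
    return f"pcs property set stonith-watchdog-timeout={watchdog * factor}s"


print(suggested_property_command(SBD_SYSCONFIG))
```

With SBD_WATCHDOG_TIMEOUT=5 as above, this prints a command that sets the property to 10s. Whether that value suits a given cluster still has to be checked against the sbd and Pacemaker documentation.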

> On 08/14/2017 12:20 PM, Ulrich Windl wrote:
>> Hi!
>>
>> Have you tried studying the logs? Usually you get useful information from
>> there (to share!).

Here is the journalctl and pacemaker.log output:

Aug 14 08:57:26 cluster1 crmd[2221]:   notice: Result of start operation for dlm on cluster1: 0 (ok)
Aug 14 08:57:26 cluster1 sbd[2202]:       pcmk:     info: set_servant_health: Node state: online
Aug 14 08:57:26 cluster1 sbd[2202]:       pcmk:     info: notify_parent: Notifying parent: healthy
Aug 14 08:57:26 cluster1 sbd[2199]:   notice: inquisitor_child: Servant pcmk is healthy (age: 0)
Aug 14 08:57:26 cluster1 sbd[2199]:   notice: inquisitor_child: Active cluster detected
Aug 14 08:57:26 cluster1 crmd[2221]:   notice: Initiating monitor operation dlm:0_monitor_30000 locally on cluster1
Aug 14 08:57:26 cluster1 crmd[2221]:   notice: Transition 0 (Complete=5, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-44.bz2): Complete
Aug 14 08:57:26 cluster1 crmd[2221]:   notice: State transition S_TRANSITION_ENGINE -> S_IDLE
Aug 14 08:57:27 cluster1 sbd[2203]:    cluster:     info: notify_parent: Notifying parent: healthy
Aug 14 08:57:27 cluster1 sbd[2202]:       pcmk:     info: notify_parent: Notifying parent: healthy
Aug 14 08:57:28 cluster1 sbd[2203]:    cluster:     info: notify_parent: Notifying parent: healthy
Aug 14 08:57:28 cluster1 sbd[2202]:       pcmk:     info: notify_parent: Notifying parent: healthy
Aug 14 08:57:28 cluster1 sbd[2202]:       pcmk:     info: notify_parent: Notifying parent: healthy
Aug 14 08:57:29 cluster1 sbd[2203]:    cluster:     info: notify_parent: Notifying parent: healthy
Aug 14 08:57:29 cluster1 sbd[2202]:       pcmk:     info: notify_parent: Notifying parent: healthy
Aug 14 08:57:30 cluster1 corosync[2208]:  [CFG   ] Config reload requested by node 1
Aug 14 08:57:30 cluster1 corosync[2208]:  [TOTEM ] adding new UDPU member {10.71.77.147}
Aug 14 08:57:30 cluster1 corosync[2208]:  [QUORUM] This node is within the non-primary component and will NOT provide any services.
Aug 14 08:57:30 cluster1 corosync[2208]:  [QUORUM] Members[1]: 1
Aug 14 08:57:30 cluster1 crmd[2221]:  warning: Quorum lost
Aug 14 08:57:30 cluster1 pacemakerd[2215]:  warning: Quorum lost

^^^^^^^^^ The "Quorum lost" warnings above look unexpected

Aug 14 08:57:30 cluster1 sbd[2202]:       pcmk:     info: set_servant_health: Quorum lost: Ignore
Aug 14 08:57:30 cluster1 sbd[2202]:       pcmk:     info: notify_parent: Not notifying parent: state transient (2)
Aug 14 08:57:30 cluster1 sbd[2203]:    cluster:     info: notify_parent: Notifying parent: healthy
Aug 14 08:57:30 cluster1 sbd[2202]:       pcmk:     info: notify_parent: Not notifying parent: state transient (2)
Aug 14 08:57:31 cluster1 sbd[2203]:    cluster:     info: notify_parent: Notifying parent: healthy
Aug 14 08:57:31 cluster1 sbd[2202]:       pcmk:     info: notify_parent: Not notifying parent: state transient (2)
Aug 14 08:57:32 cluster1 sbd[2202]:       pcmk:     info: notify_parent: Not notifying parent: state transient (2)
Aug 14 08:57:32 cluster1 sbd[2203]:    cluster:     info: notify_parent: Notifying parent: healthy
Aug 14 08:57:32 cluster1 sbd[2202]:       pcmk:     info: notify_parent: Not notifying parent: state transient (2)
Aug 14 08:57:33 cluster1 sbd[2203]:    cluster:     info: notify_parent: Notifying parent: healthy
Aug 14 08:57:33 cluster1 sbd[2199]:  warning: inquisitor_child: Servant pcmk is outdated (age: 4)
Aug 14 08:57:33 cluster1 sbd[2202]:       pcmk:     info: notify_parent: Not notifying parent: state transient (2)
Aug 14 08:57:34 cluster1 sbd[2203]:    cluster:     info: notify_parent: Notifying parent: healthy
Aug 14 08:57:34 cluster1 sbd[2202]:       pcmk:     info: notify_parent: Not notifying parent: state transient (2)
Aug 14 08:57:35 cluster1 sbd[2203]:    cluster:     info: notify_parent: Notifying parent: healthy
Aug 14 08:57:35 cluster1 sbd[2202]:       pcmk:     info: notify_parent: Not notifying parent: state transient (2)
Aug 14 08:57:36 cluster1 sbd[2203]:    cluster:     info: notify_parent: Notifying parent: healthy
Aug 14 08:57:36 cluster1 sbd[2199]:  warning: inquisitor_child: Latency: No liveness for 4 s exceeds threshold of 3 s (healthy servants: 0)
Aug 14 08:57:36 cluster1 sbd[2202]:       pcmk:     info: notify_parent: Not notifying parent: state transient (2)
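
To make it easier to see when each sbd servant last reported in, here is a small Python sketch that scans log lines like the ones above and computes the seconds since the last "Notifying parent: healthy" message per servant. It is only a reading aid under several assumptions: the sample lines are a subset of the log above with the mail wrapping undone, the year 2017 is taken from the message date because syslog timestamps omit it, month names are parsed in an English/C locale, and the functions last_healthy and ages_at are hypothetical helpers. How sbd computes its own "age" value internally is not confirmed here.

```python
import re
from datetime import datetime

# A few lines from the log above, unwrapped.
LOG = """\
Aug 14 08:57:29 cluster1 sbd[2203]:    cluster:     info: notify_parent: Notifying parent: healthy
Aug 14 08:57:29 cluster1 sbd[2202]:       pcmk:     info: notify_parent: Notifying parent: healthy
Aug 14 08:57:30 cluster1 sbd[2202]:       pcmk:     info: notify_parent: Not notifying parent: state transient (2)
Aug 14 08:57:33 cluster1 sbd[2203]:    cluster:     info: notify_parent: Notifying parent: healthy
Aug 14 08:57:33 cluster1 sbd[2199]:  warning: inquisitor_child: Servant pcmk is outdated (age: 4)
"""

# Matches only the servants' notify_parent lines.
NOTIFY = re.compile(
    r"^(?P<ts>\w{3} +\d+ \d\d:\d\d:\d\d) \S+ sbd\[\d+\]:\s+"
    r"(?P<servant>\w+):\s+info: notify_parent: (?P<msg>.*)$"
)


def parse_ts(stamp, year=2017):
    """Parse a syslog timestamp; the year is an assumption (not in the log)."""
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


def last_healthy(log_text):
    """Return {servant: time of its most recent 'healthy' notification}."""
    seen = {}
    for line in log_text.splitlines():
        m = NOTIFY.match(line)
        if m and m["msg"] == "Notifying parent: healthy":
            seen[m["servant"]] = parse_ts(m["ts"])
    return seen


def ages_at(log_text, when):
    """Seconds since each servant's last healthy notification at time 'when'."""
    now = parse_ts(when)
    return {
        name: int((now - ts).total_seconds())
        for name, ts in last_healthy(log_text).items()
    }


print(ages_at(LOG, "Aug 14 08:57:33"))
```

For the sample lines this gives 0 for the cluster servant and 4 for the pcmk servant at 08:57:33, which lines up with the "Servant pcmk is outdated (age: 4)" warning logged at that second.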

Thanks,
--Edwin