[ClusterLabs] Gracefully stop nodes one by one with disk-less sbd

Fri Aug 9 12:40:19 EDT 2019

09.08.2019 16:34, Yan Gao wrote:
> Hi,
> 
> With disk-less sbd, it's fine to stop the cluster service on all the cluster
> nodes at the same time.
> 
> But if the nodes are stopped one by one, for example in a 3-node cluster,
> then after the 2nd node is stopped, the only remaining node resets itself with:
> 

That is sort of documented in the SBD manual page:

--><--
However, while the cluster is in such a degraded state, it can
neither successfully fence nor be shutdown cleanly (as taking the
cluster below the quorum threshold will immediately cause all remaining
nodes to self-fence).
--><--

SBD in shared-nothing mode is basically always in such a degraded state
and cannot tolerate loss of quorum.
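
To make the arithmetic concrete, here is a minimal Python model of the default
majority rule, assuming one vote per node, a fixed expected_votes and none of
last_man_standing, auto_tie_breaker or wait_for_all. The function names are
hypothetical and not part of corosync or sbd. With 3 nodes the threshold is 2
votes, so stopping the 2nd node leaves 1 vote and the survivor self-fences,
which matches the reset you saw.

```python
from __future__ import annotations

# Hypothetical model: one vote per node, static expected_votes,
# no last_man_standing / auto_tie_breaker / wait_for_all.


def quorum_threshold(expected_votes: int) -> int:
    """Votes needed for quorum: a strict majority of expected votes."""
    return expected_votes // 2 + 1


def simulate_sequential_stop(nodes: list[str]) -> list[tuple[str, int, bool]]:
    """Stop nodes one by one; record remaining votes and quorum state after each stop."""
    expected = len(nodes)          # stays fixed: leaving nodes do not lower it here
    remaining = list(nodes)
    history = []
    for node in nodes[:-1]:        # the last node is never stopped explicitly
        remaining.remove(node)
        votes = len(remaining)
        quorate = votes >= quorum_threshold(expected)
        history.append((node, votes, quorate))
        if not quorate:
            # In diskless mode sbd has no disk to fall back on,
            # so the surviving nodes self-fence (watchdog reset).
            break
    return history


if __name__ == "__main__":
    for stopped, votes, quorate in simulate_sequential_stop(["node1", "node2", "node3"]):
        state = "quorate" if quorate else "NOT quorate, remaining nodes self-fence"
        print(f"stopped {stopped}: {votes} vote(s) left, {state}")
```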

> Aug 09 14:30:20 opensuse150-1 sbd[1079]:       pcmk:    debug: notify_parent: Not notifying parent: state transient (2)
> Aug 09 14:30:20 opensuse150-1 sbd[1080]:    cluster:    debug: notify_parent: Notifying parent: healthy
> Aug 09 14:30:20 opensuse150-1 sbd[1078]:  warning: inquisitor_child: Latency: No liveness for 4 s exceeds threshold of 3 s (healthy servants: 0)
> 
> I can think of ways to manipulate quorum with last_man_standing and
> potentially also auto_tie_breaker, not to mention that
> last_man_standing_window would also be a factor... But is there a better
> solution?
> 
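
For reference, the interaction of those options can be sketched as a
simplified, hypothetical Python model of the behaviour described in
votequorum(5): last_man_standing lowers expected_votes to the current vote
count once membership has been stable for last_man_standing_window, and
auto_tie_breaker keeps an exact 50% partition quorate if it holds the lowest
nodeid (assumed here to be the lowest configured nodeid, i.e. the default
auto_tie_breaker_node). Real corosync has more rules than this; the function
names are made up for illustration.

```python
from __future__ import annotations

# Simplified, hypothetical model of votequorum last_man_standing (LMS)
# and auto_tie_breaker (ATB). Not corosync code; ignores wait_for_all etc.


def quorum_threshold(expected_votes: int) -> int:
    return expected_votes // 2 + 1


def is_quorate(members: set[int], expected: int, atb: bool, configured: set[int]) -> bool:
    votes = len(members)
    if votes >= quorum_threshold(expected):
        return True
    # ATB: an exact 50% split stays quorate if it contains the lowest nodeid
    # (assumption: lowest among configured nodes).
    return atb and 2 * votes == expected and min(configured) in members


def sequential_stop(configured: list[int], stop_order: list[int],
                    lms: bool, atb: bool, waited_past_window: bool) -> bool:
    """Return True if the last node keeps quorum after stopping the others in order."""
    members = set(configured)
    expected = len(configured)
    for node in stop_order:
        members.discard(node)
        if not is_quorate(members, expected, atb, set(configured)):
            return False  # survivors lose quorum; with diskless sbd they self-fence
        if lms and waited_past_window:
            # LMS: after a stable last_man_standing_window, expected_votes
            # is lowered to the current number of votes.
            expected = len(members)
    return True


if __name__ == "__main__":
    # Survives only with LMS + ATB, enough waiting, and the lowest nodeid stopped last.
    print(sequential_stop([1, 2, 3], [3, 2], lms=True, atb=True, waited_past_window=True))
    print(sequential_stop([1, 2, 3], [1, 2], lms=True, atb=True, waited_past_window=True))
```

In this model every condition has to hold at once: without auto_tie_breaker
the step from 2 nodes to 1 still loses quorum, without waiting out the window
expected_votes is never lowered, and stopping the tie-breaker node early
defeats auto_tie_breaker.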

The lack of a cluster-wide shutdown mode has been mentioned more than once on
this list. I guess the only workaround is to use higher-level tools, which
basically just try to stop the cluster on all nodes at once. That is still
susceptible to a race condition.
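
Roughly what such a tool does can be sketched in Python: fire the stop
command on every node concurrently instead of one after another. The stop
command, the use of plain ssh and all function names are assumptions for
illustration, not what any particular tool runs. Note that starting the
requests together does not make the nodes finish together, which is exactly
the race mentioned above.

```python
from __future__ import annotations

import subprocess
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Hypothetical stop command; adjust to how your distribution stops the stack.
STOP_CMD = ["systemctl", "stop", "pacemaker"]


def ssh_runner(node: str) -> int:
    """Run STOP_CMD on `node` over ssh and return its exit code."""
    return subprocess.run(["ssh", node, *STOP_CMD], check=False).returncode


def stop_all_at_once(nodes: list[str],
                     runner: Callable[[str], int] = ssh_runner) -> dict[str, int]:
    """Issue the stop on every node concurrently and collect exit codes.

    A barrier lines the workers up so the requests start together, but each
    node still completes its shutdown on its own schedule. This shortens, but
    cannot eliminate, the interval in which a node still running the stack
    sees its peers gone and loses quorum.
    """
    if not nodes:
        return {}
    barrier = threading.Barrier(len(nodes))

    def worker(node: str) -> int:
        barrier.wait()  # release all stop requests at (nearly) the same moment
        return runner(node)

    # One thread per node, so every worker can reach the barrier.
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        return dict(zip(nodes, pool.map(worker, nodes)))
```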