[ClusterLabs] Gracefully stop nodes one by one with disk-less sbd

Andrei Borzenkov arvidjaar at gmail.com
Fri Aug 9 12:40:19 EDT 2019

On 09.08.2019 16:34, Yan Gao wrote:
> Hi,
> With disk-less sbd, it's fine to stop the cluster services on all 
> nodes at the same time.
> But if the nodes are stopped one by one -- for example, in a 3-node 
> cluster -- then after stopping the 2nd node, the only remaining node 
> resets itself with:

That is sort of documented in the SBD manual page:

However, while the cluster is in such a degraded state, it can
neither successfully fence nor be shutdown cleanly (as taking the
cluster below the quorum threshold will immediately cause all remaining
nodes to self-fence).

SBD in shared-nothing mode is basically always in such a degraded state
and cannot tolerate loss of quorum.

> Aug 09 14:30:20 opensuse150-1 sbd[1079]:       pcmk:    debug: 
> notify_parent: Not notifying parent: state transient (2)
> Aug 09 14:30:20 opensuse150-1 sbd[1080]:    cluster:    debug: 
> notify_parent: Notifying parent: healthy
> Aug 09 14:30:20 opensuse150-1 sbd[1078]:  warning: inquisitor_child: 
> Latency: No liveness for 4 s exceeds threshold of 3 s (healthy servants: 0)
> I can think of ways to manipulate quorum with last_man_standing and 
> potentially also auto_tie_breaker -- not to mention that 
> last_man_standing_window would also be a factor... But is there a 
> better solution?

The lack of a cluster-wide shutdown mode has been mentioned more than
once on this list. I guess the only workaround is to use higher-level
tools that simply try to stop the cluster on all nodes at once. That is
still susceptible to a race condition.
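As an example of such a higher-level tool, pcs can issue the stop to
every node in one operation (crmsh offers similar commands); a sketch:

```
# Stop pacemaker and corosync on all nodes at once, so no single
# node is left running alone below the quorum threshold.
pcs cluster stop --all

# Start everything back up the same way.
pcs cluster start --all
```

Note this only narrows the window in which one node can observe the
others leaving; it does not eliminate the race entirely.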
