[ClusterLabs] Gracefully stop nodes one by one with disk-less sbd

Yan Gao YGao at suse.com
Fri Aug 9 15:06:32 EDT 2019


On 8/9/19 6:40 PM, Andrei Borzenkov wrote:
> 09.08.2019 16:34, Yan Gao wrote:
>> Hi,
>>
>> With disk-less sbd, it's fine to stop the cluster service on all of the
>> cluster nodes at the same time.
>>
>> But if the nodes are stopped one by one, for example in a 3-node cluster,
>> then after stopping the 2nd node, the only remaining node resets itself with:
>>
> 
> That is sort of documented in SBD manual page:
> 
> --><--
> However, while the cluster is in such a degraded state, it can
> neither successfully fence nor be shutdown cleanly (as taking the
> cluster below the quorum threshold will immediately cause all remaining
> nodes to self-fence).
> --><--
> 
> SBD in shared-nothing mode is basically always in such a degraded state
> and cannot tolerate loss of quorum.
Well, the context here is that it loses quorum *expectedly*, since the 
other nodes were gracefully shut down.

> 
> 
> 
>> Aug 09 14:30:20 opensuse150-1 sbd[1079]:       pcmk:    debug:
>> notify_parent: Not notifying parent: state transient (2)
>> Aug 09 14:30:20 opensuse150-1 sbd[1080]:    cluster:    debug:
>> notify_parent: Notifying parent: healthy
>> Aug 09 14:30:20 opensuse150-1 sbd[1078]:  warning: inquisitor_child:
>> Latency: No liveness for 4 s exceeds threshold of 3 s (healthy servants: 0)
>>
>> I can think of manipulating quorum with last_man_standing and
>> potentially also auto_tie_breaker, not to mention that
>> last_man_standing_window would also be a factor... But is there a better
>> solution?
>>
> 
> The lack of a cluster-wide shutdown mode has been mentioned more than once
> on this list. I guess the only workaround is to use higher-level tools,
> which basically just try to stop the cluster on all nodes at once. That is
> still susceptible to race conditions.
Gracefully stopping nodes one by one on purpose is still a reasonable 
need, though ...
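
For illustration, the votequorum tuning I had in mind would look roughly 
like this in corosync.conf. This is only an untested sketch for a 3-node 
cluster; the option names are from votequorum(5), the values are just 
examples:

    quorum {
        provider: corosync_votequorum
        expected_votes: 3
        # Recalculate expected_votes/quorum as nodes leave, so the
        # remaining nodes can retain quorum.
        last_man_standing: 1
        # Time (in ms) to wait after a node leaves before recalculating.
        last_man_standing_window: 10000
        # In an even split, the partition holding the tie-breaker node
        # (lowest nodeid by default) keeps quorum; per votequorum(5) this
        # is also required for LMS to downgrade from 2 nodes to 1.
        auto_tie_breaker: 1
    }

Higher-level tools can also stop the whole cluster in one go, e.g. 
"pcs cluster stop --all", which is closer to the all-at-once shutdown 
Andrei describes.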

Regards,
   Yan

