[ClusterLabs] Antw: Re: Gracefully stop nodes one by one with disk-less sbd
Andrei Borzenkov
arvidjaar at gmail.com
Mon Aug 12 03:23:18 EDT 2019
Sent from my iPhone
On 12 Aug 2019, at 9:48, Ulrich Windl <Ulrich.Windl at rz.uni-regensburg.de> wrote:
>>>> Andrei Borzenkov <arvidjaar at gmail.com> schrieb am 09.08.2019 um 18:40 in
> Nachricht <217d10d8-022c-eaf6-28ae-a4f58b2f97af at gmail.com>:
>> On 09.08.2019 16:34, Yan Gao wrote:
>>> Hi,
>>>
>>> With disk-less sbd, it's fine to stop the cluster services on all
>>> cluster nodes at the same time.
>>>
>>> But when stopping the nodes one by one, for example in a 3-node cluster,
>>> after stopping the 2nd node the only remaining node resets itself with:
>>>
>>
>> That is sort of documented in the SBD manual page:
>>
>> --><--
>> However, while the cluster is in such a degraded state, it can
>> neither successfully fence nor be shutdown cleanly (as taking the
>> cluster below the quorum threshold will immediately cause all remaining
>> nodes to self-fence).
>> --><--
>>
>> SBD in shared-nothing mode is basically always in such a degraded state
>> and cannot tolerate loss of quorum.
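(For a 3-node cluster with default corosync votequorum settings the
arithmetic is simply:

    expected_votes = 3
    quorum         = floor(expected_votes / 2) + 1 = 2

so once the second node is stopped, the lone survivor has 1 vote, loses
quorum, and with disk-less SBD immediately self-fences.)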
>
> So with a shared device it's different?
Yes, as long as the shared device is accessible.
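Whether sbd runs disk-less or disk-based is basically a matter of whether
SBD_DEVICE is set. A minimal sketch of /etc/sysconfig/sbd (the device path
is only a placeholder):

    # /etc/sysconfig/sbd
    SBD_DEVICE="/dev/disk/by-id/scsi-EXAMPLE-sbd"   # unset/empty => disk-less mode
    SBD_WATCHDOG_DEV="/dev/watchdog"
    SBD_PACEMAKER="yes"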
> I was wondering whether
> "no-quorum-policy=freeze" would still work with the recent sbd...
>
It will with a shared device.
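For reference, that is just the usual cluster property; with crmsh or pcs
it would be something like:

    # crmsh
    crm configure property no-quorum-policy=freeze
    # pcs
    pcs property set no-quorum-policy=freeze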
>>
>>
>>
>>> Aug 09 14:30:20 opensuse150-1 sbd[1079]: pcmk: debug: notify_parent: Not notifying parent: state transient (2)
>>> Aug 09 14:30:20 opensuse150-1 sbd[1080]: cluster: debug: notify_parent: Notifying parent: healthy
>>> Aug 09 14:30:20 opensuse150-1 sbd[1078]: warning: inquisitor_child: Latency: No liveness for 4 s exceeds threshold of 3 s (healthy servants: 0)
>>>
>>> I can think of manipulating quorum with last_man_standing and
>>> potentially also auto_tie_breaker, not to mention that
>>> last_man_standing_window would also be a factor... But is there a better
>>> solution?
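(The options mentioned above all live in the quorum section of
corosync.conf; a rough sketch with illustrative values, see votequorum(5)
for the semantics and caveats:

    quorum {
        provider: corosync_votequorum
        last_man_standing: 1
        last_man_standing_window: 20000   # milliseconds
        auto_tie_breaker: 1
    }
)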
>>>
>>
>> Lack of a cluster-wide shutdown mode has been mentioned more than once on
>> this list. I guess the only workaround is to use higher-level tools which
>> basically just try to stop the cluster on all nodes at once. That is still
>> susceptible to race conditions.
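(For example, something like the pcs whole-cluster stop, or the equivalent
in recent crmsh:

    pcs cluster stop --all

which just asks every node to stop at roughly the same time, hence the
remaining race window.)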
>
> Are there any concrete plans to implement a clean solution?
>
> _______________________________________________
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/