[ClusterLabs] Antw: Re: Gracefully stop nodes one by one with disk-less sbd
Andrei Borzenkov
arvidjaar at gmail.com
Mon Aug 12 03:23:18 EDT 2019
Sent from my iPhone
On 12 Aug 2019, at 9:48, Ulrich Windl <Ulrich.Windl at rz.uni-regensburg.de> wrote:
>>>> Andrei Borzenkov <arvidjaar at gmail.com> schrieb am 09.08.2019 um 18:40 in
> Nachricht <217d10d8-022c-eaf6-28ae-a4f58b2f97af at gmail.com>:
>> On 09.08.2019 16:34, Yan Gao wrote:
>>> Hi,
>>>
>>> With disk-less sbd, it's fine to stop the cluster services on all
>>> cluster nodes at the same time.
>>>
>>> But when stopping the nodes one by one, for example in a 3-node cluster,
>>> after stopping the 2nd node the only remaining node resets itself with:
>>>
>>
>> That is sort of documented in the SBD manual page:
>>
>> --><--
>> However, while the cluster is in such a degraded state, it can
>> neither successfully fence nor be shutdown cleanly (as taking the
>> cluster below the quorum threshold will immediately cause all remaining
>> nodes to self-fence).
>> --><--
>>
>> SBD in shared-nothing mode is basically always in such a degraded state
>> and cannot tolerate loss of quorum.
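(For a 3-node cluster with default corosync votequorum settings the
arithmetic is simply:

    expected_votes = 3
    quorum         = floor(expected_votes / 2) + 1 = 2

so once the second node is stopped, the lone survivor has 1 vote, loses
quorum, and with disk-less SBD immediately self-fences.)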
>
> So with a shared device it's different?
Yes, as long as the shared device is accessible.
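Whether sbd runs disk-less or disk-based is basically a matter of whether
SBD_DEVICE is set. A minimal sketch of /etc/sysconfig/sbd (the device path
is only a placeholder):

    # /etc/sysconfig/sbd
    SBD_DEVICE="/dev/disk/by-id/scsi-EXAMPLE-sbd"   # unset/empty => disk-less mode
    SBD_WATCHDOG_DEV="/dev/watchdog"
    SBD_PACEMAKER="yes"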
> I was wondering whether
> "no-quorum-policy=freeze" would still work with the recent sbd...
>
It will with a shared device.
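For reference, that is just the usual cluster property; with crmsh or pcs
it would be something like:

    # crmsh
    crm configure property no-quorum-policy=freeze
    # pcs
    pcs property set no-quorum-policy=freeze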
>>
>>
>>
>>> Aug 09 14:30:20 opensuse150-1 sbd[1079]: pcmk: debug: notify_parent: Not notifying parent: state transient (2)
>>> Aug 09 14:30:20 opensuse150-1 sbd[1080]: cluster: debug: notify_parent: Notifying parent: healthy
>>> Aug 09 14:30:20 opensuse150-1 sbd[1078]: warning: inquisitor_child: Latency: No liveness for 4 s exceeds threshold of 3 s (healthy servants: 0)
>>>
>>> I can think of manipulating quorum with last_man_standing and
>>> potentially also auto_tie_breaker, not to mention that
>>> last_man_standing_window would also be a factor... But is there a better
>>> solution?
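(The options mentioned above all live in the quorum section of
corosync.conf; a rough sketch with illustrative values, see votequorum(5)
for the semantics and caveats:

    quorum {
        provider: corosync_votequorum
        last_man_standing: 1
        last_man_standing_window: 20000   # milliseconds
        auto_tie_breaker: 1
    }
)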
>>>
>>
>> Lack of a cluster-wide shutdown mode has been mentioned more than once on
>> this list. I guess the only workaround is to use higher-level tools which
>> basically just try to stop the cluster on all nodes at once. That is still
>> susceptible to race conditions.
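(For example, something like the pcs whole-cluster stop, or the equivalent
in recent crmsh:

    pcs cluster stop --all

which just asks every node to stop at roughly the same time, hence the
remaining race window.)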
>
> Are there any concrete plans to implement a clean solution?
>
> _______________________________________________
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/