[ClusterLabs] Antw: [EXT] Re: Coming in Pacemaker 2.1.3: multiple-active=stop_unexpected

Ulrich Windl Ulrich.Windl at rz.uni-regensburg.de
Mon Apr 11 02:20:13 EDT 2022


>>> Andrei Borzenkov <arvidjaar at gmail.com> wrote on 09.04.2022 at 06:48 in
message <30178b34-d2fd-1af4-58ed-d9d2aa6e6e36 at gmail.com>:
> On 08.04.2022 20:16, Ken Gaillot wrote:
>> Hi all,
>> 
>> I'm hoping to have the first release candidate for Pacemaker 2.1.3
>> available in a couple of weeks.
>> 
>> One of the new features will be a new possible value for the
>> "multiple-active" resource meta-attribute, which specifies how the cluster
>> should react if multiple instances of a resource are detected to be active
>> when only one should be.
>> 
>> The default behavior, "restart", stops all the instances and then
>> starts one instance where it should be. This is the safest approach
>> since some services become disrupted when multiple copies are started.
>> 
>> However, if the user is confident that only the extra copies need to be
>> stopped, they can now set multiple-active to "stop_unexpected". The
>> instance that is active where it is supposed to be will not be stopped,
>> but all other instances will be.
>> 
>> If any resources are ordered after the multiply active resource, those
>> other resources will still need to be fully restarted. This is because
>> any ordering constraint "start A then start B" implies "stop B then
>> stop A", so we can't stop the wrongly active instances of A until B is
>> stopped.
> 
> But in the case of multiple-active=stop_unexpected, "the correct" A does
> remain active. If any dependent resource needs to be restarted anyway, I
> don't see the intended use case. How does it differ from the default
> option (except that it may be faster)?

We had a case where the state reported by the probe was simply wrong, and the
attempt to stop the resource caused a node to be fenced (probably due to
another bug).
In any case, the fencing loop would continue even though the resource had been
started correctly on another node.
Stopping the "good instance" as well is not good, especially when the process
repeats in a loop.
What we saw (and still see) may be the result of multiple bugs, however...
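
To make the difference between the two policies concrete, here is a minimal
sketch in Python. It is only a toy model of the behavior Ken describes above,
not Pacemaker's scheduler: the function name plan_recovery, the node names,
and the (action, node) tuples are all hypothetical, and ordering constraints
on dependent resources are deliberately ignored.

```python
def plan_recovery(policy, active_on, expected_node):
    """Return an ordered list of (action, node) steps for one resource.

    Toy model only (an assumption for illustration, not Pacemaker code):
    policy        -- value of the multiple-active meta-attribute
    active_on     -- set of nodes where the resource was found active
    expected_node -- the node where the resource is supposed to run
    """
    if policy == "restart":
        # Default: stop every instance, then start one where it belongs.
        steps = [("stop", node) for node in sorted(active_on)]
        steps.append(("start", expected_node))
        return steps

    if policy == "stop_unexpected":
        # Stop only the instances that are not where they should be.
        steps = [("stop", node) for node in sorted(active_on)
                 if node != expected_node]
        # If the resource is not active on the expected node, start it there.
        if expected_node not in active_on:
            steps.append(("start", expected_node))
        return steps

    raise ValueError(f"policy not covered by this sketch: {policy!r}")


if __name__ == "__main__":
    found = {"node1", "node2"}
    for policy in ("restart", "stop_unexpected"):
        print(policy, plan_recovery(policy, found, "node1"))
```

In this model, with the resource found on node1 and node2 and expected on
node1, "restart" stops the good instance too, while "stop_unexpected" leaves
it alone and only stops the copy on node2.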

Regards,
Ulrich

