[ClusterLabs] Re: Gracefully stop nodes one by one with disk-less sbd

Mon Aug 12 02:42:10 EDT 2019

Hi!

One motivation for stopping all nodes at the same time is to avoid needlessly
moving resources, as in the following sequence:
You stop node A; its resources are stopped on A and started elsewhere.
You stop node B; its resources are stopped and moved to the remaining nodes.
...and so on, until the last node stops or loss of quorum prevents cluster
operation (the effect depends on further settings).
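
To make the cost of that sequence concrete, here is a minimal sketch that
counts resource moves when nodes are stopped one at a time. It is only an
illustration: count_relocations() is a hypothetical helper, the "move to the
least-loaded remaining node" rule is a simplifying assumption and not
Pacemaker's real placement logic, and quorum is ignored entirely.

```python
def count_relocations(placement, stop_order):
    """Count resource moves when nodes are stopped one at a time.

    placement:  dict mapping node name -> list of resource names.
    stop_order: node names in the order they are stopped.

    Assumption: resources of a stopped node go to the remaining node
    that currently hosts the fewest resources (ties broken by name).
    """
    nodes = {name: list(res) for name, res in placement.items()}
    moves = 0
    for node in stop_order:
        displaced = nodes.pop(node)
        if not nodes:
            break  # last node: its resources simply stop, nothing moves
        for res in displaced:
            target = min(nodes, key=lambda n: (len(nodes[n]), n))
            nodes[target].append(res)
            moves += 1
    return moves


if __name__ == "__main__":
    placement = {"A": ["r1"], "B": ["r2"], "C": ["r3"]}
    print(count_relocations(placement, ["A", "B", "C"]))
```

With one resource per node on three nodes, the sequential stop causes three
moves in this model, while stopping everything at once would cause none.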

Unfortunately (AFAIK) there's no command to "stop the cluster" yet.
A "stop cluster" command would stop all resources on all nodes, then stop the
nodes (and lower layers) in a way that there is no "quorum lost" or fencing
going on.
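
As a rough sketch of why a one-by-one stop runs into "quorum lost", assume
plain majority quorum with one vote per node, and assume the expected vote
count is not recalculated when a node leaves (that is, without
last_man_standing or similar options). The function names are hypothetical.

```python
def quorum_threshold(expected_votes):
    """Simple majority: more than half of the expected votes."""
    return expected_votes // 2 + 1


def stop_one_by_one(total_nodes):
    """Return (active_nodes, quorate) after each successive node stop.

    Assumes one vote per node and a fixed expected vote count.
    The final stop (zero nodes left) is not listed.
    """
    threshold = quorum_threshold(total_nodes)
    states = []
    for active in range(total_nodes - 1, 0, -1):
        states.append((active, active >= threshold))
    return states


if __name__ == "__main__":
    for active, quorate in stop_one_by_one(3):
        print(f"{active} node(s) left, quorate: {quorate}")
```

For three nodes the threshold is two votes, so under these assumptions the
partition stays quorate after the first stop and loses quorum after the
second, which is the point where the remaining node is affected.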

Regards,
Ulrich

>>> Yan Gao <YGao at suse.com> wrote on 09.08.2019 at 15:34 in message
<d1713b62-24fb-cb1d-28e4-0623fea189b4 at suse.com>:
> Hi,
> 
> With disk-less sbd, it's fine to stop the cluster service on all cluster
> nodes at the same time.
> 
> But when stopping the nodes one by one, for example in a 3-node cluster,
> the only remaining node resets itself after the 2nd node is stopped, with:
> 
> Aug 09 14:30:20 opensuse150-1 sbd[1079]:       pcmk:    debug: notify_parent: Not notifying parent: state transient (2)
> Aug 09 14:30:20 opensuse150-1 sbd[1080]:    cluster:    debug: notify_parent: Notifying parent: healthy
> Aug 09 14:30:20 opensuse150-1 sbd[1078]:  warning: inquisitor_child: Latency: No liveness for 4 s exceeds threshold of 3 s (healthy servants: 0)
> 
> I can think of manipulating quorum with last_man_standing and
> potentially also auto_tie_breaker, and last_man_standing_window would
> be a factor as well... But is there a better solution?
> 
> Thanks,
>    Yan