[ClusterLabs] issue during Pacemaker failover testing
Klaus Wenninger
kwenning at redhat.com
Wed Aug 30 09:40:55 EDT 2023
On Wed, Aug 30, 2023 at 2:34 PM David Dolan <daithidolan at gmail.com> wrote:
> Hi All,
>
> I'm running Pacemaker on CentOS 7
> Name : pcs
> Version : 0.9.169
> Release : 3.el7.centos.3
> Architecture: x86_64
>
>
Besides the pcs version, the versions of the other cluster-stack components
(pacemaker, corosync) could be interesting.
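For reference, on CentOS 7 those versions can usually be queried with
something like:

    rpm -q pacemaker corosync
    # or directly from the daemons:
    pacemakerd --version
    corosync -v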
> I'm performing some cluster failover tests in a 3 node cluster. We have 3
> resources in the cluster.
> I was trying to see if I could get it working if 2 nodes fail at different
> times. I'd like the 3 resources to then run on one node.
>
> The quorum options I've configured are as follows
> [root at node1 ~]# pcs quorum config
> Options:
> auto_tie_breaker: 1
> last_man_standing: 1
> last_man_standing_window: 10000
> wait_for_all: 1
>
>
I'm not sure the combination of auto_tie_breaker and last_man_standing makes
sense. And since you have a cluster with an odd number of nodes,
auto_tie_breaker should be disabled anyway, I guess.
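For illustration, a quorum section relying on last_man_standing alone (just a
sketch; option availability depends on your corosync version) would look
something like this in corosync.conf:

    quorum {
        provider: corosync_votequorum
        last_man_standing: 1
        last_man_standing_window: 10000
        wait_for_all: 1
    }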
> [root at node1 ~]# pcs quorum status
> Quorum information
> ------------------
> Date: Wed Aug 30 11:20:04 2023
> Quorum provider: corosync_votequorum
> Nodes: 3
> Node ID: 1
> Ring ID: 1/1538
> Quorate: Yes
>
> Votequorum information
> ----------------------
> Expected votes: 3
> Highest expected: 3
> Total votes: 3
> Quorum: 2
> Flags: Quorate WaitForAll LastManStanding AutoTieBreaker
>
> Membership information
> ----------------------
> Nodeid   Votes   Qdevice   Name
>      1       1        NR   node1 (local)
>      2       1        NR   node2
>      3       1        NR   node3
>
> If I stop the cluster services on nodes 2 and 3, the groups all fail over to
> node 1 since it is the node with the lowest ID.
> But if I stop them on node 1 and node 2, or node 1 and node 3, the cluster
> fails.
>
> I tried adding this line to corosync.conf, and I could then bring down the
> services on nodes 1 and 2 or nodes 2 and 3, but if I left node 2 until last,
> the cluster failed:
> auto_tie_breaker_node: 1 3
>
> This line had the same outcome as using 1 3
> auto_tie_breaker_node: 1 2 3
>
>
Giving multiple auto_tie_breaker nodes doesn't make sense to me, and rather
sounds dangerous, if that configuration is possible at all.
Maybe the misbehavior of last_man_standing is due to this (possibly
unrecognized) misconfiguration.
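For comparison, the single-valued forms of that option documented for
votequorum look like this (sketch only):

    auto_tie_breaker_node: lowest
    # or a specific node ID:
    auto_tie_breaker_node: 1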
Did you wait long enough between letting the 2 nodes fail? With
last_man_standing_window set to 10000 ms, the second failure has to come more
than 10 seconds after the first, so that the expected votes can be
recalculated in between.
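A rough sketch of staging the failures with enough time in between (waiting
longer than the 10000 ms last_man_standing_window, and checking that the
expected votes were recalculated before the second failure):

    pcs cluster stop node3
    sleep 15
    corosync-quorumtool -s    # expected votes should have dropped to 2
    pcs cluster stop node2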
Klaus
> So I'd like it to fail over when any combination of two nodes fails, but I've
> only had success when the middle node isn't the last one standing.
>
> Thanks
> David