[ClusterLabs] issue during Pacemaker failover testing
Klaus Wenninger
kwenning at redhat.com
Wed Aug 30 09:40:55 EDT 2023
On Wed, Aug 30, 2023 at 2:34 PM David Dolan <daithidolan at gmail.com> wrote:
> Hi All,
>
> I'm running Pacemaker on CentOS 7
> Name : pcs
> Version : 0.9.169
> Release : 3.el7.centos.3
> Architecture: x86_64
>
>
Besides the pcs version, the versions of the other cluster-stack components
(pacemaker, corosync) could be interesting.
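For reference, on CentOS 7 those versions can usually be queried with
something like:

    rpm -q pacemaker corosync
    # or directly from the daemons:
    pacemakerd --version
    corosync -v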
> I'm performing some cluster failover tests in a 3 node cluster. We have 3
> resources in the cluster.
> I was trying to see if I could get it working if 2 nodes fail at different
> times. I'd like the 3 resources to then run on one node.
>
> The quorum options I've configured are as follows
> [root at node1 ~]# pcs quorum config
> Options:
> auto_tie_breaker: 1
> last_man_standing: 1
> last_man_standing_window: 10000
> wait_for_all: 1
>
>
I'm not sure the combination of auto_tie_breaker and last_man_standing makes
sense. And since you have a cluster with an odd number of nodes,
auto_tie_breaker should be disabled anyway, I guess.
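For illustration, a quorum section relying on last_man_standing alone (just a
sketch; option availability depends on your corosync version) would look
something like this in corosync.conf:

    quorum {
        provider: corosync_votequorum
        last_man_standing: 1
        last_man_standing_window: 10000
        wait_for_all: 1
    }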
> [root at node1 ~]# pcs quorum status
> Quorum information
> ------------------
> Date: Wed Aug 30 11:20:04 2023
> Quorum provider: corosync_votequorum
> Nodes: 3
> Node ID: 1
> Ring ID: 1/1538
> Quorate: Yes
>
> Votequorum information
> ----------------------
> Expected votes: 3
> Highest expected: 3
> Total votes: 3
> Quorum: 2
> Flags: Quorate WaitForAll LastManStanding AutoTieBreaker
>
> Membership information
> ----------------------
> Nodeid   Votes   Qdevice   Name
>      1       1        NR   node1 (local)
>      2       1        NR   node2
>      3       1        NR   node3
>
> If I stop the cluster services on nodes 2 and 3, the groups all fail over to
> node 1 since it is the node with the lowest ID.
> But if I stop them on node 1 and node 2, or node 1 and node 3, the cluster
> fails.
>
> I tried adding this line to corosync.conf, and I could then bring down the
> services on nodes 1 and 2 or nodes 2 and 3, but if I left node 2 until last,
> the cluster failed:
> auto_tie_breaker_node: 1 3
>
> This line had the same outcome as using 1 3
> auto_tie_breaker_node: 1 2 3
>
>
Giving multiple auto_tie_breaker nodes doesn't make sense to me, and rather
sounds dangerous, if that configuration is possible at all.
Maybe the misbehavior of last_man_standing is due to this (possibly
unrecognized) misconfiguration.
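For comparison, the single-valued forms of that option documented for
votequorum look like this (sketch only):

    auto_tie_breaker_node: lowest
    # or a specific node ID:
    auto_tie_breaker_node: 1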
Did you wait long enough between letting the 2 nodes fail? With
last_man_standing_window set to 10000 ms, the second failure has to come more
than 10 seconds after the first, so that the expected votes can be
recalculated in between.
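A rough sketch of staging the failures with enough time in between (waiting
longer than the 10000 ms last_man_standing_window, and checking that the
expected votes were recalculated before the second failure):

    pcs cluster stop node3
    sleep 15
    corosync-quorumtool -s    # expected votes should have dropped to 2
    pcs cluster stop node2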
Klaus
> So I'd like it to fail over when any combination of two nodes fails, but I've
> only had success when the middle node isn't the last one standing.
>
> Thanks
> David