[ClusterLabs] issue during Pacemaker failover testing

David Dolan daithidolan at gmail.com
Wed Aug 30 12:35:18 EDT 2023


> Hi All,
> >
> > I'm running Pacemaker on CentOS 7
> > Name        : pcs
> > Version     : 0.9.169
> > Release     : 3.el7.centos.3
> > Architecture: x86_64
> >
> >
> Besides the pcs version, the versions of the other cluster-stack components
> (pacemaker, corosync) would also be interesting.
>
 rpm -qa | egrep "pacemaker|pcs|corosync|fence-agents"
fence-agents-vmware-rest-4.2.1-41.el7_9.6.x86_64
corosynclib-2.4.5-7.el7_9.2.x86_64
pacemaker-cluster-libs-1.1.23-1.el7_9.1.x86_64
fence-agents-common-4.2.1-41.el7_9.6.x86_64
corosync-2.4.5-7.el7_9.2.x86_64
pacemaker-cli-1.1.23-1.el7_9.1.x86_64
pacemaker-1.1.23-1.el7_9.1.x86_64
pcs-0.9.169-3.el7.centos.3.x86_64
pacemaker-libs-1.1.23-1.el7_9.1.x86_64

>
>
> > I'm performing some cluster failover tests in a 3-node cluster. We have 3
> > resources in the cluster.
> > I was trying to see if I could get it working if 2 nodes fail at different
> > times. I'd like the 3 resources to then run on one node.
> >
> > The quorum options I've configured are as follows
> > [root at node1 ~]# pcs quorum config
> > Options:
> >   auto_tie_breaker: 1
> >   last_man_standing: 1
> >   last_man_standing_window: 10000
> >   wait_for_all: 1
> >
> >
> Not sure if the combination of auto_tie_breaker and last_man_standing
> makes sense. And as you have a cluster with an odd number of nodes,
> auto_tie_breaker should be disabled anyway, I guess.
>
Ah ok, I'll try removing auto_tie_breaker and leaving last_man_standing.
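Just so I'm testing the right thing, the quorum section in corosync.conf
would then look roughly like this (a sketch only, with everything else left
as it is now):

    quorum {
        provider: corosync_votequorum
        last_man_standing: 1
        last_man_standing_window: 10000
        wait_for_all: 1
    }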

>
>
> > [root at node1 ~]# pcs quorum status
> > Quorum information
> > ------------------
> > Date:             Wed Aug 30 11:20:04 2023
> > Quorum provider:  corosync_votequorum
> > Nodes:            3
> > Node ID:          1
> > Ring ID:          1/1538
> > Quorate:          Yes
> >
> > Votequorum information
> > ----------------------
> > Expected votes:   3
> > Highest expected: 3
> > Total votes:      3
> > Quorum:           2
> > Flags:            Quorate WaitForAll LastManStanding AutoTieBreaker
> >
> > Membership information
> > ----------------------
> >     Nodeid      Votes    Qdevice Name
> >          1          1         NR node1 (local)
> >          2          1         NR node2
> >          3          1         NR node3
> >
> > If I stop the cluster services on nodes 2 and 3, the groups all fail over
> > to node 1, since it is the node with the lowest ID.
> > But if I stop them on node 1 and node 2, or node 1 and node 3, the cluster
> > fails.
> >
> > I tried adding this line to corosync.conf, and I could then bring down the
> > services on node 1 and 2 or node 2 and 3, but if I left node 2 until last,
> > the cluster failed:
> > auto_tie_breaker_node: 1  3
> >
> > This line had the same outcome as using 1 3:
> > auto_tie_breaker_node: 1  2 3
> >
> >
> Giving multiple auto_tie_breaker nodes doesn't make sense to me and rather
> sounds dangerous, if that configuration is possible at all.
>
> Maybe the misbehavior of last_man_standing is due to this (perhaps
> unrecognized) misconfiguration.
> Did you wait long enough between letting the 2 nodes fail?
>
I've done it so many times that I believe so. But I'll try removing the
auto_tie_breaker config and leaving last_man_standing. I'll also make sure I
leave a couple of minutes between bringing down the nodes (roughly the
sequence sketched below) and post back.
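Roughly the sequence I have in mind, just as a sketch (node names as in the
quorum output above; the sleep is only there to be comfortably past the
10000 ms last_man_standing_window):

    pcs cluster stop node3
    sleep 120                 # a couple of minutes between failures
    corosync-quorumtool -s    # expected votes should have dropped to 2
    pcs cluster stop node2
    corosync-quorumtool -s    # run on node1 to see if it is still quorate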

>
> Klaus
>
>
> > So I'd like it to fail over when any combination of two nodes fails, but
> > I've only had success when the middle node isn't last.
> >
> > Thanks
> > David
>