<div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Sep 4, 2023 at 1:18 PM Klaus Wenninger <<a href="mailto:kwenning@redhat.com">kwenning@redhat.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Sep 4, 2023 at 12:45 PM David Dolan <<a href="mailto:daithidolan@gmail.com" target="_blank">daithidolan@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">Hi Klaus,<div><br></div><div>With default quorum options I've performed the following on my 3 node cluster<br><br></div><div>Bring down cluster services on one node - the running services migrate to another node<br>Wait 3 minutes<br>Bring down cluster services on one of the two remaining nodes - the surviving node in the cluster is then fenced<br><br>Instead of the surviving node being fenced, I hoped that the services would migrate and run on that remaining node.</div><div><br>Just looking for confirmation that my understanding is ok and if I'm missing something?<br></div></div></blockquote><div><br></div><div>As said I've never used it ...</div><div>Well when down to 2 nodes LMS per definition is getting into trouble as after another</div><div>outage any of them is gonna be alone. In case of an ordered shutdown this could</div><div>possibly be circumvented though. So I guess your fist attempt to enable auto-tie-breaker</div><div>was the right idea. Like this you will have further service at least on one of the nodes.</div><div>So I guess what you were seeing is the right - and unfortunately only possible - behavior.</div><div>Where LMS shines is probably scenarios with substantially more nodes. 
Or go for qdevice with LMS, where I would expect it to be able to really go down to
a single node left - either of the 2 last ones - as there is still the qdevice.
Sorry for the confusion btw.

Klaus
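(Roughly what that would look like with pcs - a sketch only, untested here; "qnetd-host"
is just a placeholder for whatever machine would run the qnetd daemon, and the exact
syntax should be checked against the pcs version in use:)

    # on the quorum-device host
    yum install corosync-qnetd pcs
    pcs qdevice setup model net --enable --start

    # on one cluster node (corosync-qdevice installed on all three nodes)
    yum install corosync-qdevice
    pcs quorum device add model net host=qnetd-host algorithm=lms

    # verify
    pcs quorum status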
On Thu, 31 Aug 2023 at 11:59, David Dolan <daithidolan@gmail.com> wrote:

I just tried removing all the quorum options, setting everything back to defaults, so no
last_man_standing or wait_for_all.
I still see the same behaviour where the third node is fenced if I bring down services on two nodes.
Thanks
David

On Thu, 31 Aug 2023 at 11:44, Klaus Wenninger <kwenning@redhat.com> wrote:

On Thu, Aug 31, 2023 at 12:28 PM David Dolan <daithidolan@gmail.com> wrote:

On Wed, 30 Aug 2023 at 17:35, David Dolan <daithidolan@gmail.com> wrote:

> Hi All,
>
> I'm running Pacemaker on CentOS 7
> Name        : pcs
> Version     : 0.9.169
> Release     : 3.el7.centos.3
> Architecture: x86_64
>

Besides the pcs version, the versions of the other cluster-stack components
(pacemaker, corosync) would be interesting.

rpm -qa | egrep "pacemaker|pcs|corosync|fence-agents"
fence-agents-vmware-rest-4.2.1-41.el7_9.6.x86_64
corosynclib-2.4.5-7.el7_9.2.x86_64
pacemaker-cluster-libs-1.1.23-1.el7_9.1.x86_64
fence-agents-common-4.2.1-41.el7_9.6.x86_64
corosync-2.4.5-7.el7_9.2.x86_64
pacemaker-cli-1.1.23-1.el7_9.1.x86_64
pacemaker-1.1.23-1.el7_9.1.x86_64
pcs-0.9.169-3.el7.centos.3.x86_64
pacemaker-libs-1.1.23-1.el7_9.1.x86_64
> I'm performing some cluster failover tests in a 3 node cluster. We have 3
> resources in the cluster.
> I was trying to see if I could get it working if 2 nodes fail at different
> times. I'd like the 3 resources to then run on one node.
>
> The quorum options I've configured are as follows
> [root@node1 ~]# pcs quorum config
> Options:
>   auto_tie_breaker: 1
>   last_man_standing: 1
>   last_man_standing_window: 10000
>   wait_for_all: 1
>

Not sure if the combination of auto_tie_breaker and last_man_standing makes sense.
And as you have a cluster with an odd number of nodes auto_tie_breaker should be
disabled anyway I guess.

Ah ok I'll try removing auto_tie_breaker and leave last_man_standing
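(For what it's worth, a sketch of what that would leave in the quorum section of
/etc/corosync/corosync.conf - from memory rather than a tested config; the same options
can be set with "pcs quorum update", which as far as I remember wants the cluster
stopped while it changes them:)

    quorum {
        provider: corosync_votequorum
        last_man_standing: 1
        last_man_standing_window: 10000
        wait_for_all: 1
    }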
> [root@node1 ~]# pcs quorum status
> Quorum information
> ------------------
> Date:             Wed Aug 30 11:20:04 2023
> Quorum provider:  corosync_votequorum
> Nodes:            3
> Node ID:          1
> Ring ID:          1/1538
> Quorate:          Yes
>
> Votequorum information
> ----------------------
> Expected votes:   3
> Highest expected: 3
> Total votes:      3
> Quorum:           2
> Flags:            Quorate WaitForAll LastManStanding AutoTieBreaker
>
> Membership information
> ----------------------
>     Nodeid      Votes    Qdevice Name
>          1          1         NR node1 (local)
>          2          1         NR node2
>          3          1         NR node3
>
> If I stop the cluster services on node2 and node3, the groups all fail over to
> node1, since it is the node with the lowest ID.
> But if I stop them on node1 and node2 or node1 and node3, the cluster fails.
>
> I tried adding this line to corosync.conf and I could then bring down the
> services on node1 and node2 or node2 and node3, but if I left node2 until last,
> the cluster failed:
> auto_tie_breaker_node: 1 3
>
> This line had the same outcome as using 1 3:
> auto_tie_breaker_node: 1 2 3
>
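(Side note, and only a sketch going by the votequorum(5) man page rather than anything
tested here: if auto_tie_breaker is kept at all, the simplest deterministic form pins the
tie-breaker to a single node, e.g. the lowest node ID, rather than listing several nodes:)

    quorum {
        provider: corosync_votequorum
        auto_tie_breaker: 1
        auto_tie_breaker_node: lowest
    }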
Giving multiple auto_tie_breaker nodes doesn't make sense to me, but rather sounds
dangerous, if that configuration is possible at all.

Maybe the misbehavior of last_man_standing is due to this (maybe not
recognized) misconfiguration.
Did you wait long enough between letting the 2 nodes fail?

I've done it so many times, so I believe so. But I'll try removing the auto_tie_breaker
config, leaving last_man_standing. I'll also make sure I leave a couple of minutes between
bringing down the nodes and post back.

Just confirming I removed the auto_tie_breaker config and tested. The quorum configuration
is as follows:
 Options:
  last_man_standing: 1
  last_man_standing_window: 10000
  wait_for_all: 1

I waited 2-3 minutes between stopping cluster services on two nodes via pcs cluster stop.
The remaining cluster node is then fenced. I was hoping the remaining node would stay
online, running the resources.

Yep - that would have been my understanding as well.
But honestly, I've never used last_man_standing in this context - I wasn't even aware that
it was offered without qdevice, nor have I checked how it is implemented.

Klaus
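(For reference, the reproduction steps above boil down to roughly the following - a sketch
only; node names are the ones from the quorum status output earlier:)

    # run from node1, with all three nodes online and resources running
    pcs cluster stop node2          # first outage
    sleep 180                       # wait a few minutes so votequorum can adjust
    pcs cluster stop node3          # second outage

    # what we'd hope to still see on node1
    pcs status                      # resources running on node1
    corosync-quorumtool -s          # node1 still quorate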
> So I'd like it to fail over when any combination of two nodes fails, but I've
> only had success when the middle node isn't the last one left.
>
> Thanks
> David