<div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Oct 10, 2024 at 3:58 PM Angelo Ruggiero via Users <<a href="mailto:users@clusterlabs.org">users@clusterlabs.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class="msg5594183960408677331">
<div dir="ltr">
<div style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:11pt;color:rgb(0,0,0)">
Thanks for answering. It helps.</div>
<div style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:11pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:11pt;color:rgb(0,0,0)">
>Main scenario where poison pill shines is 2-node-clusters where you don't<br>
>have usable quorum for watchdog-fencing.</div>
<div style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:11pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:11pt;color:rgb(0,0,0)">
Not sure i understand. As if just 2 node and one node fails it cannot respond to the poision pilll. Maybe i mis your point.</div></div></div></blockquote><div><br></div><div>If in a 2 node setup one node loses contact to the other or sees some other reason why it would like</div><div>the partner-node to be fenced it will try to write the poison-pill message to the shared disk and if that</div><div>goes Ok and after a configured wait time for the other node to read the message, respond or the</div><div>watchdog to kick in it will assume the other node to be fenced. </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class="msg5594183960408677331"><div dir="ltr">
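If you want to see those mechanics first-hand, the sbd binary lets you poke
at the disk directly - a rough sketch, assuming your shared disk sits at
/dev/disk/by-id/my-sbd-disk (that path is made up, substitute your own):

    # initialize the per-node message slots on the shared disk
    # (wipes any existing sbd metadata on it!)
    sbd -d /dev/disk/by-id/my-sbd-disk create

    # show the slots and any message currently sitting in them
    sbd -d /dev/disk/by-id/my-sbd-disk list

    # dump the on-disk header - msgwait is the configured wait time
    # mentioned above
    sbd -d /dev/disk/by-id/my-sbd-disk dump

    # manually write a poison pill for node2; its sbd daemon will
    # self-fence as soon as it reads the message
    sbd -d /dev/disk/by-id/my-sbd-disk message node2 reset

In a pacemaker cluster the fence_sbd stonith agent does that last step for
you, of course.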
<div style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:11pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:11pt;color:rgb(0,0,0)">
This also begs the followup question, what defines "usable quroum". Do you mean for example on seperate independent network hardware and power supply?</div></div></div></blockquote><div><br></div><div>Quorum in 2 node clusters is a bit different as they will stay quorate when losing connection. To prevent split brain there if they</div><div>reboot on top they will just regain quorum once they've seen each other (search for 'wait-for-all' to read more).</div><div>This behavior is of course not usable for watchdog-fencing and thus SBD automatically switches to not relying on quorum in</div><div>those 2-node setups.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class="msg5594183960408677331"><div dir="ltr">
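For reference, those knobs live in the quorum section of corosync.conf - a
minimal sketch of what a 2-node setup typically ends up with (verify
against what pcs generated for you):

    # /etc/corosync/corosync.conf (excerpt)
    quorum {
        provider: corosync_votequorum
        two_node: 1       # stay quorate when the peer is lost
        wait_for_all: 1   # after a reboot, wait until all nodes have
                          # been seen once before granting quorum
    }

Note that two_node: 1 implicitly enables wait_for_all, which is exactly
the regain-quorum-only-after-seeing-each-other behavior described above.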
<div style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:11pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:11pt;color:rgb(0,0,0)">
>Configured with pacemaker-awareness - default - availability of the shared-disk doesn't become an issue as, due to fallback to availability of the 2nd node, the disk is >no spof (single point of failure) in these clusters.</div>
<div style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:11pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:11pt;color:rgb(0,0,0)">
I did not get the jist of what you are trying to say here. 🙂<br>
<br></div></div></div></blockquote><div><br></div><div>I was suggesting a scenario that has 2 cluster nodes + a single shared disk. With kind of 'pure' SBD this would mean that a node</div><div>that is losing connection to the disk would have to self fence which would mean that this disk would become a so called</div><div>single-point-of-failure - meaning that available of resources in the cluster would be reduced to availability of this single disk.</div><div>So I tried to explain why you don't have to fear this reduction of availability using pacemaker-awareness.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class="msg5594183960408677331"><div dir="ltr"><div style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:11pt;color:rgb(0,0,0)">
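The switch for this behavior sits in /etc/sysconfig/sbd - a sketch of the
relevant lines (device path again made up):

    SBD_DEVICE=/dev/disk/by-id/my-sbd-disk
    SBD_PACEMAKER=yes       # default: as long as pacemaker still sees the
                            # node as healthy and quorate, losing the disk
                            # alone does not trigger self-fencing
    SBD_WATCHDOG_DEV=/dev/watchdog
    SBD_WATCHDOG_TIMEOUT=5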
> > Other nodes btw. can still kill a node with watchdog-fencing.
>
> How does that work? When would the killing node tell the other node not
> to keep triggering its watchdog?
> Having written the above sentence, maybe I should go and read up on when
> the poison pill gets sent by the killing node!

It would either use cluster communication to tell the node to self-fence,
or, if that isn't available, the case quoted below kicks in.

Hope that makes things a bit clearer.

Regards,
Klaus
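P.S. In case you want to experiment with the watchdog-fencing side of
this: it is enabled via a cluster property - a sketch assuming pcs and the
common rule of thumb of roughly twice SBD_WATCHDOG_TIMEOUT:

    pcs property set stonith-watchdog-timeout=10

With that set, the surviving nodes assume an unseen node has self-fenced
once that time has passed.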
> > If the node isn't able to accept that wish of another node for it to
> > die, it will have lost quorum and have stopped triggering the watchdog
> > anyway.
>
> Yes, that is clear to me - the self-fencing is quite powerful.
>
> Thanks for the response.
<div style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:11pt;color:rgb(0,0,0)">
<br>
</div>
<div id="m_5594183960408677331appendonsend"></div>
<hr style="display:inline-block;width:98%">
<div id="m_5594183960408677331divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" style="font-size:11pt" color="#000000"><b>From:</b> Users <<a href="mailto:users-bounces@clusterlabs.org" target="_blank">users-bounces@clusterlabs.org</a>> on behalf of <a href="mailto:users-request@clusterlabs.org" target="_blank">users-request@clusterlabs.org</a> <<a href="mailto:users-request@clusterlabs.org" target="_blank">users-request@clusterlabs.org</a>><br>
<b>Sent:</b> 10 October 2024 2:00 PM<br>
<b>To:</b> <a href="mailto:users@clusterlabs.org" target="_blank">users@clusterlabs.org</a> <<a href="mailto:users@clusterlabs.org" target="_blank">users@clusterlabs.org</a>><br>
<b>Subject:</b> Users Digest, Vol 117, Issue 5</font>
<div> </div>
</div>
<div><font size="2"><span style="font-size:11pt">
<div>Send Users mailing list submissions to<br>
<a href="mailto:users@clusterlabs.org" target="_blank">users@clusterlabs.org</a><br>
<br>
To subscribe or unsubscribe via the World Wide Web, visit<br>
<a href="https://lists.clusterlabs.org/mailman/listinfo/users" target="_blank">https://lists.clusterlabs.org/mailman/listinfo/users</a><br>
or, via email, send a message with subject or body 'help' to<br>
<a href="mailto:users-request@clusterlabs.org" target="_blank">users-request@clusterlabs.org</a><br>
<br>
You can reach the person managing the list at<br>
<a href="mailto:users-owner@clusterlabs.org" target="_blank">users-owner@clusterlabs.org</a><br>
<br>
When replying, please edit your Subject line so it is more specific<br>
than "Re: Contents of Users digest..."<br>
<br>
<br>
> Today's Topics:
>
>    1. Re: Fencing Approach (Klaus Wenninger)
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Wed, 9 Oct 2024 19:03:09 +0200
> From: Klaus Wenninger <kwenning@redhat.com>
> To: Cluster Labs - All topics related to open-source clustering
>     welcomed <users@clusterlabs.org>
> Cc: Angelo Ruggiero <angeloruggiero@yahoo.com>
> Subject: Re: [ClusterLabs] Fencing Approach
>
> On Wed, Oct 9, 2024 at 3:08 PM Angelo Ruggiero via Users <
> users@clusterlabs.org> wrote:
>
> > Hello,
> >
> > My setup....
> >
> > - We are setting up a pacemaker cluster to run SAP running on RHEL on
> > VMware virtual machines.
> > - We will have two nodes for the application server of SAP and 2 nodes
> > for the HANA database. SAP/RHEL provide good support on how to set up
> > the cluster.
> > - SAP will need a number of floating IPs to be moved around, as well as
> > NFS file systems coming from a NetApp device being mounted/unmounted.
> > SAP will need processes switching on and off when something happens,
> > planned or unplanned. I am not clear if the NetApp device is active and
> > the other site is DR, but what I know is the IP addresses just get
> > moved during a DR incident. Just to be complete, the HANA data sync is
> > done by HANA itself, most probably async with an RPO of 15 mins or so.
> > - We will have a quorum node also, with hopefully a separate network;
> > not sure if it will be on a separate VMware infra though.
> > - I am hoping to be allowed to use the VMware watchdog, although it
> > might take some persuading as it is declared "non standard" for us by
> > our infra people. I have it already in DEV to play with now.
> >
> > I managed to get the above working just using a floating IP and an NFS
> > mount as my resources, and I can see the following: the self-fencing
> > approach works fine, i.e. the servers reboot when they lose network
> > connectivity and/or become inquorate as long as they are offering
> > resources.
> >
> > So my questions are in relation to further fencing.... I did a lot of
> > reading and saw various references...
> >
> > 1. Use of sbd shared storage
> >
> > The question is what does using sbd with a shared storage really give
> > me. I need to justify why I need this shared storage again to the infra
> > guys, but to be honest also to myself. I have been given this infra and
> > will play with it the next few days.
> >
> > 2. Use of fence_vmware
> >
> > In addition there is the ability of course to fence using the
> > fence_vmware agents, and again I need to justify why I need this. In
> > this particular case it will be a very hard sell because the dev/test
> > and prod environments run on the same VMware infra, so to use
> > fence_vmware would effectively mean dev is connected to prod, i.e. the
> > user id for a dev or test box is being provided by a production
> > environment. I do not have this ability at all so cannot play with it.
> >
> > My current thought train... i.e. the typical things I think about...
> >
> > Perhaps someone can help me be clear on the benefits of 1 and 2 over
> > and above the setup I think is doable.
> >
> > 1. gives me the ability to use poison pill
> >
> > But what scenarios does poison pill really help with? Why would the
> > other parts of the cluster want to fence the node if the node itself
> > has not killed itself, because either the quorum device is gone or
> > network connectivity failed and resources need to be switched off?
> >
> > What I get is that it is very explicit, i.e. the other nodes tell the
> > other server to die. So it must be a case initiated by the other nodes.
> > I am struggling to think of a scenario where the other nodes would want
> > to fence it.
>
> Main scenario where poison pill shines is 2-node clusters where you don't
> have usable quorum for watchdog-fencing.
> Configured with pacemaker-awareness - default - availability of the
> shared disk doesn't become an issue as, due to fallback to availability
> of the 2nd node, the disk is no SPOF (single point of failure) in these
> clusters.
> Other nodes btw. can still kill a node with watchdog-fencing. If the node
> isn't able to accept that wish of another node for it to die, it will
> have lost quorum and have stopped triggering the watchdog anyway.
>
> Regards,
> Klaus
>
> > Possible scenarios, did I miss any?
> >
> > - Loss of network connection to the node. But that is covered by the
> > node self-fencing.
> > - If some monitoring said the node was not healthy or responding...
> > Maybe this is the case it is good for, but then it must be a partial
> > failure where the node is still part of the cluster and can respond,
> > i.e. not an OS freeze or a pure loss of connection, as then the
> > watchdog or the self-fencing will kick in.
> > - HW failures: cpu, memory, disk. For virtual hardware, does that
> > actually ever fail? Sorry if that is a stupid question; I could ask our
> > infra guys but.... So is virtual hardware so reliable that hw failures
> > can be ignored?
> > - Loss of shared storage. SAP uses a lot of shared storage via NFS. Not
> > sure what happens when that fails, need to research it a bit, but each
> > node will sort that out itself, I am presuming.
> > - Human error: but no cluster will fix that, and the human who makes a
> > change will realise it and revert.
> >
> > 2. Fence vmware
> >
> > I see this as a better poison pill as it works at the hardware level.
> > But if I do not need poison pill then I do not need this.
> >
> > In general, OS freezes or even panics, if they take too long, are
> > covered by the watchdog.
> >
> > regards
> > Angelo
_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/