<div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Oct 10, 2024 at 9:52 PM Angelo Ruggiero <<a href="mailto:angeloruggiero@yahoo.com">angeloruggiero@yahoo.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class="msg-93499451217259133">
<div dir="ltr">
<div style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:11pt;color:rgb(0,0,0)">
<br>
</div>
<div id="m_-93499451217259133appendonsend" style="color:inherit"></div>
<div style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:11pt;color:rgb(0,0,0)">
<br>
</div>
<hr style="display:inline-block;width:98%">
<div id="m_-93499451217259133divRplyFwdMsg" dir="ltr" style="color:inherit"><span style="font-family:Calibri,sans-serif;font-size:11pt;color:rgb(0,0,0)"><b>From:</b> Klaus Wenninger <<a href="mailto:kwenning@redhat.com" target="_blank">kwenning@redhat.com</a>><br>
<b>Sent:</b> 10 October 2024 4:52 PM<br>
<b>To:</b> Cluster Labs - All topics related to open-source clustering welcomed <<a href="mailto:users@clusterlabs.org" target="_blank">users@clusterlabs.org</a>><br>
<b>Cc:</b> Angelo Ruggiero <<a href="mailto:angeloruggiero@yahoo.com" target="_blank">angeloruggiero@yahoo.com</a>><br>
<b>Subject:</b> Re: [ClusterLabs] Users Digest, Vol 117, Issue 5</span>
<div> </div>
</div>
<div style="direction:ltr"><br>
</div>
<div style="direction:ltr"><br>
</div>
<div style="direction:ltr">On Thu, Oct 10, 2024 at 3:58 PM Angelo Ruggiero via Users <<a href="mailto:users@clusterlabs.org" id="m_-93499451217259133OWA33df6e66-e5e3-60e2-e76a-90128766b018" target="_blank">users@clusterlabs.org</a>> wrote:</div>
<blockquote style="margin:0px 0px 0px 0.8ex;padding-left:1ex;border-left:1px solid rgb(204,204,204)">
<div style="direction:ltr;font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:11pt;color:rgb(0,0,0)">
Thanks for answering. It helps.</div>
<div style="direction:ltr;font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:11pt;color:rgb(0,0,0)">
<br>
</div>
<div style="direction:ltr;font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:11pt;color:rgb(0,0,0)">
>Main scenario where poison pill shines is 2-node-clusters where you don't<br>
>have usable quorum for watchdog-fencing.</div>
<div style="direction:ltr;font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:11pt;color:rgb(0,0,0)">
<br>
</div>
<div style="direction:ltr;font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:11pt;color:rgb(0,0,0)">
Not sure I understand. If there are just 2 nodes and one node fails, it cannot respond to the poison pill. Maybe I missed your point.</div>
</blockquote>
<div style="direction:ltr"><br>
</div>
<div style="direction:ltr">If in a 2 node setup one node loses contact to the other or sees some other reason why it would like</div>
<div style="direction:ltr">the partner-node to be fenced it will try to write the poison-pill message to the shared disk and if that</div>
<div style="direction:ltr">goes Ok and after a configured wait time for the other node to read the message, respond or the</div>
<div style="direction:ltr">watchdog to kick in it will assume the other node to be fenced. </div>
<div style="direction:ltr"><br>
</div>
<div style="direction:ltr;font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:11pt;color:rgb(0,0,0)">
AR: Yes, understood. </div>
<div style="direction:ltr;font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:11pt;color:rgb(0,0,0)">
AR: I guess I am looking for the killer requirement for my setup, that is, for a 2-node cluster with a usable quorum device (usable to be defined later). Does poison pill via SBD, or even fence_vmware, give me anything? I am struggling to find a scenario. See
my final comment on monitoring later in this reply.</div>
<div style="direction:ltr;font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:11pt;color:rgb(0,0,0)">
<br>
</div>
<blockquote style="margin:0px 0px 0px 0.8ex;padding-left:1ex;border-left:1px solid rgb(204,204,204)">
<div style="direction:ltr;font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:11pt;color:rgb(0,0,0)">
<br>
</div>
<div style="direction:ltr;font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:11pt;color:rgb(0,0,0)">
This also raises the follow-up question: what defines a "usable quorum"? Do you mean, for example, on separate, independent network hardware and power supply?</div>
</blockquote>
<div style="direction:ltr"><br>
</div>
<div style="direction:ltr">Quorum in 2 node clusters is a bit different as they will stay quorate when losing connection. To prevent split brain there if they</div>
<div style="direction:ltr">reboot on top they will just regain quorum once they've seen each other (search for 'wait-for-all' to read more).</div>
<div style="direction:ltr">This behavior is of course not usable for watchdog-fencing and thus SBD automatically switches to not relying on quorum in</div>
<div style="direction:ltr">those 2-node setups.</div>
<div style="direction:ltr"> </div>
<blockquote style="margin:0px 0px 0px 0.8ex;padding-left:1ex;border-left:1px solid rgb(204,204,204)">
<div style="direction:ltr;font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:11pt;color:rgb(0,0,0)">
<br>
</div>
<div style="direction:ltr;font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:11pt;color:rgb(0,0,0)">
>Configured with pacemaker-awareness - default - availability of the shared-disk doesn't become an issue as, due to fallback to availability of the 2nd node, the disk is no spof (single point of failure) in these clusters.</div>
<div style="direction:ltr;font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:11pt;color:rgb(0,0,0)">
<br>
</div>
<div style="direction:ltr;font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:11pt;color:rgb(0,0,0)">
I did not get the gist of what you are trying to say here. 🙂<br>
<br>
<br>
</div>
</blockquote>
<div style="direction:ltr"><br>
</div>
<div style="direction:ltr">I was suggesting a scenario that has 2 cluster nodes + a single shared disk. With kind of 'pure' SBD this would mean that a node</div>
<div style="direction:ltr">that is losing connection to the disk would have to self fence which would mean that this disk would become a so called</div>
<div style="direction:ltr">single-point-of-failure - meaning that available of resources in the cluster would be reduced to availability of this single disk.</div>
<div style="direction:ltr">So I tried to explain why you don't have to fear this reduction of availability using pacemaker-awareness.</div>
<div style="direction:ltr"> </div>
<blockquote style="margin:0px 0px 0px 0.8ex;padding-left:1ex;border-left:1px solid rgb(204,204,204)">
<div style="direction:ltr;font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:11pt;color:rgb(0,0,0)">
>Other nodes btw. can still kill a node with watchdog-fencing.</div>
<div style="direction:ltr;font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:11pt;color:rgb(0,0,0)">
<br>
</div>
<div style="direction:ltr;font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:11pt;color:rgb(0,0,0)">
How does that work? When would the killing node tell the other node not to keep triggering its watchdog? </div>
<div style="direction:ltr;font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:11pt;color:rgb(0,0,0)">
Having written the above sentence, maybe I should go and read up on when the poison pill gets sent by the killing node!</div>
<div style="direction:ltr;font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:11pt;color:rgb(0,0,0)">
<br>
</div>
</blockquote>
<div style="direction:ltr"><br>
</div>
<div style="direction:ltr">It would either use cluster-communication to tell the node to self-fence and if that isn't available the case</div>
<div style="direction:ltr">below kicks in.</div>
<div style="direction:ltr;font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:11pt;color:rgb(0,0,0)">
<br>
</div>
<div style="direction:ltr;font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:11pt;color:rgb(0,0,0)">
AR: ok</div>
<div style="direction:ltr;font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:11pt;color:rgb(0,0,0)">
<br>
</div>
<div style="direction:ltr;font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:11pt;color:rgb(0,0,0)">
>Quorum in 2 node clusters is a bit different as they will stay quorate when losing connection. </div>
<div style="direction:ltr;font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:11pt;color:rgb(0,0,0)">
AR: here you refer to a 2-node cluster without a quorum device, right?</div>
<div style="direction:ltr;font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:11pt;color:rgb(0,0,0)">
AR: furthermore, are you saying that poison pill, and maybe even node fencing from the cluster, is not needed when you do not have a quorum device for 2-node clusters?</div></div></div></blockquote><div><br></div><div>No, that is a misunderstanding. For everything I described some sort of SBD setup is needed.</div><div>And yes - when I was talking about 2-node clusters I meant those without a quorum device - those which have</div><div>the two_node config set in the corosync config file.</div><div>I was just saying that without a quorum device (or of course 3 and more full cluster nodes) you can't use watchdog-fencing.</div><div>What you can still use is poison-pill fencing if you want to go for SBD. If it is viable for you, considering other aspects</div><div>like credentials or accessibility over the network, I guess it is always worthwhile looking into fencing via the hypervisor.</div><div>There are definite benefits in getting a response from the hypervisor that a node is down, instead of having to wait</div><div>some time - including some safety add-on - for it to self-fence. There are benefits as well if pacemaker can explicitly</div><div>turn a node off and on again afterwards instead of triggering a reboot (for obvious reasons the only way it works with SBD).</div><div>If you work with hypervisors and use their maintenance features (pausing, migration, ...) together with their virtual watchdog</div><div>implementation or softdog, you also have to consider situations where the watchdog might not fire</div><div>reliably within the specified timeout.</div><div><br></div><div>Regards,</div><div>Klaus</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class="msg-93499451217259133"><div dir="ltr">
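<div style="direction:ltr"><br>
</div>
<div style="direction:ltr">For what it's worth, rough sketches of the two directions mentioned above - host names, credentials and VM names are placeholders, not anything from this setup:</div>
<div style="direction:ltr;font-family:monospace;font-size:10pt"># a) give the 2-node cluster real quorum with a qdevice, so diskless watchdog-fencing becomes usable<br>
pcs quorum device add model net host=qdevice.example.com algorithm=ffsplit<br>
pcs property set stonith-watchdog-timeout=10   # roughly 2x the SBD watchdog timeout<br>
<br>
# b) or fence through the hypervisor, so the survivor gets positive confirmation that the peer is off<br>
pcs stonith create vmfence fence_vmware_rest ip=vcenter.example.com username=fence-user password=example-password ssl_insecure=1 pcmk_host_map="node1:vm-node1;node2:vm-node2"</div>
<div style="direction:ltr"><br>
</div>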
<div style="direction:ltr;font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:11pt;color:rgb(0,0,0)">
<br>
</div>
<div style="direction:ltr;font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:11pt;color:rgb(0,0,0)">
<br>
</div>
<div style="direction:ltr;font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:11pt;color:rgb(0,0,0)">
Hope that makes things a bit clearer.</div>
<div style="direction:ltr;font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:11pt;color:rgb(0,0,0)">
AR: always 🙂 such discussions are hard to keep clear in both directions.</div>
<div style="direction:ltr;font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:11pt;color:rgb(0,0,0)">
<br>
</div>
<div style="direction:ltr;font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:11pt;color:rgb(0,0,0)">
AR: As mentioned in an earlier reply, I think I need to dwell on what failure cases I could have, and I should go and research the monitoring offered by the resource agents I intend to use,</div>
<div style="direction:ltr;font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:11pt;color:rgb(0,0,0)">
i.e. IPaddr2, Filesystem and the SAP instance agents, as I guess they are the ones that would decide to fence another node. The general case where nodes cannot communicate via the network is built in.</div>
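<div style="direction:ltr"><br>
</div>
<div style="direction:ltr">A sketch of the kind of monitoring those agents give you (addresses, exports and timings below are made up): each monitor failure is first handled per resource, and it is typically a failed stop that escalates into fencing the node.</div>
<div style="direction:ltr;font-family:monospace;font-size:10pt"># examples only -- IPs, exports and intervals are placeholders<br>
pcs resource create sap_vip ocf:heartbeat:IPaddr2 ip=192.0.2.10 cidr_netmask=24 op monitor interval=10s<br>
pcs resource create sapmnt_fs ocf:heartbeat:Filesystem device=nfs.example.com:/export/sapmnt directory=/sapmnt fstype=nfs op monitor interval=20s timeout=40s<br>
<br>
# a failed monitor is recovered per resource (restart/move); a failed stop is what normally<br>
# gets the whole node fenced, since the default on-fail for stop is "fence" when fencing is enabled</div>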
<div style="direction:ltr"><br>
</div>
<div style="direction:ltr">Regards,</div>
<div style="direction:ltr">Klaus</div>
<div style="direction:ltr"> </div>
<blockquote style="margin:0px 0px 0px 0.8ex;padding-left:1ex;border-left:1px solid rgb(204,204,204)">
<div style="direction:ltr;font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:11pt;color:rgb(0,0,0)">
>If the node isn't able to accept that wish of another</div>
<div style="direction:ltr;font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:11pt;color:rgb(0,0,0)">
>node for it to die it will have lost quorum, have stopped triggering the watchdog anyway.</div>
<div style="direction:ltr;font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:11pt;color:rgb(0,0,0)">
<br>
</div>
<div style="direction:ltr;font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:11pt;color:rgb(0,0,0)">
Yes, that is clear to me; the self-fencing is quite powerful.</div>
<div style="direction:ltr;font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:11pt;color:rgb(0,0,0)">
<br>
</div>
<div style="direction:ltr;font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:11pt;color:rgb(0,0,0)">
Thanks for the response.</div>
<div style="direction:ltr;font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:11pt;color:rgb(0,0,0)">
<br>
</div>
<div id="m_-93499451217259133x_m_5594183960408677331appendonsend" style="color:inherit"></div>
<hr style="direction:ltr;display:inline-block;width:98%">
<div style="direction:ltr;font-family:Calibri,sans-serif;font-size:11pt;color:rgb(0,0,0)">
<b>From:</b> Users <<a href="mailto:users-bounces@clusterlabs.org" id="m_-93499451217259133OWA26b9aa6f-b143-660c-7a35-b76219f908c3" target="_blank">users-bounces@clusterlabs.org</a>> on behalf of
<a href="mailto:users-request@clusterlabs.org" id="m_-93499451217259133OWA24d3cd99-870c-084a-b4f4-be15d668231c" target="_blank">
users-request@clusterlabs.org</a> <<a href="mailto:users-request@clusterlabs.org" id="m_-93499451217259133OWAac8b5c0e-480a-e4e5-704f-791de55dc80b" target="_blank">users-request@clusterlabs.org</a>><br>
<b>Sent:</b> 10 October 2024 2:00 PM</div>
<div id="m_-93499451217259133x_m_5594183960408677331divRplyFwdMsg" dir="ltr" style="color:inherit">
<span style="font-family:Calibri,sans-serif;font-size:11pt;color:rgb(0,0,0)"><b>To:</b>
<a href="mailto:users@clusterlabs.org" id="m_-93499451217259133OWA1d9def1b-9cb7-d60a-fd37-659127986d85" target="_blank">
users@clusterlabs.org</a> <<a href="mailto:users@clusterlabs.org" id="m_-93499451217259133OWA7f2ad36f-2f38-38ca-a598-b2fedd0bf436" target="_blank">users@clusterlabs.org</a>><br>
</span></div>
<div style="direction:ltr;font-family:Calibri,sans-serif;font-size:11pt;color:rgb(0,0,0)">
<b>Subject:</b> Users Digest, Vol 117, Issue 5</div>
<div style="direction:ltr"> </div>
<div style="direction:ltr;font-size:11pt">Send Users mailing list submissions to<br>
<a href="mailto:users@clusterlabs.org" id="m_-93499451217259133OWA21082914-0d62-4737-8f2f-d7afd73e7dd7" target="_blank">
users@clusterlabs.org</a><br>
<br>
To subscribe or unsubscribe via the World Wide Web, visit<br>
<a href="https://lists.clusterlabs.org/mailman/listinfo/users" id="m_-93499451217259133OWA520a4791-4411-b039-1cea-0fe96eb23354" target="_blank">
https://lists.clusterlabs.org/mailman/listinfo/users</a><br>
or, via email, send a message with subject or body 'help' to<br>
<a href="mailto:users-request@clusterlabs.org" id="m_-93499451217259133OWA1a71fc50-675a-69c4-6dc9-f6a9e79e1250" target="_blank">
users-request@clusterlabs.org</a><br>
<br>
You can reach the person managing the list at<br>
<a href="mailto:users-owner@clusterlabs.org" id="m_-93499451217259133OWA34b927aa-2f0d-e5ce-e1ad-dddcf03fa085" target="_blank">
users-owner@clusterlabs.org</a><br>
<br>
When replying, please edit your Subject line so it is more specific<br>
than "Re: Contents of Users digest..."<br>
<br>
<br>
Today's Topics:<br>
<br>
1. Re: Fencing Approach (Klaus Wenninger)<br>
<br>
<br>
----------------------------------------------------------------------<br>
<br>
Message: 1<br>
Date: Wed, 9 Oct 2024 19:03:09 +0200<br>
From: Klaus Wenninger <<a href="mailto:kwenning@redhat.com" id="m_-93499451217259133OWA3b113955-c287-62e4-61f5-80bc97a2d453" target="_blank">kwenning@redhat.com</a>><br>
To: Cluster Labs - All topics related to open-source clustering<br>
welcomed <<a href="mailto:users@clusterlabs.org" id="m_-93499451217259133OWA65b68517-6c4b-17bb-0f9a-937050214c80" target="_blank">users@clusterlabs.org</a>><br>
Cc: Angelo Ruggiero <<a href="mailto:angeloruggiero@yahoo.com" id="m_-93499451217259133OWA570886a9-7fa6-6a98-477b-8b590a2817ac" target="_blank">angeloruggiero@yahoo.com</a>><br>
Subject: Re: [ClusterLabs] Fencing Approach<br>
<br>
On Wed, Oct 9, 2024 at 3:08 PM Angelo Ruggiero via Users <<br>
<a href="mailto:users@clusterlabs.org" id="m_-93499451217259133OWA5fbae908-6bf8-a35f-9175-9f1d86100417" target="_blank">users@clusterlabs.org</a>> wrote:<br>
<br>
> Hello,<br>
><br>
> My setup....<br>
><br>
><br>
> - We are setting up a pacemaker cluster to run SAP running on RHEL on<br>
> VMware virtual machines.<br>
> - We will have two nodes for the application server of SAP and 2 nodes<br>
> for the HANA database. SAP/RHEL provide good support on how to set up the<br>
> cluster.<br>
> - SAP will need a number of floating IPs to be moved around, as well as<br>
> NFS file systems coming from a NetApp device to be mounted/unmounted. SAP will<br>
> need processes switching on and off when something happens, planned or<br>
> unplanned. I am not clear if the NetApp device is active and the other site<br>
> is DR, but what I know is the IP addresses just get moved during a DR<br>
> incident. Just to be complete, the HANA data sync is done by HANA itself,<br>
> most probably async with an RPO of 15 mins or so.<br>
> - We will have a quorum node, also with hopefully a separate network;<br>
> not sure if it will be on separate VMware infra though.<br>
> - I am hoping to be allowed to use the VMware watchdog, although it<br>
> might take some persuading as it is declared "non standard" for us by our<br>
> infra people. I have it already in DEV to play with now.<br>
><br>
> I managed to get the above working just using a floating IP and an NFS<br>
> mount as my resources, and I can see the following. The self-fencing<br>
> approach works fine, i.e. the servers reboot when they lose network<br>
> connectivity and/or become inquorate, as long as they are offering<br>
> resources.<br>
><br>
> So my questions are in relation to further fencing... I did a lot of<br>
> reading and saw various references...<br>
><br>
><br>
> 1. Use of SBD shared storage<br>
><br>
> The question is what does using SBD with shared storage really give me.<br>
> I need to justify why I need this shared storage, again to the infra guys,<br>
> but to be honest also to myself. I have been given this infra and will<br>
> play with it in the next few days.<br>
><br>
><br>
> 2. Use of fence_vmware<br>
><br>
> In addition there is of course the ability to fence using the fence_vmware<br>
> agents, and again I need to justify why I need this. In this particular<br>
> case it will be a very hard sell, because the dev/test and prod<br>
> environments run on the same VMware infra, so to use fence_vmware would<br>
> effectively mean dev is connected to prod, i.e. the user id for a dev or test<br>
> box is being provided by a production environment. I do not have this<br>
> ability at all so cannot play with it.<br>
><br>
><br>
><br>
> My current thought train... i.e. the typical things I think about...<br>
><br>
> Perhaps someone can help me be clear on the benefits of 1 and 2 over and<br>
> above the setup I think is doable.<br>
><br>
><br>
> 1. gives me the ability to use poison pill<br>
><br>
> But in what scenarios does poison pill really help? Why would the other<br>
> parts of the cluster want to fence the node if the node itself has not<br>
> killed itself because it lost quorum, either because the quorum device is gone or<br>
> network connectivity failed and resources need to be switched off?<br>
><br>
> What I get is that it is very explicit, i.e. the other nodes<br>
> tell the other server to die. So it must be a case initiated by the other<br>
> nodes.<br>
> I am struggling to think of a scenario where the other<br>
> nodes would want to fence it.<br>
><br>
<br>
Main scenario where poison pill shines is 2-node-clusters where you don't<br>
have usable quorum for watchdog-fencing.<br>
Configured with pacemaker-awareness - default - availability of the<br>
shared-disk doesn't become an issue as, due to<br>
fallback to availability of the 2nd node, the disk is no spof (single<br>
point of failure) in these clusters.<br>
Other nodes btw. can still kill a node with watchdog-fencing. If the node<br>
isn't able to accept that wish of another<br>
node for it to die it will have lost quorum, have stopped triggering the<br>
watchdog anyway.<br>
<br>
Regards,<br>
Klaus<br>
<br>
><br>
> Possible scenarios, did I miss any?<br>
><br>
> - Loss of network connection to the node. But that is covered by the<br>
> node self-fencing.<br>
> - If some monitoring said the node was not healthy or responding...<br>
> Maybe this is the case it is good for, but then it must be a partial failure<br>
> where the node is still part of the cluster and can respond, i.e. not an OS<br>
> freeze and not only a lost connection, as then the watchdog or the self-<br>
> fencing will kick in.<br>
> - HW failures: cpu, memory, disk. For virtual hardware does that<br>
> actually ever fail? Sorry if a stupid question. I could ask our infra guys<br>
> but...<br>
> So is virtual hardware so reliable that hw failures can be ignored?<br>
> - Loss of shared storage: SAP uses a lot of shared storage via NFS. Not<br>
> sure what happens when that fails, need to research it a bit, but each node<br>
> will sort that out itself I am presuming.<br>
> - Human error: but no cluster will fix that, and the human who makes a<br>
> change will realise it and revert.<br>
><br>
> 2. Fence vmware<br>
><br>
> I see this as a better poison pill as it works at the hardware<br>
> level. But if I do not need poison pill then I do not need this.<br>
><br>
> In general OS freezes, or even panics if they take too long, are covered by the<br>
> watchdog.<br>
><br>
> regards<br>
> Angelo<br>
><br>
><br>
><br>
><br>
><br>
</div>
<div style="direction:ltr">_______________________________________________<br>
Manage your subscription:<br>
<a href="https://lists.clusterlabs.org/mailman/listinfo/users" id="m_-93499451217259133OWA522b79a0-0e40-a045-2438-f00eb4508625" target="_blank">https://lists.clusterlabs.org/mailman/listinfo/users</a><br>
<br>
ClusterLabs home: <a href="https://www.clusterlabs.org/" id="m_-93499451217259133OWA3b2314d7-c5d7-99eb-99c8-384d141dd969" target="_blank">
https://www.clusterlabs.org/</a></div>
</blockquote>
</div>
</div></blockquote></div></div>