<div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Jul 29, 2020 at 10:45 PM Strahil Nikolov <<a href="mailto:hunter86_bg@yahoo.com">hunter86_bg@yahoo.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">You got plenty of options:<br>
- IPMI-based fencing, like HP iLO or Dell iDRAC<br>
- SCSI-3 persistent reservations (which can be extended to fence the node when its reservation(s) are removed)<br>
<br>
- Shared disk (even iSCSI) and using SBD (a.k.a. Poison pill) -> if your hardware has no watchdog, you can use the softdog kernel module on Linux.<br></blockquote><div><br></div><div>Although softdog may not be reliable in all circumstances.</div><div><br></div>
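<div>For illustration, on a typical Linux Pacemaker cluster the IPMI and SBD routes end up looking roughly like the commands below. This is only a sketch: the addresses, credentials, device path and node names are placeholders, and the exact agent and option names vary by distribution and version.</div><div><br></div><div>
# One IPMI fence device per node (all parameter values are placeholders)<br>
pcs stonith create fence-xstha1 fence_ipmilan ip=10.0.0.1 username=admin password=secret pcmk_host_list=xstha1<br>
pcs stonith create fence-xstha2 fence_ipmilan ip=10.0.0.2 username=admin password=secret pcmk_host_list=xstha2<br>
<br>
# Disk-based SBD: point SBD at a small shared LUN plus a watchdog on every node,<br>
# e.g. in /etc/sysconfig/sbd:<br>
#   SBD_DEVICE="/dev/disk/by-id/scsi-SHARED_SBD_LUN"<br>
#   SBD_WATCHDOG_DEV=/dev/watchdog<br>
sbd -d /dev/disk/by-id/scsi-SHARED_SBD_LUN create    # initialize the SBD header once<br>
pcs property set stonith-enabled=true<br>
</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">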
Best Regards,<br>
Strahil Nikolov<br>
<br>
On 29 July 2020 at 9:01:22 GMT+03:00, Gabriele Bulfon <<a href="mailto:gbulfon@sonicle.com" target="_blank">gbulfon@sonicle.com</a>> wrote:<br>
>That one was taken from a specific implementation on Solaris 11.<br>
>The situation is a dual node server with shared storage controller:<br>
>both nodes see the same disks concurrently.<br>
>Here we must be sure that the two nodes are not going to import/mount<br>
>the same zpool at the same time, or we will encounter data corruption:<br>
>node 1 will be preferred for pool 1 and node 2 for pool 2; only when<br>
>one of the nodes goes down or is taken offline should its resources be<br>
>freed first by the leaving node and then taken over by the other node.<br>
> <br>
>Would you suggest one of the available stonith agents in this case?<br>
> <br>
>Thanks!<br>
>Gabriele<br>
> <br>
> <br>
> <br>
>Sonicle S.r.l. : <a href="http://www.sonicle.com" rel="noreferrer" target="_blank">http://www.sonicle.com</a><br>
>Music: <a href="http://www.gabrielebulfon.com" rel="noreferrer" target="_blank">http://www.gabrielebulfon.com</a><br>
>Quantum Mechanics : <a href="http://www.cdbaby.com/cd/gabrielebulfon" rel="noreferrer" target="_blank">http://www.cdbaby.com/cd/gabrielebulfon</a><br>
>----------------------------------------------------------------------------------<br>
>From: Strahil Nikolov<br>
>To: Cluster Labs - All topics related to open-source clustering welcomed<br>
>Gabriele Bulfon<br>
>Date: 29 July 2020 6:39:08 CEST<br>
>Subject: Re: [ClusterLabs] Antw: [EXT] Stonith failing<br>
>Do you have a reason not to use any of the stonith agents already available?<br>
>Best Regards,<br>
>Strahil Nikolov<br>
>On 28 July 2020 at 13:26:52 GMT+03:00, Gabriele Bulfon<br>
>wrote:<br>
>Thanks, I attach here the script.<br>
>It basically runs commands over ssh on the other node without a password<br>
>(which must be preconfigured via authorized keys).<br>
>This was taken from an OpenIndiana script (I think).<br>
>As stated in the comments, we don't want to halt or boot via ssh,<br>
>only reboot.<br>
>Maybe this is the problem; we should at least have it shut down when<br>
>asked to.<br>
> <br>
>Actually, if I stop corosync on node 2, I don't want it to shut down the<br>
>system, but just let node 1 keep control of all resources.<br>
>Likewise, if I just shut down node 2 manually,<br>
>node 1 should keep control of all resources and release them back on<br>
>reboot.<br>
>Instead, when I stopped corosync on node 2, the log showed an<br>
>attempt to stonith node 2: why?<br>
> <br>
>Thanks!<br>
>Gabriele<br>
> <br>
> <br>
> <br>
>From:<br>
>Reid Wahl<br>
>To:<br>
>Cluster Labs - All topics related to open-source clustering welcomed<br>
>Date:<br>
>28 July 2020 12:03:46 CEST<br>
>Subject:<br>
>Re: [ClusterLabs] Antw: [EXT] Stonith failing<br>
>Gabriele,<br>
> <br>
>"No route to host" is a somewhat generic error message when we can't<br>
>find anyone to fence the node. It doesn't mean there's necessarily a<br>
>network routing issue at fault; no need to focus on that error message.<br>
> <br>
>I agree with Ulrich about needing to know what the script does. But<br>
>based on your initial message, it sounds like your custom fence agent<br>
>returns 1 in response to "on" and "off" actions. Am I understanding<br>
>correctly? If so, why does it behave that way? Pacemaker is trying to<br>
>run a poweroff action based on the logs, so it needs your script to<br>
>support an off action.<br>
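</blockquote><div><br></div><div>To make the expected interface concrete: the fencer feeds the agent name=value pairs on stdin (action=..., nodename=..., plus any device parameters) and expects exit 0 on success. A bare-bones ssh-based sketch is below (hypothetical, not your script; a complete agent must also answer the metadata action and, for "off", verify the node is really down before returning success). An ssh-based agent can never fence a hung or unreachable node, which is why the stock agents are preferred.</div><div><br></div><div>
#!/bin/sh<br>
# Read the key=value pairs the fencer writes to stdin.<br>
action=""; nodename=""<br>
while read line; do<br>
case "$line" in<br>
action=*) action=${line#action=} ;;<br>
nodename=*) nodename=${line#nodename=} ;;<br>
esac<br>
done<br>
case "$action" in<br>
reboot) ssh -o BatchMode=yes root@"$nodename" reboot ;;<br>
off) ssh -o BatchMode=yes root@"$nodename" poweroff ;;  # then poll until the node is really gone<br>
on) exit 1 ;;  # a powered-off node cannot be switched on over ssh<br>
monitor|status) exit 0 ;;<br>
*) exit 1 ;;<br>
esac<br>
</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">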
>On Tue, Jul 28, 2020 at 2:47 AM Ulrich Windl<br>
><a href="mailto:Ulrich.Windl@rz.uni-regensburg.de" target="_blank">Ulrich.Windl@rz.uni-regensburg.de</a><br>
>wrote:<br>
>Gabriele Bulfon<br>
><a href="mailto:gbulfon@sonicle.com" target="_blank">gbulfon@sonicle.com</a><br>
>wrote on 28.07.2020 at 10:56 in<br>
>message<br>
>:<br>
>Hi, now I have my two nodes (xstha1 and xstha2) with IPs configured by<br>
>Corosync.<br>
>To check how stonith would work, I turned off the Corosync service on the<br>
>second node.<br>
>The first node then tries to stonith the 2nd node and take over its<br>
>resources, but this fails.<br>
>The stonith action is configured to run a custom script that issues ssh<br>
>commands,<br>
>I think you should explain what that script does exactly.<br>
>[...]<br>
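</blockquote><div><br></div><div>When debugging this, it also helps to exercise the pieces by hand, along these lines (log location and commands vary by platform and Pacemaker version; the agent path is just a placeholder):</div><div><br></div><div>
# Talk to the agent directly, the same way the fencer does<br>
printf 'action=monitor\nnodename=xstha2\n' | /path/to/fence-script; echo "rc=$?"<br>
<br>
# Ask the cluster itself to fence the node, then read the fencer's log entries<br>
stonith_admin --reboot xstha2<br>
grep -i stonith /var/log/pacemaker.log<br>
</div><div><br></div><div>A non-zero exit code from the first test for the off or reboot action would explain exactly the failure you are seeing.</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">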
>--<br>
>Regards,<br>
>Reid Wahl, RHCA<br>
>Software Maintenance Engineer, Red Hat<br>
>CEE - Platform Support Delivery - ClusterHA<br>
_______________________________________________<br>
Manage your subscription:<br>
<a href="https://lists.clusterlabs.org/mailman/listinfo/users" rel="noreferrer" target="_blank">https://lists.clusterlabs.org/mailman/listinfo/users</a><br>
<br>
ClusterLabs home: <a href="https://www.clusterlabs.org/" rel="noreferrer" target="_blank">https://www.clusterlabs.org/</a><br>
</blockquote></div><br clear="all"><br>-- <br><div dir="ltr" class="gmail_signature"><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div>Regards,<br><br></div>Reid Wahl, RHCA<br></div><div>Software Maintenance Engineer, Red Hat<br></div>CEE - Platform Support Delivery - ClusterHA</div></div></div></div></div></div></div></div></div></div></div></div></div></div></div>