[ClusterLabs] Antw: [EXT] Stonith failing
Strahil Nikolov
hunter86_bg at yahoo.com
Thu Jul 30 01:42:28 EDT 2020
You got plenty of options:
- IPMI-based fencing, like HP iLO or Dell iDRAC
- SCSI-3 persistent reservations (which can be extended to fence a node once its reservation is removed)
- A shared disk (even iSCSI) with SBD (a.k.a. poison pill); if your hardware has no watchdog, you can use the softdog kernel module on Linux.
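The SBD option can be sketched roughly as follows; the device path and resource name are placeholders, and the commands assume a Linux node with the sbd and fence-agents packages installed:

```
# Rough SBD (poison-pill) setup sketch -- paths and names are placeholders.

# No hardware watchdog? Load the software watchdog module (Linux only):
modprobe softdog

# Initialize the shared LUN for SBD (wipes any existing SBD header on it):
sbd -d /dev/disk/by-id/shared-lun create

# Point the sbd daemon at the device (typically in /etc/sysconfig/sbd):
#   SBD_DEVICE="/dev/disk/by-id/shared-lun"

# Finally, define the fencing resource in Pacemaker:
pcs stonith create poison-pill fence_sbd devices=/dev/disk/by-id/shared-lun
```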
Best Regards,
Strahil Nikolov
On 29 July 2020 at 9:01:22 GMT+03:00, Gabriele Bulfon <gbulfon at sonicle.com> wrote:
>That one was taken from a specific implementation on Solaris 11.
>The situation is a dual node server with shared storage controller:
>both nodes see the same disks concurrently.
>Here we must be sure that the two nodes never import/mount
>the same zpool at the same time, or we will encounter data corruption:
>node 1 will be preferred for pool 1, node 2 for pool 2; only when
>one of the nodes goes down or is taken offline should the resources be
>first freed by the leaving node and then taken over by the other node.
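A sketch of how that preference could be expressed, assuming a pcs-driven cluster and hypothetical resource names zpool1/zpool2 (on illumos/Solaris you may be driving the CIB with the crm shell or cibadmin instead):

```
# Prefer xstha1 for pool 1 and xstha2 for pool 2; scores of 100 (not
# INFINITY) let either node take over when its peer leaves the cluster.
pcs constraint location zpool1 prefers xstha1=100
pcs constraint location zpool2 prefers xstha2=100

# Fencing must stay enabled so the same zpool is never imported twice:
pcs property set stonith-enabled=true
```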
>
>Would you suggest one of the available stonith in this case?
>
>Thanks!
>Gabriele
>
>
>
>Sonicle S.r.l. : http://www.sonicle.com
>Music: http://www.gabrielebulfon.com
>Quantum Mechanics : http://www.cdbaby.com/cd/gabrielebulfon
>----------------------------------------------------------------------------------
>From: Strahil Nikolov
>To: Cluster Labs - All topics related to open-source clustering welcomed;
>Gabriele Bulfon
>Date: 29 July 2020 6:39:08 CEST
>Subject: Re: [ClusterLabs] Antw: [EXT] Stonith failing
>Do you have a reason not to use any stonith already available ?
>Best Regards,
>Strahil Nikolov
>On 28 July 2020 at 13:26:52 GMT+03:00, Gabriele Bulfon wrote:
>Thanks, I attach here the script.
>It basically runs commands on the other node via passwordless ssh
>(which must be preconfigured via authorized keys).
>This was taken from a script by OpenIndiana (I think).
>As stated in the comments, we don't want to halt or boot via ssh,
>only reboot.
>Maybe this is the problem; we should at least have it shut down when
>asked to.
>
>Actually if I stop corosync in node 2, I don't want it to shutdown the
>system but just let node 1 keep control of all resources.
>Same if I just shutdown manually node 2,
>node 1 should keep control of all resources and release them back on
>reboot.
>Instead, when I stopped corosync on node 2, the log showed an
>attempt to stonith node 2: why?
>
>Thanks!
>Gabriele
>
>
>
>From: Reid Wahl
>To: Cluster Labs - All topics related to open-source clustering welcomed
>Date: 28 July 2020 12:03:46 CEST
>Subject: Re: [ClusterLabs] Antw: [EXT] Stonith failing
>Gabriele,
>
>"No route to host" is a somewhat generic error message when we can't
>find anyone to fence the node. It doesn't mean there's necessarily a
>network routing issue at fault; no need to focus on that error message.
>
>I agree with Ulrich about needing to know what the script does. But
>based on your initial message, it sounds like your custom fence agent
>returns 1 in response to "on" and "off" actions. Am I understanding
>correctly? If so, why does it behave that way? Pacemaker is trying to
>run a poweroff action based on the logs, so it needs your script to
>support an off action.
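A minimal sketch of such an agent's entry point (not the actual script from the thread; the stdin parameter names and ssh details are assumptions based on how Pacemaker drives external fence agents) would parse the key=value pairs on stdin and honor the off action instead of returning 1:

```shell
# Sketch of a custom fence-agent entry point. Pacemaker feeds key=value
# pairs on stdin; the agent must exit 0 only if the action succeeded.
fence_action() {
    action="" node=""
    # Parse the key=value pairs supplied on stdin.
    while IFS='=' read -r key val; do
        case "$key" in
            action|option)  action="$val" ;;
            nodename|port)  node="$val" ;;
        esac
    done
    case "$action" in
        off)
            # The action the logs show Pacemaker running; an agent that
            # returns 1 here makes every fencing attempt fail.
            ssh -o ConnectTimeout=10 "root@$node" poweroff
            ;;
        reboot)
            ssh -o ConnectTimeout=10 "root@$node" reboot
            ;;
        on|monitor|status)
            # Nothing to probe in this sketch; report success.
            return 0
            ;;
        *)
            # Unknown action: report failure.
            return 1
            ;;
    esac
}
```

A production agent must also verify that the target really lost power before reporting success; otherwise a hung ssh session can masquerade as a completed fence.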
>On Tue, Jul 28, 2020 at 2:47 AM Ulrich Windl
><Ulrich.Windl at rz.uni-regensburg.de> wrote:
>Gabriele Bulfon <gbulfon at sonicle.com> wrote on 28.07.2020 at 10:56
>in message:
>Hi, now I have my two nodes (xstha1 and xstha2) with IPs configured by
>Corosync.
>To check how stonith would work, I turned off the Corosync service on
>the second node.
>The first node attempts to stonith the second node and take over its
>resources, but this fails.
>The stonith action is configured to run a custom script that runs ssh
>commands,
>I think you should explain what that script does exactly.
>[...]
>_______________________________________________
>Manage your subscription:
>https://lists.clusterlabs.org/mailman/listinfo/users
>ClusterLabs home:
>https://www.clusterlabs.org/
>--
>Regards,
>Reid Wahl, RHCA
>Software Maintenance Engineer, Red Hat
>CEE - Platform Support Delivery - ClusterHA