[ClusterLabs] Antw: [EXT] Stonith failing

Reid Wahl nwahl at redhat.com
Thu Jul 30 01:53:41 EDT 2020


On Wed, Jul 29, 2020 at 10:45 PM Strahil Nikolov <hunter86_bg at yahoo.com>
wrote:

> You've got plenty of options:
> - IPMI-based fencing like HP iLO, DELL iDRAC
> - SCSI-3 persistent reservations (which can be extended to fence the
> node when the reservations are removed)
>
> - Shared disk (even iSCSI) and SBD (a.k.a. poison pill) -> in case
> your hardware has no watchdog, you can use the softdog kernel module
> for Linux.
>

Although softdog may not be reliable in all circumstances.
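For reference, hedged sketches of two of these options, assuming the pcs
CLI and the standard fence agents are available; resource names, device
paths, addresses, and credentials below are placeholders, not from the
thread.

IPMI-based fencing, one stonith resource per node:

    pcs stonith create fence-xstha1 fence_ipmilan \
        ip=10.0.0.1 username=admin password=secret lanplus=1 \
        pcmk_host_list=xstha1

SBD (poison pill) over a shared disk, with softdog standing in for a
hardware watchdog, subject to the reliability caveat above:

    # Load the software watchdog now and at boot.
    modprobe softdog
    echo softdog > /etc/modules-load.d/softdog.conf

    # Initialize the poison-pill slots on the shared disk.
    sbd -d /dev/disk/by-id/SHARED_DISK create

    # Point the sbd daemon at the disk (file location is distribution-specific).
    echo 'SBD_DEVICE="/dev/disk/by-id/SHARED_DISK"' >> /etc/sysconfig/sbd

    # Register a fencing resource that writes the poison pill.
    pcs stonith create fence-sbd fence_sbd devices=/dev/disk/by-id/SHARED_DISK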

> Best Regards,
> Strahil Nikolov
>
> On 29 July 2020 9:01:22 GMT+03:00, Gabriele Bulfon <gbulfon at sonicle.com>
> wrote:
> >That one was taken from a specific implementation on Solaris 11.
> >The situation is a dual node server with shared storage controller:
> >both nodes see the same disks concurrently.
> >Here we must be sure that the two nodes are not going to import/mount
> >the same zpool at the same time, or we will encounter data corruption:
> >node 1 will be preferred for pool 1, node 2 for pool 2. Only in case
> >one of the nodes goes down or is taken offline should the resources
> >first be freed by the leaving node and then taken by the other node.
> >
> >Would you suggest one of the available stonith agents in this case?
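One agent that maps naturally onto this shared-disk layout is fence_scsi
(SCSI-3 persistent reservations): a fenced node loses its reservation key
and can no longer write to the disks, so it cannot import the zpool. A
minimal sketch, assuming the pcs CLI and fence_scsi are available on this
platform (a real assumption here, given the nodes run a Solaris
derivative) and using a placeholder device path; the node names come from
later in the thread:

    pcs stonith create fence-scsi fence_scsi \
        devices=/dev/disk/by-id/SHARED_DISK \
        pcmk_host_list="xstha1 xstha2" \
        meta provides=unfencing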
> >
> >Thanks!
> >Gabriele
> >
> >
> >
> >Sonicle S.r.l. : http://www.sonicle.com
> >Music: http://www.gabrielebulfon.com
> >Quantum Mechanics : http://www.cdbaby.com/cd/gabrielebulfon
>
> >----------------------------------------------------------------------------------
> >From: Strahil Nikolov
> >To: Cluster Labs - All topics related to open-source clustering welcomed;
> >Gabriele Bulfon
> >Date: 29 July 2020 6:39:08 CEST
> >Subject: Re: [ClusterLabs] Antw: [EXT] Stonith failing
> >Do you have a reason not to use any stonith already available?
> >Best Regards,
> >Strahil Nikolov
> >On 28 July 2020 13:26:52 GMT+03:00, Gabriele Bulfon
> >wrote:
> >Thanks, I'm attaching the script here.
> >It basically runs ssh on the other node with no password (must be
> >preconfigured via authorized keys) to run commands.
> >This was taken from a script by OpenIndiana (I think).
> >As stated in the comments, we don't want to halt or boot via ssh,
> >only reboot.
> >Maybe this is the problem; we should at least have it shut down when
> >asked to.
> >
> >Actually, if I stop corosync on node 2, I don't want it to shut down
> >the system, but just let node 1 keep control of all resources.
> >The same if I just shut down node 2 manually:
> >node 1 should keep control of all resources and release them back on
> >reboot.
> >Instead, when I stopped corosync on node 2, the log showed an attempt
> >to stonith node 2: why?
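For context: stopping corosync out from under a still-running pacemaker
looks like a node failure to the peer, which is why fencing is attempted.
A clean hand-over is a standby followed by a full stack stop; a sketch
assuming the pcs CLI (older pcs versions use "pcs cluster standby"
instead of "pcs node standby"):

    # Move resources off node 2 without triggering fencing.
    pcs node standby xstha2
    # Then stop pacemaker and corosync together on that node.
    pcs cluster stop xstha2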
> >
> >Thanks!
> >Gabriele
> >
> >
> >
> >Sonicle S.r.l. : http://www.sonicle.com
> >Music: http://www.gabrielebulfon.com
> >Quantum Mechanics : http://www.cdbaby.com/cd/gabrielebulfon
> >From: Reid Wahl
> >To: Cluster Labs - All topics related to open-source clustering welcomed
> >Date: 28 July 2020 12:03:46 CEST
> >Subject: Re: [ClusterLabs] Antw: [EXT] Stonith failing
> >Gabriele,
> >
> >"No route to host" is a somewhat generic error message when we can't
> >find anyone to fence the node. It doesn't mean there's necessarily a
> >network routing issue at fault; no need to focus on that error message.
> >
> >I agree with Ulrich about needing to know what the script does. But
> >based on your initial message, it sounds like your custom fence agent
> >returns 1 in response to "on" and "off" actions. Am I understanding
> >correctly? If so, why does it behave that way? Pacemaker is trying to
> >run a poweroff action based on the logs, so it needs your script to
> >support an off action.
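For illustration, a minimal sketch of the stdin key=value interface a
fence agent is expected to handle, with a hypothetical "peer" parameter
naming the ssh target; a real agent also needs metadata output, and
ssh-based fencing inherently cannot power on, or reliably power off, a
hung node:

    #!/bin/sh
    # Pacemaker passes parameters on stdin as key=value lines.
    while read line; do
        case "$line" in
            action=*) action=${line#action=} ;;
            peer=*)   peer=${line#peer=} ;;
        esac
    done

    case "$action" in
        reboot) ssh "root@$peer" reboot ;;
        off)    ssh "root@$peer" poweroff ;;  # the action Pacemaker asks for here
        on)     exit 1 ;;                     # ssh cannot power a node on
        monitor|status) exit 0 ;;             # a real agent should verify peer state
        *)      exit 1 ;;
    esac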
> >On Tue, Jul 28, 2020 at 2:47 AM Ulrich Windl
> >Ulrich.Windl at rz.uni-regensburg.de
> >wrote:
> >Gabriele Bulfon
> >gbulfon at sonicle.com
> >wrote on 28.07.2020 at 10:56 in
> >message:
> >Hi, now I have my two nodes (xstha1 and xstha2) with IPs configured by
> >Corosync.
> >To check how stonith would work, I turned off the Corosync service on
> >the second node.
> >The first node attempts to stonith the 2nd node and take over its
> >resources, but this fails.
> >The stonith action is configured to run a custom script that runs ssh
> >commands,
> >I think you should explain what that script does exactly.
> >[...]
> >--
> >Regards,
> >Reid Wahl, RHCA
> >Software Maintenance Engineer, Red Hat
> >CEE - Platform Support Delivery - ClusterHA
> _______________________________________________
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
>


-- 
Regards,

Reid Wahl, RHCA
Software Maintenance Engineer, Red Hat
CEE - Platform Support Delivery - ClusterHA