[ClusterLabs] Antw: Re: Antw: Re: Antw: [EXT] Stonith failing

Gabriele Bulfon gbulfon at sonicle.com
Thu Jul 30 05:00:19 EDT 2020


It is this system:
https://www.supermicro.com/products/system/1u/1029/SYS-1029TP-DC0R.cfm
 
It has a SAS3 backplane with hot-swap SAS disks that are visible to both nodes at the same time.
 
Gabriele 
 
 
Sonicle S.r.l. : http://www.sonicle.com
Music: http://www.gabrielebulfon.com
Quantum Mechanics : http://www.cdbaby.com/cd/gabrielebulfon
----------------------------------------------------------------------------------
From: Ulrich Windl
To: users at clusterlabs.org
Date: 29 July 2020 15:15:17 CEST
Subject: [ClusterLabs] Antw: Re: Antw: Re: Antw: [EXT] Stonith failing
Gabriele Bulfon wrote on 29.07.2020 at 14:18:
Hi, it's a single controller, shared by both nodes, on an SM server.
You mean an external controller, like NAS or SAN? I thought you were talking about
an internal controller like SCSI...
I don't know what an "SM server" is.
Regards,
Ulrich
Thanks!
Gabriele
----------------------------------------------------------------------------------
From: Ulrich Windl
To: users at clusterlabs.org
Date: 29 July 2020 09:26:39 CEST
Subject: [ClusterLabs] Antw: Re: Antw: [EXT] Stonith failing
Gabriele Bulfon wrote on 29.07.2020 at 08:01:
That one was taken from a specific implementation on Solaris 11.
The situation is a dual-node server with a shared storage controller: both
nodes see the same disks concurrently.
You mean you have a dual-controller setup (one controller on each node, both
connected to the same bus)? If so, use sbd!
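To illustrate, only as a sketch and not tested on your platform (the device path, pcs availability and sbd support on illumos are all assumptions on my part), a poison-pill setup on one of those shared disks would look roughly like:

    # Initialize a small shared slice as the sbd poison-pill device (run once;
    # the device path is hypothetical):
    sbd -d /dev/rdsk/c0t5000CCA012345678d0s0 create

    # Verify both nodes can read the same sbd header:
    sbd -d /dev/rdsk/c0t5000CCA012345678d0s0 dump

    # With the sbd daemon running on both nodes, add a fencing resource
    # (pcs syntax shown; the crm shell has an equivalent):
    pcs stonith create sbd-fencing fence_sbd devices=/dev/rdsk/c0t5000CCA012345678d0s0
    pcs property set stonith-enabled=true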
Here we must be sure that the two nodes are not going to import/mount the
same zpool at the same time, or we will encounter data corruption: node 1
will be preferred for pool 1, node 2 for pool 2. Only if one of the nodes
goes down or is taken offline should the resources first be freed by the
leaving node and then taken over by the other node.
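The per-pool preference itself is normally expressed with location constraints rather than inside the fence agent; a rough crm shell sketch, assuming the pool resources are named zpool1 and zpool2 (hypothetical names; the node names are taken from later in this thread):

    # Prefer node 1 for pool 1 and node 2 for pool 2; the scores express
    # preference only, so failover to the other node is still allowed.
    crm configure location prefer-zpool1 zpool1 100: xstha1
    crm configure location prefer-zpool2 zpool2 100: xstha2
    # Mutual exclusion of the import is still the job of fencing: the survivor
    # must only import a pool after the other node is confirmed dead.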
Would you suggest one of the available stonith agents in this case?
Thanks!
Gabriele
----------------------------------------------------------------------------------
From: Strahil Nikolov
To: Cluster Labs - All topics related to open-source clustering welcomed, Gabriele Bulfon
Date: 29 July 2020 06:39:08 CEST
Subject: Re: [ClusterLabs] Antw: [EXT] Stonith failing
Do you have a reason not to use any of the stonith agents already available?
Best Regards,
Strahil Nikolov
On 28 July 2020 13:26:52 GMT+03:00, Gabriele Bulfon
wrote:
Thanks, I attach the script here.
It basically runs ssh to the other node without a password (this must be
preconfigured via authorized keys) to execute commands.
This was taken from a script by OpenIndiana (I think).
As stated in the comments, we don't want to halt or boot via ssh,
only reboot.
Maybe this is the problem; we should at least have it shut down when
asked to.
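For illustration only, a sketch assuming a stonith/external-style interface where the action name arrives as the first argument (the attached script's real interface may differ), an agent that also answers "off" could look like:

    #!/bin/sh
    # Hypothetical sketch of the action dispatch; "xstha2" is hardcoded for
    # brevity, a real agent would take the target from a parameter.
    target="xstha2"

    case "$1" in
      reset|reboot)
        ssh root@"$target" reboot ;;
      off|poweroff)
        # The logs in this thread show pacemaker requesting poweroff, so an
        # agent that returns 1 here makes the whole fencing operation fail.
        ssh root@"$target" poweroff ;;
      on)
        # A powered-off node cannot be started over ssh; report unsupported.
        exit 1 ;;
      status|monitor)
        ssh root@"$target" true ;;
      *)
        exit 1 ;;
    esac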
Actually, if I stop corosync on node 2, I don't want it to shut down the
system, but just let node 1 keep control of all resources.
The same applies if I shut down node 2 manually:
node 1 should keep control of all resources and release them back on
reboot.
Instead, when I stopped corosync on node 2, the log showed an
attempt to stonith node 2: why?
Thanks!
Gabriele
From: Reid Wahl
To: Cluster Labs - All topics related to open-source clustering welcomed
Date: 28 July 2020 12:03:46 CEST
Subject: Re: [ClusterLabs] Antw: [EXT] Stonith failing
Gabriele,
"No route to host" is a somewhat generic error message when we can't
find anyone to fence the node. It doesn't mean there's necessarily a
network routing issue at fault; no need to focus on that error message.
I agree with Ulrich about needing to know what the script does. But
based on your initial message, it sounds like your custom fence agent
returns 1 in response to "on" and "off" actions. Am I understanding
correctly? If so, why does it behave that way? Pacemaker is trying to
run a poweroff action based on the logs, so it needs your script to
support an off action.
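One way to confirm that outside of a real failure, assuming stonith_admin from pacemaker is available on your build, is to request the off action by hand from node 1 and check the exit status:

    # Warning: this really fences the node, so only run it in a test window.
    # Node name taken from this thread; --verbose shows which device was tried.
    stonith_admin --fence xstha2 --verbose
    echo "exit status: $?"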
On Tue, Jul 28, 2020 at 2:47 AM Ulrich Windl
Ulrich.Windl at rz.uni-regensburg.de wrote:
Gabriele Bulfon gbulfon at sonicle.com wrote on 28.07.2020 at 10:56:
Hi, now I have my two nodes (xstha1 and xstha2) with IPs configured by
Corosync.
To check how stonith would work, I turned off the Corosync service on the
second node.
The first node attempts to stonith the second node and take over its
resources, but this fails.
The stonith action is configured to run a custom script that issues ssh
commands,
I think you should explain what that script does exactly.
[...]
--
Regards,
Reid Wahl, RHCA
Software Maintenance Engineer, Red Hat
CEE - Platform Support Delivery - ClusterHA
_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users
ClusterLabs home: https://www.clusterlabs.org/

