[ClusterLabs] Stonith external/ssh "device"?
Antony Stone
Antony.Stone at ha.open.source.it
Wed Dec 21 10:59:16 EST 2022
Hi.
I'm implementing fencing on a 7-node cluster as described recently:
https://lists.clusterlabs.org/pipermail/users/2022-December/030714.html
I'm using external/ssh for the time being, and it works if I test it using:
stonith -t external/ssh -p "nodeA nodeB nodeC" -T reset nodeB
However, when it's supposed to be invoked because a node has got stuck, I
simply find syslog full of the following (one from each of the other six nodes
in the cluster):
pacemaker-fenced[3262]: notice: Operation reboot of nodeB by <no-one> for
pacemaker-controld.26852@nodeA.93b391b2: No such device
I have defined seven stonith resources, one for rebooting each machine, and I
can see from "crm status" that they have been assigned randomly amongst the
other servers, usually one per server, so that looks good.
The main things that puzzle me about the log message are:
a) Why does it say "<no-one>"? Does this mean something like "anyone", i.e.
that no node in particular is required to do this task, provided that at least
one of them does it? Or does it indicate a configuration problem?
b) What is the "device" being referred to? I'm using "external/ssh", so there
is no actual Stonith device for power-cycling hardware machines - am I supposed
to define some sort of dummy device somewhere?
For clarity, this is what I have added to my cluster configuration to set this
up:
primitive reboot_nodeA stonith:external/ssh params hostlist="nodeA"
location only_nodeA reboot_nodeA -inf: nodeA
...repeated for all seven nodes.
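To make "repeated for all seven nodes" concrete, here is a rough sketch of a shell loop that prints the same pair of lines for each node. It assumes the nodes are named nodeA through nodeG; only nodeA, nodeB and nodeC appear in the test command above, so the other four names are placeholders, and the function name gen_stonith_config is purely illustrative.

```shell
#!/bin/sh
# Print one stonith primitive and one location constraint per node,
# following the reboot_<node> / only_<node> naming pattern shown above.
# Assumption: node names after nodeC are placeholders for illustration.
gen_stonith_config() {
    for node in "$@"; do
        # The fencing resource that targets this node
        printf 'primitive reboot_%s stonith:external/ssh params hostlist="%s"\n' "$node" "$node"
        # Keep that resource off the node it is meant to reboot
        printf 'location only_%s reboot_%s -inf: %s\n' "$node" "$node" "$node"
    done
}

gen_stonith_config nodeA nodeB nodeC nodeD nodeE nodeF nodeG
```

The output is only text; it would still need to be reviewed and loaded into the cluster configuration by hand.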
I also have "stonith-enabled=yes" in the cib-bootstrap-options.
Ideas, anyone?
Thanks,
Antony.
--
This sentence contains exacly three erors.
Please reply to the list;
please *don't* CC me.