[ClusterLabs] Stonith external/ssh "device"?

Antony Stone Antony.Stone at ha.open.source.it
Wed Dec 21 10:59:16 EST 2022


Hi.

I'm implementing fencing on a 7-node cluster as described recently:
https://lists.clusterlabs.org/pipermail/users/2022-December/030714.html

I'm using external/ssh for the time being, and it works if I test it using:

stonith -t external/ssh -p "nodeA nodeB nodeC" -T reset nodeB


However, when it's supposed to be invoked because a node has got stuck, I 
simply find syslog full of the following (one from each of the other six nodes 
in the cluster):

pacemaker-fenced[3262]:   notice: Operation reboot of nodeB by <no-one> for pacemaker-controld.26852@nodeA.93b391b2: No such device

I have defined seven stonith resources, one for rebooting each machine, and I 
can see from "crm status" that they have been assigned randomly amongst the 
other servers, usually one per server, so that looks good.


The main things that puzzle me about the log message are:

a) why does it say "<no-one>"?  Is this more like "anyone", meaning that
no-one in particular is required to do this task, provided that at least
someone does it?  Does this indicate a configuration problem?

b) what is this "device" referred to?  I'm using "external/ssh" so there is no 
actual Stonith device for power-cycling hardware machines - am I supposed to 
define some sort of dummy device somewhere?

For clarity, this is what I have added to my cluster configuration to set this 
up:

primitive reboot_nodeA	stonith:external/ssh	params hostlist="nodeA"
location only_nodeA		reboot_nodeA		-inf: nodeA

...repeated for all seven nodes.
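
To make "repeated for all seven nodes" concrete, here is a minimal sketch of a shell loop that prints the same pair of lines for every node. Only nodeA, nodeB and nodeC are named in this message; nodeD to nodeG and the function name gen_stonith_config are assumptions made purely for illustration.

```shell
#!/bin/sh
# Hypothetical node list: only nodeA, nodeB and nodeC appear in this post,
# the remaining four names are placeholders.
NODES="nodeA nodeB nodeC nodeD nodeE nodeF nodeG"

# Print one stonith primitive and one location constraint per node,
# in the same form as the two configuration lines shown above.
gen_stonith_config() {
    for n in $NODES; do
        printf 'primitive reboot_%s stonith:external/ssh params hostlist="%s"\n' "$n" "$n"
        printf 'location only_%s reboot_%s -inf: %s\n' "$n" "$n" "$n"
    done
}

gen_stonith_config
```

The output is plain text in the same syntax as the two lines above (with single spaces instead of tabs); the script does not touch the cluster itself.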

I also have "stonith-enabled=yes" in the cib-bootstrap-options.


Ideas, anyone?

Thanks,


Antony.

-- 
This sentence contains exacly three erors.

                                                   Please reply to the list;
                                                         please *don't* CC me.