[ClusterLabs] Antw: [EXT] Re: Stonith external/ssh "device"?

Ulrich Windl Ulrich.Windl at rz.uni-regensburg.de
Thu Dec 22 02:41:39 EST 2022


>>> Antony Stone <Antony.Stone at ha.open.source.it> wrote on 21.12.2022 at
17:19 in
message <202212211719.34369.Antony.Stone at ha.open.source.it>:
> On Wednesday 21 December 2022 at 16:59:16, Antony Stone wrote:
> 
>> Hi.
>> 
>> I'm implementing fencing on a 7-node cluster as described recently:
>> https://lists.clusterlabs.org/pipermail/users/2022-December/030714.html
>> 
>> I'm using external/ssh for the time being, and it works if I test it using:
>> 
>> stonith -t external/ssh -p "nodeA nodeB nodeC" -T reset nodeB
>> 
>> 
>> However, when it's supposed to be invoked because a node has got stuck, I
>> simply find syslog full of the following (one from each of the other six
>> nodes in the cluster):
>> 
>> pacemaker-fenced[3262]:   notice: Operation reboot of nodeB by <no-one> for
>> pacemaker-controld.26852@nodeA.93b391b2: No such device
>> 
>> I have defined seven stonith resources, one for rebooting each machine, and
>> I can see from "crm status" that they have been assigned randomly amongst
>> the other servers, usually one per server, so that looks good.
>> 
>> 
>> The main things that puzzle me about the log message are:
>> 
>> a) why does it say "<no-one>"?  Is this more like "anyone", meaning that
>> no-one in particular is required to do this task, provided that at least
>> someone does it?  Does this indicate a configuration problem?
> 
> PS: I've just noticed that I'm also getting log entries immediately 
> afterwards:
> 
> pacemaker-controld[3264]:   notice: Peer nodeB was not terminated (reboot) by
> <anyone> on behalf of pacemaker-controld.26852: No such device
> 
>> b) what is this "device" referred to?  I'm using "external/ssh" so there is
>> no actual Stonith device for power-cycling hardware machines - am I
>> supposed to define some sort of dummy device somewhere?
>> 
>> For clarity, this is what I have added to my cluster configuration to set
>> this up:
>> 
>> primitive reboot_nodeA	stonith:external/ssh	params hostlist="nodeA"
>> location only_nodeA		reboot_nodeA		-inf: nodeA

"location only_nodeA" meaning "location not_nodeA"? ;-)


>> 
>> ...repeated for all seven nodes.
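
To illustrate the naming remark: a minimal sketch in the same crm syntax,
assuming the constraint is simply renamed. The ID "not_nodeA" is made up for
illustration. A score of -inf keeps reboot_nodeA away from nodeA, so the new
name only describes what the constraint already does. Nothing changes
functionally, and I would not expect the rename by itself to affect the
"No such device" messages.

primitive reboot_nodeA	stonith:external/ssh	params hostlist="nodeA"
location not_nodeA		reboot_nodeA		-inf: nodeA
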
>> 
>> I also have "stonith-enabled=yes" in the cib-bootstrap-options.
>> 
>> 
>> Ideas, anyone?
>> 
>> Thanks,
>> 
>> 
>> Antony.
> 
> -- 
> Normal people think "If it ain't broke, don't fix it".
> Engineers think "If it ain't broke, it doesn't have enough features yet".
> 
>                                                    Please reply to the list;
>                                                          please *don't* CC me.
