[ClusterLabs] 2-Node Cluster - fencing with just one node running ?

Lentes, Bernd bernd.lentes at helmholtz-muenchen.de
Thu Aug 4 09:06:44 EDT 2022


----- On 4 Aug, 2022, at 00:27, Reid Wahl nwahl at redhat.com wrote:

> 
> Such constraints are unnecessary.
> 
> Let's say we have two stonith devices called "fence_dev1" and
> "fence_dev2" that fence nodes 1 and 2, respectively. If node 2 needs
> to be fenced, and fence_dev2 is running on node 2, node 1 will still
> use fence_dev2 to fence node 2. The current location of the stonith
> device only tells us which node is running the recurring monitor
> operation for that stonith device. The device is available to ALL
> nodes, unless it's disabled or it's banned from a given node. So these
> constraints serve no purpose in most cases.

What do you mean by "banned"? "crm resource ban ..."?
Is that something different from a location constraint?
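
For context, this is roughly what I have (crmsh syntax, from memory, so the
constraint id below is only an example). I assumed the ban and the location
constraint are more or less the same thing, but maybe that's wrong:

    # what I ran at some point:
    crm resource ban fence_ilo_ha-idg-1 ha-idg-1

    # which seems to end up as a -INFINITY location constraint in the CIB,
    # e.g. something like:
    #   location cli-ban-fence_ilo_ha-idg-1-on-ha-idg-1 \
    #       fence_ilo_ha-idg-1 -inf: ha-idg-1
    crm configure show | grep location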

> If you ban fence_dev2 from node 1, then node 1 won't be able to use
> fence_dev2 to fence node 2. Likewise, if you ban fence_dev1 from node
> 1, then node 1 won't be able to use fence_dev1 to fence itself.
> Usually that's unnecessary anyway, but it may be preferable to power
> ourselves off if we're the last remaining node and a stop operation
> fails.
So banning a fencing device from a node means that node can't use the fencing device?
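
If that's how it works, I suppose I can check on each node which devices it
can use for which host, with something like this (assuming stonith_admin is
the right tool for that):

    # run on ha-idg-1: list the devices that can fence each host
    stonith_admin --list ha-idg-1
    stonith_admin --list ha-idg-2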
 
> If ha-idg-2 is in standby, it can still fence ha-idg-1. Since it
> sounds like you've banned fence_ilo_ha-idg-1 from ha-idg-1, so that it
> can't run anywhere when ha-idg-2 is in standby, I'm not sure off the
> top of my head whether fence_ilo_ha-idg-1 is available in this
> situation. It may not be.

ha-idg-2 was not only in standby, I also stopped pacemaker on that node.
Then ha-idg-2 can't fence ha-idg-1, I assume.
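
For the record, what I did was essentially this (if I remember the order
correctly):

    crm node standby ha-idg-2
    systemctl stop pacemaker     # on ha-idg-2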

> 
> A solution would be to stop banning the stonith devices from their
> respective nodes. Surely if fence_ilo_ha-idg-1 had been running on
> ha-idg-1, ha-idg-2 would have been able to use it to fence ha-idg-1.
> (Again, I'm not sure if that's still true if ha-idg-2 is in standby
> **and** fence_ilo_ha-idg-1 is banned from ha-idg-1.)
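
If I go that way, I suppose removing the bans would look roughly like this
(the constraint id is only an example, I'd take the real ones from
"crm configure show"):

    crm configure show | grep location
    crm configure delete cli-ban-fence_ilo_ha-idg-1-on-ha-idg-1
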
> 
>> Aug 03 01:19:58 [19364] ha-idg-1 stonith-ng:   notice: log_operation:
>> Operation 'Off' [20705] (call 2 from crmd.19368) for host 'ha-idg-1' with
>> device 'fence_ilo_ha-idg-2' returned: 0 (OK)
>> So the cluster starts the resources on ha-idg-1 and cuts off ha-idg-2,
>> which isn't necessary.
> 
> Here, it sounds like the pcmk_host_list setting is either missing or
> misconfigured for fence_ilo_ha-idg-2. fence_ilo_ha-idg-2 should NOT be
> usable for fencing ha-idg-1.
> 
> fence_ilo_ha-idg-1 should be configured with pcmk_host_list=ha-idg-1,
> and fence_ilo_ha-idg-2 should be configured with
> pcmk_host_list=ha-idg-2.

I will check that.
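
If it is indeed wrong, I suppose checking and fixing it would look roughly
like this (crmsh syntax, untested):

    crm resource param fence_ilo_ha-idg-1 show pcmk_host_list
    crm resource param fence_ilo_ha-idg-2 show pcmk_host_list

    # and if the value is missing or wrong:
    crm resource param fence_ilo_ha-idg-1 set pcmk_host_list ha-idg-1
    crm resource param fence_ilo_ha-idg-2 set pcmk_host_list ha-idg-2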

> What happened is that ha-idg-1 used fence_ilo_ha-idg-2 to fence
> itself. Of course, this only rebooted ha-idg-2. But based on the
> stonith device configuration, pacemaker on ha-idg-1 believed that
> ha-idg-1 had been fenced. Hence the "allegedly just fenced" message.
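
I will also look at the fencing history to see who fenced whom, which should
be something like this (if I'm not mistaken):

    stonith_admin --history ha-idg-1
    stonith_admin --history ha-idg-2
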
> 
>>
>> Finally the cluster seems to realize that something went wrong:
>> Aug 03 01:19:58 [19368] ha-idg-1       crmd:     crit: tengine_stonith_notify:
>> We were allegedly just fenced by ha-idg-1 for ha-idg-1!

Bernd