[ClusterLabs] Antw: [EXT] Stonith failing

Klaus Wenninger kwenning at redhat.com
Tue Aug 18 02:21:50 EDT 2020


On 8/18/20 7:49 AM, Andrei Borzenkov wrote:
> 17.08.2020 23:39, Jehan-Guillaume de Rorthais wrote:
>> On Mon, 17 Aug 2020 10:19:45 -0500
>> Ken Gaillot <kgaillot at redhat.com> wrote:
>>
>>> On Fri, 2020-08-14 at 15:09 +0200, Gabriele Bulfon wrote:
>>>> Thanks to all your suggestions, I now have the systems with stonith
>>>> configured on ipmi.  
>>> A word of caution: if the IPMI is on-board -- i.e. it shares the same
>>> power supply as the computer -- power becomes a single point of
>>> failure. If the node loses power, the other node can't fence because
>>> the IPMI is also down, and the cluster can't recover.
>>>
>>> Some on-board IPMI controllers can share an Ethernet port with the main
>>> computer, which would be a similar situation.
>>>
>>> It's best to have a backup fencing method when using IPMI as the
>>> primary fencing method. An example would be an intelligent power switch
>>> or sbd.
>> How would SBD be useful in this scenario? The poison pill will not be
>> swallowed by the dead node... Is it just to wait for the watchdog timeout?
>>
> A node is expected to commit suicide if SBD loses access to the shared
> block device. So either the node swallowed the poison pill and died, or
> it died because it realized it could not see the poison pill, or it was
> dead already. After the watchdog timeout (twice the watchdog timeout for
> safety) we assume the node is dead.
Yes, this way a suicide via watchdog will be triggered if there are
issues with the disk. This is why it is important to have a reliable
watchdog with SBD even when using poison pill. As this alone would
make a single shared disk a SPOF, when running with pacemaker
integration (default) a node with SBD will survive losing the disk
as long as it has quorum and pacemaker looks healthy. As corosync-quorum
in 2-node mode obviously isn't fit for this purpose, SBD will switch
to checking for the presence of both nodes if the 2-node flag is set.
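
As a rough sketch of the disk-loss decision described above (this is
not SBD's actual code; the function and parameter names are made up
purely for illustration, and only the case "shared disk is gone" is
modelled):

```python
def survives_disk_loss(pacemaker_integration, pacemaker_healthy,
                       quorate, two_node, peer_present):
    """Illustrative model only: may a node stay up after losing the SBD disk?

    All names here are hypothetical; they mirror the prose, not SBD internals.
    Returning False means the watchdog is allowed to reset the node.
    """
    # Without pacemaker integration the disk is the only source of truth,
    # so losing it always leads to self-fencing.
    if not pacemaker_integration:
        return False

    # Pacemaker has to look healthy for the node to be trusted.
    if not pacemaker_healthy:
        return False

    # In 2-node mode quorum is not meaningful for this purpose, so the
    # check is whether both nodes are present instead.
    if two_node:
        return peer_present

    # Otherwise plain quorum decides.
    return quorate
```

So in a 2-node cluster a node that loses the disk stays up only while
it still sees its peer; once the peer is gone as well, the watchdog
takes the node down.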

Sorry for the lengthy explanation, but the full picture is required
to understand why it is sufficiently reliable and useful if configured
correctly.

Klaus


