[ClusterLabs] Antw: [EXT] Stonith failing

Andrei Borzenkov arvidjaar at gmail.com
Sun Aug 16 05:40:05 EDT 2020


16.08.2020 04:25, Reid Wahl wrote:
> 
> 
>> - considering that I have both nodes with stonith against the other node,
>> once the two nodes can communicate, how can I be sure the two nodes will
>> not try to stonith each other?
>>
> 
> The simplest option is to add a delay attribute (e.g., delay=10) to one of
> the stonith devices. That way, if both nodes want to fence each other, the
> node whose stonith device has a delay configured will wait for the delay to
> expire before executing the reboot action.
> 

Current Pacemaker (2.0.4) also supports the priority-fencing-delay
option, which computes the delay based on which resources are active on
each node, thereby favoring the node with the "more important" resources.
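
To illustrate the idea, here is a rough Python model of how such a
delay could pick the survivor in a two-node split. It is a sketch under
assumptions, not Pacemaker code: the node names, the priority values and
the 15-second delay are made up, and the tie rule (no extra delay when
all nodes have equal priority) reflects my understanding of the behavior.

```python
# Simplified model of priority-fencing-delay; not Pacemaker code.
# Node names, priorities and the delay value are illustrative assumptions.

def node_priority(resource_priorities):
    """Node priority modeled as the sum of its active resources' priorities."""
    return sum(resource_priorities)

def fencing_delays(active, priority_fencing_delay):
    """Return {target_node: delay_seconds} for fencing aimed at each node.

    The delay applies only to fencing that targets the highest-priority
    node(s). If every node has the same priority, no delay is added.
    """
    totals = {node: node_priority(p) for node, p in active.items()}
    top, low = max(totals.values()), min(totals.values())
    if top == low:
        return {node: 0 for node in totals}
    return {
        node: priority_fencing_delay if total == top else 0
        for node, total in totals.items()
    }

def survivor(delays):
    """The node whose fencing is delayed longest is shot last, so it wins."""
    longest = max(delays.values())
    winners = [n for n, d in delays.items() if d == longest]
    return winners[0] if len(winners) == 1 else None  # None: plain race

# Hypothetical cluster: node1 runs a database (priority 10),
# node2 runs a low-priority service (priority 1).
active = {"node1": [10], "node2": [1]}
delays = fencing_delays(active, priority_fencing_delay=15)
print(delays)            # {'node1': 15, 'node2': 0}
print(survivor(delays))  # node1
```

With equal priorities the model returns no delay for either node, which
is the case where a static delay on one stonith device is still needed.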

> Alternatively, you can set up corosync-qdevice, using a separate system
> running qnetd server as a quorum arbitrator.
> 

Any solution based on node suicide is prone to complete cluster loss.
In particular, in a two-node cluster with qdevice, the surviving node
will commit suicide if qnetd is not accessible.
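
The vote arithmetic behind this can be sketched in a few lines of
Python. This assumes one vote per node plus one vote for the qdevice and
a strict-majority quorum rule. What a node does once it loses quorum
depends on the configuration, so the sketch only shows the vote count.

```python
# Vote arithmetic for a two-node cluster plus a qdevice.
# Assumption: each node and the qdevice contribute exactly one vote.

def quorum_threshold(expected_votes):
    """Strict majority of the expected votes."""
    return expected_votes // 2 + 1

def is_quorate(votes_seen, expected_votes):
    """True if the partition sees at least a strict majority of votes."""
    return votes_seen >= quorum_threshold(expected_votes)

EXPECTED = 3  # node1 + node2 + qdevice

# Peer lost, qnetd reachable: the survivor holds 2 of 3 votes.
print(is_quorate(votes_seen=2, expected_votes=EXPECTED))  # True

# Peer lost and qnetd unreachable: 1 of 3 votes, quorum is lost.
print(is_quorate(votes_seen=1, expected_votes=EXPECTED))  # False
```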

As long as external stonith is reasonably reliable, it is much preferred
to any solution based on quorum (unless you have very specific
requirements and can tolerate running the remaining nodes in "frozen"
mode to limit unavailability).

And before someone jumps in - SBD falls into "solution based on suicide"
as well.

