[ClusterLabs] How to cancel a fencing request?

Klaus Wenninger kwenning at redhat.com
Tue Apr 10 05:24:04 EDT 2018


On 04/10/2018 08:48 AM, Jehan-Guillaume de Rorthais wrote:
> On Mon, 09 Apr 2018 17:59:26 -0500
> Ken Gaillot <kgaillot at redhat.com> wrote:
>
>> On Tue, 2018-04-10 at 00:02 +0200, Jehan-Guillaume de Rorthais wrote:
>>> On Tue, 03 Apr 2018 17:35:43 -0500
>>> Ken Gaillot <kgaillot at redhat.com> wrote:
>>>   
>>>> On Tue, 2018-04-03 at 21:46 +0200, Klaus Wenninger wrote:  
>>>>> On 04/03/2018 05:43 PM, Ken Gaillot wrote:    
>>>>>> On Tue, 2018-04-03 at 07:36 +0200, Klaus Wenninger wrote:    
>>>>>>> On 04/02/2018 04:02 PM, Ken Gaillot wrote:    
>>>>>>>> On Mon, 2018-04-02 at 10:54 +0200, Jehan-Guillaume de
>>>>>>>> Rorthais
>>>>>>>> wrote:    
>>> [...]  
>>>>>>> -inf constraints like that should effectively prevent
>>>>>>> stonith-actions from being executed on those nodes.
>>>>>> It shouldn't ...
>>>>>>
>>>>>> Pacemaker respects target-role=Started/Stopped for controlling
>>>>>> execution of fence devices, but location (or even whether the
>>>>>> device is
>>>>>> "running" at all) only affects monitors, not execution.
>>>>>>     
>>>>>>> Though there are a few issues with location constraints
>>>>>>> and stonith-devices.
>>>>>>>
>>>>>>> When stonithd brings up the devices from the CIB, it
>>>>>>> runs the parts of pengine that fully evaluate these
>>>>>>> constraints, and it would disable the stonith device
>>>>>>> if the resource is unrunnable on that node.
>>>>>> That should be true only for target-role, not everything that
>>>>>> affects
>>>>>> runnability    
>>>>> cib_device_update bails out via a removal of the device if
>>>>> - role == stopped
>>>>> - node not in allowed_nodes-list of stonith-resource
>>>>> - weight is negative
>>>>>
>>>>> Wouldn't that include a -inf rule for a node?    
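
For illustration, a plain ban in pcs syntax - the device name
fence_node1 and node name node1 are made up here - gives the device a
-INFINITY location score, which is exactly the "weight is negative"
case above:

    # ban the fence device from its own target; the resulting
    # -INFINITY score means stonithd drops the device on node1
    pcs constraint location fence_node1 avoids node1
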
>>>> Well, I'll be ... I thought I understood what was going on there.
>>>> :-)
>>>> You're right.
>>>>
>>>> I've frequently seen it recommended to ban fence devices from their
>>>> target when using one device per target. Perhaps it would be better
>>>> to give the device a lower (but positive) score on the target,
>>>> compared to the other node(s), so it can still be used when no
>>>> other nodes are available.
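
A minimal sketch of that alternative in pcs syntax (device name, node
names and scores are made up):

    # prefer executing the device on another node, but keep a
    # small positive score on the target itself so it stays
    # usable as a last resort
    pcs constraint location fence_node1 prefers node2=100
    pcs constraint location fence_node1 prefers node1=10
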
>>> Wait, you mean a fencing resource can be triggered from its own
>>> target? What happens then? Node suicide, and all the cluster nodes
>>> are shut down?
>>>
>>> Thanks,  
>> A node can fence itself, though it will be the cluster's last resort
>> when no other node can. It doesn't necessarily imply all other nodes
>> are shut down ...
> Indeed, sorry, I wasn't clear enough: I was talking about a fencing
> race situation.
Fencing races - even when suicide is involved - should normally
be prevented by one partition not having quorum.
They should only be an issue with the 2-node feature enabled.
Which scenario did you have in mind?
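
For reference, this is the corosync setting I mean (illustrative
corosync.conf excerpt):

    # with two_node enabled each node keeps quorum after losing
    # sight of its peer, which is what makes a fencing race
    # possible in the first place
    quorum {
        provider: corosync_votequorum
        two_node: 1
    }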
>
>> there may be other nodes up, but they are not allowed to
>> execute the relevant fence device for whatever reason.
> In such a situation, how can the other nodes confirm that the node
> has fenced itself?

Basically I see 2 cases:
- sbd with watchdog-fencing, where the other nodes assume
  the suicide to be successful after a certain time
- if a node is able to commit suicide (while part of a
  quorate partition), I would expect it to come back online
  after the reboot, telling the cluster that the resources
  are down
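
For the first case, a minimal sketch (stonith-watchdog-timeout is a
standard pacemaker property; the value here is just an example and
has to match your watchdog/sbd setup):

    # after this timeout the cluster assumes the watchdog has
    # killed the unseen node and treats the fencing as confirmed
    pcs property set stonith-watchdog-timeout=10s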

Regards,
Klaus

>
>> But of course there might be no other nodes up, in which case, yes,
>> the cluster dies (the idea being that the node is known to be
>> malfunctioning, so it is stopped before it can possibly corrupt
>> data).
> This makes sense to me.
>
> Thanks,



