[ClusterLabs] How to cancel a fencing request?

Jehan-Guillaume de Rorthais jgdr at dalibo.com
Tue Apr 10 09:46:35 UTC 2018


On Tue, 10 Apr 2018 11:24:04 +0200
Klaus Wenninger <kwenning at redhat.com> wrote:

> On 04/10/2018 08:48 AM, Jehan-Guillaume de Rorthais wrote:
> > On Mon, 09 Apr 2018 17:59:26 -0500
> > Ken Gaillot <kgaillot at redhat.com> wrote:
> >  
> >> On Tue, 2018-04-10 at 00:02 +0200, Jehan-Guillaume de Rorthais wrote:  
> >>> On Tue, 03 Apr 2018 17:35:43 -0500
> >>> Ken Gaillot <kgaillot at redhat.com> wrote:
> >>>     
> >>>> On Tue, 2018-04-03 at 21:46 +0200, Klaus Wenninger wrote:    
> >>>>> On 04/03/2018 05:43 PM, Ken Gaillot wrote:      
> >>>>>> On Tue, 2018-04-03 at 07:36 +0200, Klaus Wenninger wrote:      
> >>>>>>> On 04/02/2018 04:02 PM, Ken Gaillot wrote:      
> >>>>>>>> On Mon, 2018-04-02 at 10:54 +0200, Jehan-Guillaume de Rorthais wrote:
> >>> [...]    
> >>>>>>> -inf constraints like that should effectively prevent
> >>>>>>> stonith-actions from being executed on those nodes.
> >>>>>> It shouldn't ...
> >>>>>>
> >>>>>> Pacemaker respects target-role=Started/Stopped for controlling
> >>>>>> execution of fence devices, but location (or even whether the
> >>>>>> device is
> >>>>>> "running" at all) only affects monitors, not execution.
> >>>>>>       
> >>>>>>> Though there are a few issues with location constraints
> >>>>>>> and stonith-devices.
> >>>>>>>
> >>>>>>> When stonithd brings up the devices from the cib it
> >>>>>>> runs the parts of pengine that fully evaluate these
> >>>>>>> constraints and it would disable the stonith-device
> >>>>>>> if the resource is unrunnable on that node.
> >>>>>> That should be true only for target-role, not everything that
> >>>>>> affects
> >>>>>> runnability      
> >>>>> cib_device_update bails out via a removal of the device if
> >>>>> - role == stopped
> >>>>> - node not in allowed_nodes-list of stonith-resource
> >>>>> - weight is negative
> >>>>>
> >>>>> Wouldn't that include a -inf rule for a node?      
> >>>> Well, I'll be ... I thought I understood what was going on there.
> >>>> :-)
> >>>> You're right.
> >>>>
> >>>> I've frequently seen it recommended to ban fence devices from their
> >>>> target when using one device per target. Perhaps it would be better
> >>>> to
> >>>> give a lower (but positive) score on the target compared to the
> >>>> other
> >>>> node(s), so it can be used when no other nodes are available. you
> >>>> could
> >>>> re-manage.      
> >>> Wait, you mean a fencing resource can be triggered from its own
> >>> target? What happens then? Node suicide, and all the cluster nodes
> >>> are shut down?
> >>>
> >>> Thanks,    
> >> A node can fence itself, though it will be the cluster's last resort
> >> when no other node can. It doesn't necessarily imply all other nodes
> >> are shut down ...  
> > Indeed, sorry I wasn't clear enough: I was talking about a fencing race
> > situation.  
> Fencing races - even if suicide is involved - should be
> prevented by one partition not having quorum.
> That should be an issue just with 2-node-feature enabled.
> Which scenario did you have in mind?

The two-node scenario. The exact one I described upthread, minus the -inf
location constraint, as Ken suggested.
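
To make the difference concrete, here is a rough Python sketch of the three
conditions Klaus listed upthread for cib_device_update (role stopped, node not
in the allowed nodes list, negative weight). It is only a toy model of that
description: `FenceDevice`, `device_usable_on`, the node names and the scores
are all made up for illustration and are not Pacemaker code or API.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class FenceDevice:
    """Hypothetical model of a stonith resource (not a Pacemaker structure)."""
    target_role: str = "Started"
    # node name -> location score; a node missing from this map is not allowed
    allowed_nodes: Dict[str, float] = field(default_factory=dict)


def device_usable_on(device: FenceDevice, node: str) -> bool:
    """Apply the three bail-out conditions quoted upthread."""
    if device.target_role.lower() == "stopped":
        return False  # role == stopped
    if node not in device.allowed_nodes:
        return False  # node not in the allowed nodes list
    if device.allowed_nodes[node] < 0:
        return False  # weight is negative, which includes -inf
    return True


NEG_INF = float("-inf")

# Device targeting node1, banned from node1 with a -inf location constraint.
banned = FenceDevice(allowed_nodes={"node1": NEG_INF, "node2": 100})

# Same device with a lower but positive score on its target, as Ken suggested.
lower_score = FenceDevice(allowed_nodes={"node1": 10, "node2": 100})

print(device_usable_on(banned, "node1"))       # banned on its own target
print(device_usable_on(lower_score, "node1"))  # still usable as a last resort
```

Under these assumptions, the -inf ban removes the device on its target, while
the lower positive score keeps it available there, which is the configuration
I am asking about.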

> >> there may be other nodes up, but they are not allowed to
> >> execute the relevant fence device for whatever reason.
> > In such a situation, how can the other nodes confirm that the node fenced
> > itself without any confirmation?
> 
> Basically I see 2 cases:
> - sbd with watchdog-fencing where the other nodes assume
>   suicide to be successful after a certain time

Sure. With watchdog enabled cluster wide.

> - basically if a node is able to commit suicide (while part of
>   a quorate partition) I would expect it to come back online
>   after reboot telling the cluster that the resources are down

I would expect that as well, but the fencing request hasn't been confirmed to
anyone yet:

* is it enough that the node reboots and probes its resources to declare that
  they are all stopped?
* is it enough for the node to acknowledge to the DC/stonithd that the fencing
  request succeeded?
* what if the fencing action is not "reboot" but "off"?

Thanks for your help!

