[ClusterLabs] How to cancel a fencing request?
Jehan-Guillaume de Rorthais
jgdr at dalibo.com
Tue Apr 10 05:46:35 EDT 2018
On Tue, 10 Apr 2018 11:24:04 +0200
Klaus Wenninger <kwenning at redhat.com> wrote:
> On 04/10/2018 08:48 AM, Jehan-Guillaume de Rorthais wrote:
> > On Mon, 09 Apr 2018 17:59:26 -0500
> > Ken Gaillot <kgaillot at redhat.com> wrote:
> >
> >> On Tue, 2018-04-10 at 00:02 +0200, Jehan-Guillaume de Rorthais wrote:
> >>> On Tue, 03 Apr 2018 17:35:43 -0500
> >>> Ken Gaillot <kgaillot at redhat.com> wrote:
> >>>
> >>>> On Tue, 2018-04-03 at 21:46 +0200, Klaus Wenninger wrote:
> >>>>> On 04/03/2018 05:43 PM, Ken Gaillot wrote:
> >>>>>> On Tue, 2018-04-03 at 07:36 +0200, Klaus Wenninger wrote:
> >>>>>>> On 04/02/2018 04:02 PM, Ken Gaillot wrote:
> >>>>>>>> On Mon, 2018-04-02 at 10:54 +0200, Jehan-Guillaume de Rorthais wrote:
> >>> [...]
> >>>>>>> -inf constraints like that should effectively prevent
> >>>>>>> stonith actions from being executed on those nodes.
> >>>>>> It shouldn't ...
> >>>>>>
> >>>>>> Pacemaker respects target-role=Started/Stopped for controlling
> >>>>>> execution of fence devices, but location (or even whether the
> >>>>>> device is "running" at all) only affects monitors, not execution.
> >>>>>>
> >>>>>>> Though there are a few issues with location constraints
> >>>>>>> and stonith-devices.
> >>>>>>>
> >>>>>>> When stonithd brings up the devices from the CIB, it
> >>>>>>> runs the parts of pengine that fully evaluate these
> >>>>>>> constraints, and it would disable the stonith device
> >>>>>>> if the resource is unrunnable on that node.
> >>>>>> That should be true only for target-role, not everything that
> >>>>>> affects runnability.
> >>>>> cib_device_update bails out via a removal of the device if
> >>>>> - role == stopped
> >>>>> - node not in allowed_nodes-list of stonith-resource
> >>>>> - weight is negative
> >>>>>
> >>>>> Wouldn't that include a -inf rule for a node?
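The three removal conditions Klaus lists can be modelled in a few lines. This is a minimal Python sketch, not the stonithd code: `FenceResource` and `device_kept_on` are hypothetical names, and the only Pacemaker fact assumed is that INFINITY is stored as 1000000. It shows why a -inf location rule removes the device from its own target, while a lower but positive score keeps it registered there.

```python
from dataclasses import dataclass, field

# Pacemaker represents INFINITY as the integer 1000000.
INFINITY = 1_000_000


@dataclass
class FenceResource:
    """Hypothetical, simplified view of a fence device evaluated on one node."""
    name: str
    role: str  # target-role, e.g. "Started" or "Stopped"
    allowed_nodes: dict = field(default_factory=dict)  # node name -> location score


def device_kept_on(res: FenceResource, node: str) -> bool:
    """Return False when the device would be removed from `node`'s fencer."""
    if res.role.lower() == "stopped":      # role == stopped
        return False
    if node not in res.allowed_nodes:      # node not in allowed_nodes
        return False
    return res.allowed_nodes[node] >= 0    # negative weight -> removed


if __name__ == "__main__":
    banned = FenceResource("fence_node1", "Started", {"node1": -INFINITY, "node2": 0})
    lowered = FenceResource("fence_node1", "Started", {"node1": 10, "node2": 100})
    print(device_kept_on(banned, "node1"))   # False: the target cannot fence itself
    print(device_kept_on(lowered, "node1"))  # True: still usable as a last resort
```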
> >>>> Well, I'll be ... I thought I understood what was going on there.
> >>>> :-)
> >>>> You're right.
> >>>>
> >>>> I've frequently seen it recommended to ban fence devices from their
> >>>> target when using one device per target. Perhaps it would be better
> >>>> to give a lower (but positive) score on the target compared to the
> >>>> other node(s), so it can be used when no other nodes are available.
> >>>> You could re-manage.
> >>> Wait, you mean a fencing resource can be triggered from its own
> >>> target? What happens then? Node suicide, and all the cluster nodes
> >>> are shut down?
> >>>
> >>> Thanks,
> >> A node can fence itself, though it will be the cluster's last resort
> >> when no other node can. It doesn't necessarily imply all other nodes
> >> are shut down ...
> > Indeed, sorry, I wasn't clear enough: I was talking about a fencing race
> > situation.
> Fencing races, even when suicide is involved, should be
> prevented by one partition not having quorum.
> That should be an issue only with the 2-node feature enabled.
> Which scenario did you have in mind?
The two-node scenario: exactly the one I described upthread, minus the -inf
location constraint, as Ken suggested.
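Why quorum prevents fencing races except in the two-node case can be shown with a simplified vote count. This Python sketch is a model under stated assumptions, not corosync's votequorum: one vote per node, no qdevice, `wait_for_all` ignored, and an inquorate partition assumed not to initiate fencing. The function names are hypothetical.

```python
from typing import Sequence


def has_quorum(partition_size: int, cluster_size: int, two_node: bool = False) -> bool:
    """Simplified vote count: one vote per node, no qdevice, no wait_for_all."""
    if two_node and cluster_size == 2:
        # corosync's two_node option lets a single surviving node keep quorum.
        return partition_size >= 1
    # Otherwise a partition needs a strict majority of the votes.
    return partition_size > cluster_size // 2


def fencing_race_possible(partition_sizes: Sequence[int], two_node: bool = False) -> bool:
    """True if more than one partition is quorate and may try to fence the others."""
    cluster_size = sum(partition_sizes)
    quorate = [s for s in partition_sizes if has_quorum(s, cluster_size, two_node)]
    return len(quorate) > 1


if __name__ == "__main__":
    print(fencing_race_possible([1, 1]))                 # False: neither half is quorate
    print(fencing_race_possible([1, 1], two_node=True))  # True: both halves may fence
    print(fencing_race_possible([2, 1]))                 # False: only the majority fences
```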
> >> there may be other nodes up, but they are not allowed to
> >> execute the relevant fence device for whatever reason.
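Ken's "last resort" rule can be sketched as an executor choice: prefer any other node that has a usable device, and fall back to the target itself only when nobody else can. This is a hypothetical Python model (`pick_executor` is not a Pacemaker function), and the alphabetical tie-break between eligible peers is an arbitrary choice made only to keep the sketch deterministic.

```python
from typing import Iterable, Optional


def pick_executor(target: str, candidates: Iterable[str]) -> Optional[str]:
    """Pick which node should execute fencing of `target`.

    `candidates` are the nodes that currently have a usable device able to
    fence `target`. Any other node is preferred; the target itself is used
    only as a last resort. Returns None when nobody can fence the target.
    """
    pool = set(candidates)
    others = sorted(n for n in pool if n != target)  # arbitrary, deterministic order
    if others:
        return others[0]
    if target in pool:
        return target  # self-fencing ("suicide")
    return None


if __name__ == "__main__":
    print(pick_executor("node1", {"node1", "node2"}))  # node2
    print(pick_executor("node1", {"node1"}))           # node1
    print(pick_executor("node1", set()))               # None
```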
> > In such a situation, how can the other nodes confirm that the node fenced
> > itself, without any confirmation from it?
>
> Basically I see 2 cases:
> - sbd with watchdog-fencing where the other nodes assume
> suicide to be successful after a certain time
Sure. With watchdog enabled cluster-wide.
> - basically if a node is able to commit suicide (while part of
> a quorate partition) I would expect it to come back online
> after reboot telling the cluster that the resources are down
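Klaus's two cases can be expressed as two ways a peer might treat a self-fencing request as done. This Python sketch models only those two cases as described here, not Pacemaker's actual confirmation logic; `fencing_considered_done` and its parameters are hypothetical, and `watchdog_timeout` merely plays the role that `stonith-watchdog-timeout` plays for sbd.

```python
from typing import Optional


def fencing_considered_done(request_time: float, now: float,
                            watchdog_timeout: Optional[float],
                            target_rejoin_time: Optional[float]) -> bool:
    """Hypothetical model of when peers treat a self-fencing request as done."""
    # Case 1: watchdog-based self-fencing (sbd). Peers assume the reset
    # happened once the watchdog timeout has elapsed since the request.
    if watchdog_timeout is not None and now - request_time >= watchdog_timeout:
        return True
    # Case 2: the target fenced itself and rejoined afterwards, which tells
    # peers that it really restarted and that its resources are down.
    if target_rejoin_time is not None and target_rejoin_time > request_time:
        return True
    return False
```

Under this model, an "off" action without watchdog fencing never produces a rejoin, so the request would stay unconfirmed.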
I would expect that as well, but the fencing request hasn't been confirmed to
anyone yet:
* is it enough that the node reboots and probes its resources to declare they
  are all stopped?
* is it enough for the node to acknowledge to the DC/stonithd that the fencing
  request succeeded?
* what if the fencing action is not "reboot" but "off"?
Thanks for your help!