[ClusterLabs] Feedback wanted: Node reaction to fabric fencing
Ken Gaillot
kgaillot at redhat.com
Thu Jul 25 11:10:11 EDT 2019
On Thu, 2019-07-25 at 08:24 +0000, Roger Zhou wrote:
> On 7/25/19 1:33 AM, Ken Gaillot wrote:
> > Hi all,
> >
> > A recent bugfix (clbz#5386) brings up a question.
> >
> > A node may receive notification of its own fencing when fencing is
> > misconfigured (for example, an APC switch with the wrong plug number)
> > or when fabric fencing is used that doesn't cut the cluster network
> > (for example, fence_scsi).
> >
> > Previously, the *intended* behavior was for the node to attempt to
> > reboot itself in that situation, falling back to stopping pacemaker
> > if that failed. However, due to the bug, the reboot always failed,
> > so the behavior effectively was to stop pacemaker.
> >
> > Now that the bug is fixed, the node will indeed reboot in that
> > situation.
> >
> > It occurred to me that some users configure fabric fencing
> > specifically so that nodes aren't ever intentionally rebooted.
> > Therefore, I intend to make this behavior configurable.
> >
> > My question is, what do you think the default should be?
> >
> > 1. Default to the correct behavior (reboot)
> >
> > 2. Default to the current behavior (stop)
> >
> > 3. Default to the current behavior for now, and change it to the
> > correct behavior whenever pacemaker 2.1 is released (probably a few
> > years from now)
> >
>
> Sounds, 3) is the best choice.
>
> Make it configurable, and keep the current behavior(stop) for backward
> compatibility for the current minor version, eg. next 2.0.z(3+).
>
> Well, the correct behavior (reboot) as the default should be enforced.
> It should be the same crucial as stop failures of a resource. Make
> sense in the next minor version, say, 2.1.
>
> Thanks,
> Roger
Thanks everyone. I agree with the points raised, and plan to go with
this (though of course we can change before release if something else
becomes preferable):
- In 1.1 and 2.0, default to "stop"
- In (future) 2.1, default to "panic"
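To make the plan concrete, here is a rough sketch of the decision logic in Python. It is an illustration only, not Pacemaker code: the function names, the callbacks, and the (major, minor) tuple are all made up for this example, and it assumes that "panic" keeps the originally intended fallback of stopping pacemaker if the reboot attempt fails.

```python
from typing import Callable, Tuple


def default_reaction(series: Tuple[int, int]) -> str:
    """Default reaction for a (major, minor) release series under the plan.

    Hypothetical helper: 1.1 and 2.0 default to "stop", 2.1 and later
    default to "panic".
    """
    return "panic" if series >= (2, 1) else "stop"


def react_to_own_fencing(
    reaction: str,
    try_panic: Callable[[], bool],
    stop_cluster: Callable[[], None],
) -> str:
    """Apply the configured reaction when a node learns it was fenced.

    try_panic and stop_cluster are hypothetical callbacks standing in for
    "reboot this node" and "stop pacemaker". Returns the action that was
    actually carried out.
    """
    if reaction not in ("stop", "panic"):
        raise ValueError(f"unknown reaction: {reaction!r}")
    # Assumption: a failed panic falls back to stopping, matching the
    # originally intended behavior described earlier in the thread.
    if reaction == "panic" and try_panic():
        return "panic"
    stop_cluster()
    return "stop"
```

For example, react_to_own_fencing(default_reaction((2, 0)), try_panic, stop_cluster) would only ever call stop_cluster, while the same call with (2, 1) would attempt the reboot first.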
FYI, the idea behind 2.1 will be to collect changes in behavior that
are significant enough to draw attention to, but don't break rolling
upgrades from any currently supported version. These would mainly be
changes in defaults, tool usage, and the C library API.
--
Ken Gaillot <kgaillot at redhat.com>