[ClusterLabs] Feedback wanted: Node reaction to fabric fencing

Roger Zhou ZZhou at suse.com
Thu Jul 25 04:24:33 EDT 2019


On 7/25/19 1:33 AM, Ken Gaillot wrote:
> Hi all,
> 
> A recent bugfix (clbz#5386) brings up a question.
> 
> A node may receive notification of its own fencing when fencing is
> misconfigured (for example, an APC switch with the wrong plug number)
> or when fabric fencing is used that doesn't cut the cluster network
> (for example, fence_scsi).
> 
> Previously, the *intended* behavior was for the node to attempt to
> reboot itself in that situation, falling back to stopping pacemaker if
> that failed. However, due to the bug, the reboot always failed, so the
> behavior effectively was to stop pacemaker.
> 
> Now that the bug is fixed, the node will indeed reboot in that
> situation.
> 
> It occurred to me that some users configure fabric fencing specifically
> so that nodes aren't ever intentionally rebooted. Therefore, I intend
> to make this behavior configurable.
> 
> My question is, what do you think the default should be?
> 
> 1. Default to the correct behavior (reboot)
> 
> 2. Default to the current behavior (stop)
> 
> 3. Default to the current behavior for now, and change it to the
> correct behavior whenever pacemaker 2.1 is released (probably a few
> years from now)
> 

Option 3 sounds like the best choice.

Make it configurable, and keep the current behavior (stop) as the default
for backward compatibility within the current minor series, e.g. the next
2.0.z releases (z >= 3).

That said, the correct behavior (reboot) should eventually become the
enforced default. It should be treated as seriously as a stop failure of a
resource. It makes sense to switch the default in the next minor version,
say, 2.1.
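
To make the proposal concrete, here is a rough sketch in Python. It is only
an illustration of option 3 and of the reboot-then-stop fallback Ken
described, not Pacemaker's actual code. All names (FenceReaction,
default_reaction, react_to_own_fencing, and the two callbacks) are
hypothetical.

```python
from enum import Enum
from typing import Callable, Tuple


class FenceReaction(Enum):
    """Hypothetical values for the proposed configurable behavior."""
    STOP = "stop"      # current effective behavior: stop pacemaker
    REBOOT = "reboot"  # intended behavior: reboot the node


def default_reaction(version: Tuple[int, int]) -> FenceReaction:
    """Option 3: default to 'stop' in 2.0.z, switch to 'reboot' from 2.1."""
    return FenceReaction.REBOOT if version >= (2, 1) else FenceReaction.STOP


def react_to_own_fencing(
    reaction: FenceReaction,
    reboot: Callable[[], bool],        # assumed to return False on failure
    stop_cluster: Callable[[], None],  # assumed to stop the cluster stack
) -> str:
    """React to a notification that this node itself was fenced."""
    if reaction is FenceReaction.REBOOT and reboot():
        return "rebooted"
    # Either 'stop' was configured, or the reboot attempt failed:
    # fall back to stopping the cluster stack.
    stop_cluster()
    return "stopped"
```

With this shape, an admin who uses fabric fencing precisely to avoid reboots
would set the option to "stop" explicitly, and would not be affected when
the default changes in 2.1.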

Thanks,
Roger





