[ClusterLabs] questions about startup fencing

Wed Nov 29 09:22:07 EST 2017

Hi all,

A colleague has been valiantly trying to help me belatedly learn about
the intricacies of startup fencing, but I'm still not fully
understanding some of the finer points of the behaviour.

The documentation on the "startup-fencing" option[0] says

    Advanced Use Only: Should the cluster shoot unseen nodes? Not
    using the default is very unsafe!

and that it defaults to TRUE, but doesn't elaborate any further:

    [0] https://clusterlabs.org/doc/en-US/Pacemaker/1.1-crmsh/html/Pacemaker_Explained/s-cluster-options.html

Let's imagine the following scenario:

- We have a 5-node cluster, with all nodes running cleanly.

- The whole cluster is shut down cleanly.

- The whole cluster is then started up again.  (Side question: what
  happens if the last node to shut down is not the first to start up?
  How will the cluster ensure it has the most recent version of the
  CIB?  Without that, how would it know whether the last man standing
  was shut down cleanly or not?)

- 4 of the nodes boot up fine and rejoin the cluster within the
  dc-deadtime interval, forming a quorum, but the 5th doesn't.

IIUC, with startup-fencing enabled, this will result in that 5th node
automatically being fenced.  If I'm right, is that really *always*
necessary?
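
To make sure I'm describing my own understanding precisely, here is a
toy Python model of it.  This is only a sketch of the behaviour as I
*think* it works, not how Pacemaker actually implements it; the names
(Node, unseen_nodes, startup_fence_targets) and the timing values are
made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str
    # Seconds after cluster startup at which the node joined;
    # None means the node was never seen at all.
    join_delay: Optional[float]


def unseen_nodes(nodes: List[Node], dc_deadtime: float) -> List[str]:
    """Names of nodes that did not join within dc_deadtime seconds."""
    return [n.name for n in nodes
            if n.join_delay is None or n.join_delay > dc_deadtime]


def startup_fence_targets(nodes: List[Node], dc_deadtime: float,
                          startup_fencing: bool = True) -> List[str]:
    """Assumed behaviour: with startup-fencing on, every unseen node is fenced."""
    if not startup_fencing:
        return []
    return unseen_nodes(nodes, dc_deadtime)


# The scenario above: 4 nodes join promptly, the 5th never shows up.
# The 5.0 and 20.0 second values are arbitrary illustration values.
cluster = [Node(f"node{i}", join_delay=5.0) for i in range(1, 5)]
cluster.append(Node("node5", join_delay=None))

print(startup_fence_targets(cluster, dc_deadtime=20.0))
# ['node5']
print(startup_fence_targets(cluster, dc_deadtime=20.0, startup_fencing=False))
# []
```

In other words, in this model the decision depends only on whether the
node was seen in time, not on what it could have been running.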

Let's suppose further that the cluster configuration is such that no
stateful resources which could potentially conflict with other nodes
will ever get launched on that 5th node.  For example it might only
host stateless clones, or resources with requires=nothing set, or it
might not even host any resources at all due to some temporary
constraints which have been applied.

In those cases, what is to be gained from fencing?  The only thing I
can think of is that using (say) IPMI to power-cycle the node *might*
fix whatever issue was preventing it from joining the cluster.  Are
there any other reasons for fencing in this case?  It wouldn't help
avoid any data corruption, at least.

Now let's imagine the same scenario, except rather than a clean full
cluster shutdown, all nodes were affected by a power cut, but also
this time the whole cluster is configured to *only* run stateless
clones, so there is no risk of conflict between two nodes accidentally
running the same resource.  On startup, the 4 nodes in the quorum have
no way of knowing that the 5th node was also affected by the power
cut, so in theory from their perspective it could still be running a
stateless clone.  Again, is there anything to be gained from fencing
the 5th node once it exceeds the dc-deadtime threshold for joining,
other than the chance that a reboot might fix whatever was preventing
it from joining, and get the cluster back to full strength?

Also, when exactly does the dc-deadtime timer start ticking?
Is it reset to zero after a node is fenced, so that potentially that
node could go into a reboot loop if dc-deadtime is set too low?

The same questions apply if this troublesome node was actually a
remote node running pacemaker_remoted, rather than the 5th node in the
cluster.

I have an uncomfortable feeling that I'm missing something obvious,
probably due to the documentation's warning that "Not using the
default [for startup-fencing] is very unsafe!"  Or is it only unsafe
when the node which exceeded dc-deadtime on startup could
potentially be running a stateful resource which the cluster now wants
to restart elsewhere?  If that's the case, would it be possible to
optionally limit startup fencing to when it's really needed?
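
To illustrate the kind of optional limit I have in mind, here is a
rough Python sketch.  It is purely hypothetical: as far as I know
Pacemaker has no such option, and the Resource class, its fields, and
the "fencing" default used here are assumptions made up for this
example, loosely mirroring the cases I listed earlier (stateless
clones, resources which require nothing, or no resources at all).

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Resource:
    name: str
    stateless_clone: bool = False
    # Hypothetical stand-in for the resource's start requirement;
    # "fencing" is just the default assumed for this sketch.
    requires: str = "fencing"


def could_conflict(resource: Resource) -> bool:
    """A resource is risky if a stray copy on the unseen node could do harm."""
    return not resource.stateless_clone and resource.requires != "nothing"


def should_fence_unseen_node(allowed_resources: Iterable[Resource]) -> bool:
    """Fence only if some resource allowed on the node could conflict."""
    return any(could_conflict(r) for r in allowed_resources)


# Node allowed to host only harmless resources: no fencing in this model.
print(should_fence_unseen_node([Resource("web", stateless_clone=True),
                                Resource("probe", requires="nothing")]))
# False

# Node that could have been running a stateful resource: fence it.
print(should_fence_unseen_node([Resource("db")]))
# True
```

A node with no allowed resources at all would also return False here,
which corresponds to the temporary-constraints case.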

Thanks for any light you can shed!