[ClusterLabs] Speed up the resource moves in the case of a node hard shutdown

Digimer lists at alteeve.ca
Tue Feb 13 11:24:41 EST 2018

On 2018-02-13 05:46 AM, Maxim wrote:
> 12.02.2018 19:31, Digimer wrote:
>> Without fencing, all bets are off. Please enable it and see if the
>> issue remains.
> It seems I know [in theory] about fencing and its importance
> (although I've never configured it so far).
> But I don't understand how it would help in situations of a hard
> reboot/shutdown.

An availability cluster's job is to keep things running. To do this,
there must be coordination between the nodes (otherwise, just run things
everywhere and be done with it). Thus, when a node stops responding, it
is critical that the lost node be put into a known state.

If you allow assumptions to be made, you will eventually assume wrong.
The consequences could range from the "minor", like confusing
switches/routers, to the devastating, like corrupted data.

Fencing is not meant to speed up recovery; it is critical to ensuring
recovery works at all.

This is a common confusion (and people often mistakenly think that
quorum is how you avoid this, which is incorrect). There is no
replacement for fencing; you need it in any availability system. Running
without it is like driving without a seatbelt.


>> Changing EL6 to corosync 2 pushes further into uncharted waters. EL6
>> should be using the cman plugin with corosync 1. May I ask why you
>> don't use EL7 if you want such a recent stack?
> For historical reasons, let's say. I have other software that is built
> for a RHEL 6-like OS and has to be installed on the cluster node.
> The EL7 stack is already not so recent, but it's one of the most stable
> and least vulnerable, I suppose. And I understand the risks.
> I will update pcs to the latest version when I find a bit of free time.
