[ClusterLabs] HA Cluster and Fencing

Thu Sep 3 13:03:13 EDT 2015

On 03/09/15 12:44 PM, Streeter, Michelle N wrote:
> I was trying to get an HA cluster working, but it was not failing over.
> In past posts, someone kept asking me to get fencing working and make
> it a priority. So I finally got fencing to work with VBox, and failover
> finally started working for my HA cluster. When I tried to explain this
> to my lead, he didn’t believe me that fencing was the cause of the
> failover problem. So, would someone help me understand why this
> happened so I can explain it to my lead? Also, when I was trying to get
> Pacemaker 1.1.11 working, it was failing over fine without fencing, but
> then I added more than one drive to be served by the cluster via NFS,
> and the drives were being served by both nodes almost as if it were
> load balancing. It was suggested back then to get fencing working, so I
> take it that if I went back to that version, this would have fixed the
> issue. Would you also help me explain why this is true?

That person was me.

It boils down to this:

If a service can safely run in two places at once, you don't need an HA
cluster. If it can't, you need a resource manager to make sure actions
are coordinated between nodes, and that fundamentally requires knowing
the state of each node.
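To make the "can't run in two places" problem concrete, here is a toy
Python sketch, not anything pacemaker actually runs. It assumes a
two-node cluster and a naive rule where each node treats a silent peer
as dead and takes over its service. The names (Node, simulate_partition,
"nfs-export") are hypothetical.

```python
"""Hypothetical illustration of split-brain: two nodes that treat a silent
peer as dead will both start the same service during a network partition."""

from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    running: set[str] = field(default_factory=set)

    def on_peer_silent(self, service: str) -> None:
        # Naive (unsafe) rule: "I can't hear my peer, so it must be dead;
        # take over its service." No fencing, no confirmation.
        self.running.add(service)


def simulate_partition(service: str) -> list[Node]:
    a = Node("node1", {service})  # node1 owns the service before the fault
    b = Node("node2")
    # The link between them breaks, but BOTH nodes are still alive.
    # Each sees the other go silent and applies the naive rule.
    a.on_peer_silent(service)
    b.on_peer_silent(service)
    return [a, b]


if __name__ == "__main__":
    nodes = simulate_partition("nfs-export")
    owners = [n.name for n in nodes if "nfs-export" in n.running]
    print(owners)  # ['node1', 'node2']: both nodes serve the same export
```

Neither node did anything "wrong" by its own rule; the failure comes from
guessing the peer's state instead of knowing it.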

If a node stops responding, you are not allowed to make any assumptions
about its state. The node must be forced into a known state, and that is
where fencing comes in. After a successful fence, every node is in a
known state: the lost node is either powered off (power fencing) or cut
off from the network and storage (fabric fencing).

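With known states, the resource manager (pacemaker) can determine what
was lost (if anything), work out a new configuration based on your
failover rules, plot a course to reach that state and begin recovery.
Without a successful fence it can't safely do any of that: it either has
to block recovery or guess that the lost node is dead, and a wrong guess
means the same service running on both nodes, which is what you saw
with your NFS drives.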
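A much-simplified Python sketch of that planning step, assuming plain
dicts for node states, current placement and failover order. This is not
pacemaker's real policy engine; plan_recovery and its data shapes are
hypothetical, and blocking the whole plan on any unfenced node is a
deliberately conservative simplification.

```python
"""Hypothetical, simplified recovery planner; not pacemaker's algorithm."""


def plan_recovery(
    states: dict[str, str],             # node -> "online" | "off" | "unknown"
    placement: dict[str, str],          # service -> node currently running it
    preferences: dict[str, list[str]],  # service -> nodes in failover order
) -> dict[str, str]:
    """Return a new service -> node placement, or raise if recovery is unsafe."""
    unknown = [n for n, s in states.items() if s == "unknown"]
    if unknown:
        # An unfenced node might still be running services, so refuse to move
        # anything until it has been fenced (simplification: block everything).
        raise RuntimeError(f"recovery blocked, unfenced node(s): {unknown}")

    new_placement: dict[str, str] = {}
    for service, node in placement.items():
        if states[node] == "online":
            new_placement[service] = node  # nothing lost, leave it alone
            continue
        # Owner is known to be off: take the first online node in failover order.
        for candidate in preferences[service]:
            if states[candidate] == "online":
                new_placement[service] = candidate
                break
        else:
            raise RuntimeError(f"no online node can run {service}")
    return new_placement
```

With `states={"node1": "off", "node2": "online"}`, `placement={"nfs-export": "node1"}`
and `preferences={"nfs-export": ["node1", "node2"]}`, the plan moves
`nfs-export` to node2. Change node1 to "unknown" and it raises instead.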

-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?