[ClusterLabs] sub-second error detection and failover possible
lists at alteeve.ca
Tue Sep 1 11:07:29 EDT 2015
On 01/09/15 09:27 AM, Michael Schwartzkopff wrote:
> Perhaps this question was answered elsewhere, but I could not find any
> satisfying answer. So is it possible to set up a corosync/pacemaker cluster
> that detects errors and does the failover in a sub-second time span?
> If yes, how?
> Kind regards,
> Michael Schwartzkopff
Corosync is what declares the loss of a node, so you would need to start
by tuning it (token loss timeout and loss count). Of course, as you tighten
this up, the chances of a transient issue causing a false declaration of
node loss go up.
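
To put rough numbers on the corosync side, here is a small Python sketch.
It assumes the behaviour described in corosync.conf(5) as I recall it: with
more than two nodes the token timeout in effect is
token + (nodes - 2) * token_coefficient, token_coefficient defaults to
650 ms, and consensus defaults to 1.2 * token. Treating "token plus
consensus" as the detection time is my own simplification, and the function
names are made up for illustration. Check the man page for your version
before trusting any of it.

```python
def effective_token_ms(token_ms: int, nodes: int, token_coefficient_ms: int = 650) -> int:
    """Token timeout in effect, assuming the corosync.conf(5) formula
    token + (nodes - 2) * token_coefficient for clusters above two nodes."""
    extra_nodes = max(0, nodes - 2)
    return token_ms + extra_nodes * token_coefficient_ms


def default_consensus_ms(token_ms: int) -> int:
    """Assumed default consensus timeout: 1.2 * token (integer math)."""
    return token_ms * 12 // 10


def rough_detection_ms(token_ms: int, nodes: int) -> int:
    """Very rough estimate (an assumption, not a guarantee): time until a
    lost node is declared is about token timeout plus consensus timeout."""
    token = effective_token_ms(token_ms, nodes)
    return token + default_consensus_ms(token)


if __name__ == "__main__":
    # Compare a 1000 ms token with an aggressive 200 ms token on two nodes.
    for token_ms in (1000, 200):
        print(token_ms, rough_detection_ms(token_ms, nodes=2))
```

Even with a very aggressive token value, this estimate covers detection
only. Fencing and recovery come on top of it.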
Next, you'd need a fence device that can terminate and verify the node's
termination very, very quickly. I do not know of such a device. Part of
this is also the time taken for the fence agent to be invoked.
Last, you'd need to have pacemaker calculate the new desired state and
make those changes. The services being recovered would need to start
very quickly as well.
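
The steps above add up, which is the core of the problem. A minimal Python
sketch of the budget follows. The class, its field names, and every number
in the example are placeholders I made up for illustration; none of them
are measurements. You would substitute values timed on your own hardware.

```python
from dataclasses import dataclass


@dataclass
class FailoverBudget:
    """Hypothetical per-stage timings, all in milliseconds."""
    detection_ms: int      # corosync declares the node lost
    fence_ms: int          # fence agent invoked, node killed and verified
    transition_ms: int     # pacemaker computes and applies the new state
    service_start_ms: int  # recovered services come up

    def total_ms(self) -> int:
        """Stages run one after another, so the times simply add up."""
        return (self.detection_ms + self.fence_ms
                + self.transition_ms + self.service_start_ms)

    def fits(self, target_ms: int = 1000) -> bool:
        """True if the whole failover fits inside the target window."""
        return self.total_ms() <= target_ms


if __name__ == "__main__":
    # Placeholder numbers only, not measured on any real cluster.
    guess = FailoverBudget(detection_ms=440, fence_ms=2000,
                           transition_ms=200, service_start_ms=500)
    print(guess.total_ms(), guess.fits())
```

With a sub-second target, every stage has to be a small fraction of a
second, and a single slow stage blows the whole budget.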
In theory, it's possible I suppose. In practice, very unlikely.
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?