[Pacemaker] Why should pacemaker be started simultaneously?

Digimer lists at alteeve.ca
Mon Oct 6 01:03:16 EDT 2014


If stonith were configured, then after the timeout the first node would
fence the second node ("unable to reach" != "off").

Alternatively, you can enable corosync's 'wait_for_all' option so that
the first node does nothing until it sees the peer.

To do otherwise would be to risk a split-brain. Each node needs to know
the state of its peer in order to run services safely. When both start
at the same time, each knows what the other is doing. Disabling quorum
allows one node to continue to operate when the other leaves, but it
still needs that initial connection to know the peer's state for sure.
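
As a rough illustration of the reasoning so far, here is a small Python
sketch of the decision a node faces at startup. It is only a model, not
Pacemaker or corosync code: the function name, its parameters, and the
Action values are all made up for this example.

```python
from enum import Enum


class Action(Enum):
    START_SERVICES = "start services"
    KEEP_WAITING = "keep waiting for the peer"
    FENCE_THEN_START = "fence the peer, then start services"
    DO_NOT_START = "do not start services (peer state unknown)"


def startup_action(peer_seen, timed_out, stonith_configured, wait_for_all):
    """Hypothetical model of a two-node startup decision (not a real API)."""
    if peer_seen:
        # The peer's state is known, so services can be run safely.
        return Action.START_SERVICES
    if wait_for_all or not timed_out:
        # Either we are told to wait until the peer shows up,
        # or the startup timeout has not expired yet.
        return Action.KEEP_WAITING
    if stonith_configured:
        # "Unable to reach" != "off": force the peer off before starting.
        return Action.FENCE_THEN_START
    # No way to learn or force the peer's state; starting risks split-brain.
    return Action.DO_NOT_START
```

For a node that never sees its peer and has hit the timeout without
wait_for_all, the sketch returns FENCE_THEN_START when stonith is
configured and DO_NOT_START otherwise.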

Alternatively, by fencing the peer on start after timing out, a node can
say for sure that the peer is off and then start services knowing it
won't cause a split-brain. Of course, if you auto-start the cluster and
don't use wait_for_all, you risk a fence loop, where a node that cannot
reach its peer fences it on boot and the rebooted peer then does the
same in turn.
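
To make the fence loop concrete, here is a toy Python simulation. It
assumes two nodes named node1 and node2 (hypothetical names), a cluster
stack that auto-starts on boot, and fencing that reboots the target. None
of this is real cluster code; it only models the sequence of events.

```python
from typing import List


def simulate_boot_cycles(link_up: bool, wait_for_all: bool,
                         cycles: int = 4) -> List[str]:
    """Toy model of two auto-starting nodes; returns a log of events."""
    peer = {"node1": "node2", "node2": "node1"}
    booting = "node1"
    events = []
    for _ in range(cycles):
        if link_up:
            # The nodes can talk, so the cluster forms normally.
            events.append(f"{booting} sees {peer[booting]}; cluster forms")
            break
        if wait_for_all:
            # The node sits idle until the peer appears; nobody is fenced.
            events.append(f"{booting} waits for {peer[booting]}; no fencing")
            break
        # Timeout with no contact: fence the peer, which reboots it.
        events.append(f"{booting} times out and fences {peer[booting]}")
        # The fenced node boots, auto-starts the cluster, and takes its turn.
        booting = peer[booting]
    return events


for line in simulate_boot_cycles(link_up=False, wait_for_all=False):
    print(line)
```

With the link down and wait_for_all off, the log alternates between the
two nodes fencing each other for as many cycles as you simulate. Setting
wait_for_all=True in the same call ends the run after a single "waits"
event.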

digimer

On 06/10/14 12:45 AM, N, Ravikiran wrote:
> Hi all,
>
> I have had this question for a while and have not understood the logic
> behind it.
>
> Why do I have to start pacemaker simultaneously on both nodes of my
> 2-node cluster, even though I have disabled quorum in the cluster?
>
> It fails at the following startup step:
>
> [root@rk16 ~]# service pacemaker start
> Starting cluster:
>    Checking if cluster has been disabled at boot...        [  OK  ]
>    Checking Network Manager...                             [  OK  ]
>    Global setup...                                         [  OK  ]
>    Loading kernel modules...                               [  OK  ]
>    Mounting configfs...                                    [  OK  ]
>    Starting cman...                                        [  OK  ]
>    Waiting for quorum... Timed-out waiting for cluster
>                                                            [FAILED]
> Stopping cluster:
>    Leaving fence domain...                                 [  OK  ]
>    Stopping gfs_controld...                                [  OK  ]
>    Stopping dlm_controld...                                [  OK  ]
>    Stopping fenced...                                      [  OK  ]
>    Stopping cman...                                        [  OK  ]
>    Waiting for corosync to shutdown:.                      [  OK  ]
>    Unloading kernel modules...                             [  OK  ]
>    Unmounting configfs...                                  [  OK  ]
> Starting Pacemaker Cluster Manager:                        [  OK  ]
> [root@rk16 ~]# service pacemaker status
> pacemakerd dead but pid file exists
> [root@rk16 ~]#
>
> Regards,
>
> Ravikiran N


-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without 
access to education?



