<div dir="ltr">I forgot to mention that the OS is Debian Wheezy 64-bit, Corosync and Pacemaker were installed from packages via apt-get, and there are no local firewall rules in place:<div><br></div><div><div># iptables -L</div><div>Chain INPUT (policy ACCEPT)</div>


<div>target     prot opt source               destination</div><div><br></div><div>Chain FORWARD (policy ACCEPT)</div><div>target     prot opt source               destination</div><div><br></div><div>Chain OUTPUT (policy ACCEPT)</div>


<div>target     prot opt source               destination</div></div></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Tue, Sep 24, 2013 at 11:41 AM, David Parker <span dir="ltr">&lt;<a href="mailto:dparker@utica.edu" target="_blank">dparker@utica.edu</a>&gt;</span> wrote:<br>


<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Hello,<div><br></div><div>I have a 2-node cluster using Corosync and Pacemaker, where the nodes are actually two VirtualBox VMs on the same physical machine.  I have some resources set up in Pacemaker, and everything works fine if I move them in a controlled way with the "crm_resource -r &lt;resource&gt; --move --node &lt;node&gt;" command.</div>


<div><br></div><div>However, when I hard-fail one of the nodes via the "poweroff" command in VirtualBox, which "pulls the plug" on the VM, the resources do not move, and I see the following output in the log on the remaining node:</div>


<div><br></div><div><div>Sep 24 11:20:30 corosync [TOTEM ] The token was lost in the OPERATIONAL state.</div><div>Sep 24 11:20:30 corosync [TOTEM ] A processor failed, forming new configuration.</div><div>Sep 24 11:20:30 corosync [TOTEM ] entering GATHER state from 2.</div>


<div>Sep 24 11:20:31 test-vm-2 lrmd: [2503]: debug: rsc:drbd_r0:0 monitor[31] (pid 8495)</div><div>drbd[8495]:     2013/09/24_11:20:31 WARNING: This resource agent is deprecated and may be removed in a future release. See the man page for details. To suppress this warning, set the "ignore_deprecation" resource parameter to true.</div>


<div>drbd[8495]:     2013/09/24_11:20:31 WARNING: This resource agent is deprecated and may be removed in a future release. See the man page for details. To suppress this warning, set the "ignore_deprecation" resource parameter to true.</div>


<div>drbd[8495]:     2013/09/24_11:20:31 DEBUG: r0: Calling drbdadm -c /etc/drbd.conf role r0</div><div>drbd[8495]:     2013/09/24_11:20:31 DEBUG: r0: Exit code 0</div><div>drbd[8495]:     2013/09/24_11:20:31 DEBUG: r0: Command output: Secondary/Primary</div>


<div>drbd[8495]:     2013/09/24_11:20:31 DEBUG: r0: Calling drbdadm -c /etc/drbd.conf cstate r0</div><div>drbd[8495]:     2013/09/24_11:20:31 DEBUG: r0: Exit code 0</div><div>drbd[8495]:     2013/09/24_11:20:31 DEBUG: r0: Command output: Connected</div>


<div>drbd[8495]:     2013/09/24_11:20:31 DEBUG: r0 status: Secondary/Primary Secondary Primary Connected</div><div>Sep 24 11:20:31 test-vm-2 lrmd: [2503]: info: operation monitor[31] on drbd_r0:0 for client 2506: pid 8495 exited with return code 0</div>


<div>Sep 24 11:20:32 corosync [TOTEM ] entering GATHER state from 0.</div><div>Sep 24 11:20:34 corosync [TOTEM ] The consensus timeout expired.</div><div>Sep 24 11:20:34 corosync [TOTEM ] entering GATHER state from 3.</div>


<div>Sep 24 11:20:36 corosync [TOTEM ] The consensus timeout expired.</div><div>Sep 24 11:20:36 corosync [TOTEM ] entering GATHER state from 3.</div><div>Sep 24 11:20:38 corosync [TOTEM ] The consensus timeout expired.</div>


<div>Sep 24 11:20:38 corosync [TOTEM ] entering GATHER state from 3.</div><div>Sep 24 11:20:40 corosync [TOTEM ] The consensus timeout expired.</div><div>Sep 24 11:20:40 corosync [TOTEM ] entering GATHER state from 3.</div>


<div>Sep 24 11:20:40 corosync [TOTEM ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.</div><div>


Sep 24 11:20:43 corosync [TOTEM ] The consensus timeout expired.</div><div>Sep 24 11:20:43 corosync [TOTEM ] entering GATHER state from 3.</div><div>Sep 24 11:20:43 corosync [TOTEM ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.</div>


<div>Sep 24 11:20:45 corosync [TOTEM ] The consensus timeout expired.</div><div>Sep 24 11:20:45 corosync [TOTEM ] entering GATHER state from 3.</div><div>Sep 24 11:20:45 corosync [TOTEM ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.</div>


<div>Sep 24 11:20:47 corosync [TOTEM ] The consensus timeout expired.</div><div><br></div><div>Those last 3 messages just repeat over and over, the cluster never recovers, and the resources never move.  "crm_mon" reports that the resources are still running on the dead node, and shows no indication that anything has gone wrong.</div>


<div><br></div><div>Does anyone know what the issue could be?  My expectation was that the remaining node would become the sole member of the cluster, take over the resources, and everything would keep running.</div><div>


<br></div><div>For reference, my corosync.conf file is below:</div><div><br></div><div><div>compatibility: whitetank</div><div><br></div><div>totem {</div><div>        version: 2</div><div>        secauth: off</div><div>


        interface {</div>

<div>                member {</div><div>                        memberaddr: 192.168.25.201</div><div>                }</div><div>                member {</div><div>                        memberaddr: 192.168.25.202</div>


<div>

                }</div><div>                ringnumber: 0</div><div>                bindnetaddr: 192.168.25.0</div><div>                mcastport: 5405</div><div>        }</div><div>        transport: udpu</div><div>}</div>


<div><br></div><div>logging {</div><div>        fileline: off</div><div>        to_logfile: yes</div><div>        to_syslog: yes</div><div>        debug: on</div><div>        logfile: /var/log/cluster/corosync.log</div><div>


        timestamp: on</div><div>        logger_subsys {</div><div>                subsys: AMF</div><div>                debug: on</div><div>        }</div><div>}</div></div><div><br></div><div><br></div><div>Thanks!</div>


<div>Dave</div><span class="HOEnZb"><font color="#888888"><div><br></div>-- <br><div>Dave Parker</div>Systems Administrator<br>Utica College<br>Integrated Information Technology Services<br><a href="tel:%28315%29%20792-3229" value="+13157923229" target="_blank">(315) 792-3229</a><br>


Registered Linux User #408177

</font></span></div></div>

</blockquote></div><br><br clear="all"><div><br></div>-- <br><div>Dave Parker</div>Systems Administrator<br>Utica College<br>Integrated Information Technology Services<br>(315) 792-3229<br>Registered Linux User #408177

</div>