<div dir="ltr">Hello,<div><br></div><div>Sorry for the delay in my reply.  I've been doing a lot of experimentation, but so far I've had no luck.</div><div><br></div><div>Thanks for the suggestion, but it seems I'm not able to use CMAN.  I'm running Debian Wheezy with Corosync and Pacemaker installed via apt-get.  When I installed CMAN and set up a cluster.conf file, Pacemaker refused to start and said that CMAN was not supported.  When CMAN is not installed, Pacemaker starts up fine, but I see these lines in the log:</div>


<div><br></div><div><div>Sep 30 23:36:29 test-vm-1 crmd: [6941]: ERROR: init_quorum_connection: The Corosync quorum API is not supported in this build</div><div>Sep 30 23:36:29 test-vm-1 pacemakerd: [6932]: ERROR: pcmk_child_exit: Child process crmd exited (pid=6941, rc=100)</div>


<div>Sep 30 23:36:29 test-vm-1 pacemakerd: [6932]: WARN: pcmk_child_exit: Pacemaker child process crmd no longer wishes to be respawned. Shutting ourselves down.</div></div><div><div><br></div></div><div>So, then I checked to see which plugins are supported:</div>


<div><br></div><div><div># pacemakerd -F</div><div>Pacemaker 1.1.7 (Build: ee0730e13d124c3d58f00016c3376a1de5323cff)</div><div> Supporting:  generated-manpages agent-manpages ncurses  heartbeat corosync-plugin snmp libesmtp</div>


</div><div><br></div><div>Am I correct in believing that this Pacemaker package has been compiled without support for any quorum API?  If so, does anyone know if there is a Debian package which has the correct support?</div>
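<div><br></div><div>For anyone comparing builds, here is a minimal sketch of reading that feature list programmatically. The sample text is the output pasted above. The names "corosync-native" and "cman" are an assumption about what other builds may report; they do not appear in this output, so check them against your own build.</div>

```python
def parse_features(output):
    """Return the feature names from the 'Supporting' line of `pacemakerd -F` output."""
    for line in output.splitlines():
        stripped = line.strip()
        if stripped.startswith("Supporting"):
            # Everything after the first colon is a whitespace-separated feature list.
            _, _, features = stripped.partition(":")
            return set(features.split())
    return set()


# Output pasted from the message above.
SAMPLE = """Pacemaker 1.1.7 (Build: ee0730e13d124c3d58f00016c3376a1de5323cff)
 Supporting:  generated-manpages agent-manpages ncurses  heartbeat corosync-plugin snmp libesmtp
"""

# Assumption: cluster-stack feature names; only the first two appear in SAMPLE.
STACK_FEATURES = {"heartbeat", "corosync-plugin", "corosync-native", "cman"}

features = parse_features(SAMPLE)
print(sorted(features & STACK_FEATURES))
```

<div>Run against the output above, this lists only the heartbeat and corosync-plugin stacks.</div>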


<div><br></div><div>I also tried compiling LibQB, Corosync and Pacemaker from source via git, following the instructions documented here:</div><div><br></div><div><a href="http://clusterlabs.org/wiki/SourceInstall">http://clusterlabs.org/wiki/SourceInstall</a><br>


</div><div><br></div><div>I was hopeful that this would work, because as I understand it, Corosync 2.x no longer uses CMAN.  Everything compiled and started fine, but the compiled version of Pacemaker did not include either the 'crm' or 'pcs' commands.  Do I need to install something else in order to get one of these?</div>


<div><br></div><div>Any and all help is greatly appreciated!</div><div><br></div><div>    Thanks,</div><div>    Dave</div></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Wed, Sep 25, 2013 at 6:08 AM, David Lang <span dir="ltr"><<a href="mailto:david@lang.hm" target="_blank">david@lang.hm</a>></span> wrote:<br>


<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">The cluster is trying to reach quorum (a majority of the nodes talking to each other), and that is never going to happen when only one of the two nodes is left, so you have to disable this.<br>


<br>

try putting<br>

<cman two_node="1" expected_votes="1" transport="udpu"/><br>

in your cluster.conf<br>

<br>
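The arithmetic behind this can be sketched as follows. This is only a simple-majority model with hypothetical function names, not the actual CMAN or Corosync implementation, which handles more cases.<br>

```python
def votes_needed(expected_votes):
    """Smallest vote count that is a strict majority of expected_votes."""
    return expected_votes // 2 + 1


def has_quorum(votes_present, expected_votes):
    """True when the votes present reach a strict majority."""
    return votes_present >= votes_needed(expected_votes)


# Two-node cluster with one node powered off: 1 vote present, 2 needed.
print(has_quorum(1, 2))  # False

# With expected_votes lowered to 1, a single surviving node is enough.
print(has_quorum(1, 1))  # True
```

<br>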

David Lang<br>

<br>

 On Tue, 24 Sep 2013, David Parker wrote:<br>

<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

Date: Tue, 24 Sep 2013 11:48:59 -0400<br>

From: David Parker <<a href="mailto:dparker@utica.edu" target="_blank">dparker@utica.edu</a>><br>

Reply-To: The Pacemaker cluster resource manager<br>

    <<a href="mailto:pacemaker@oss.clusterlabs.org" target="_blank">pacemaker@oss.clusterlabs.org</a>><br>

To: The Pacemaker cluster resource manager <<a href="mailto:pacemaker@oss.clusterlabs.org" target="_blank">pacemaker@oss.clusterlabs.org</a>><br>

Subject: Re: [Pacemaker] Corosync won't recover when a node fails<div><div class="h5"><br>

<br>

I forgot to mention, OS is Debian Wheezy 64-bit, Corosync and Pacemaker<br>

installed from packages via apt-get, and there are no local firewall rules<br>

in place:<br>

<br>

# iptables -L<br>

Chain INPUT (policy ACCEPT)<br>

target     prot opt source               destination<br>

<br>

Chain FORWARD (policy ACCEPT)<br>

target     prot opt source               destination<br>

<br>

Chain OUTPUT (policy ACCEPT)<br>

target     prot opt source               destination<br>

<br>

<br>

On Tue, Sep 24, 2013 at 11:41 AM, David Parker <<a href="mailto:dparker@utica.edu" target="_blank">dparker@utica.edu</a>> wrote:<br>

<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

Hello,<br>

<br>

I have a 2-node cluster using Corosync and Pacemaker, where the nodes are<br>

actually two VirtualBox VMs on the same physical machine.  I have some<br>

resources set up in Pacemaker, and everything works fine if I move them in<br>

a controlled way with the "crm_resource -r <resource> --move --node <node>"<br>

command.<br>

<br>

However, when I hard-fail one of the nodes via the "poweroff" command in<br>

VirtualBox, which "pulls the plug" on the VM, the resources do not move,<br>

and I see the following output in the log on the remaining node:<br>

<br>

Sep 24 11:20:30 corosync [TOTEM ] The token was lost in the OPERATIONAL<br>

state.<br>

Sep 24 11:20:30 corosync [TOTEM ] A processor failed, forming new<br>

configuration.<br>

Sep 24 11:20:30 corosync [TOTEM ] entering GATHER state from 2.<br>

Sep 24 11:20:31 test-vm-2 lrmd: [2503]: debug: rsc:drbd_r0:0 monitor[31]<br>

(pid 8495)<br>

drbd[8495]:     2013/09/24_11:20:31 WARNING: This resource agent is<br>

deprecated and may be removed in a future release. See the man page for<br>

details. To suppress this warning, set the "ignore_deprecation" resource<br>

parameter to true.<br>

drbd[8495]:     2013/09/24_11:20:31 WARNING: This resource agent is<br>

deprecated and may be removed in a future release. See the man page for<br>

details. To suppress this warning, set the "ignore_deprecation" resource<br>

parameter to true.<br>

drbd[8495]:     2013/09/24_11:20:31 DEBUG: r0: Calling drbdadm -c<br>

/etc/drbd.conf role r0<br>

drbd[8495]:     2013/09/24_11:20:31 DEBUG: r0: Exit code 0<br>

drbd[8495]:     2013/09/24_11:20:31 DEBUG: r0: Command output:<br>

Secondary/Primary<br>

drbd[8495]:     2013/09/24_11:20:31 DEBUG: r0: Calling drbdadm -c<br>

/etc/drbd.conf cstate r0<br>

drbd[8495]:     2013/09/24_11:20:31 DEBUG: r0: Exit code 0<br>

drbd[8495]:     2013/09/24_11:20:31 DEBUG: r0: Command output: Connected<br>

drbd[8495]:     2013/09/24_11:20:31 DEBUG: r0 status: Secondary/Primary<br>

Secondary Primary Connected<br>

Sep 24 11:20:31 test-vm-2 lrmd: [2503]: info: operation monitor[31] on<br>

drbd_r0:0 for client 2506: pid 8495 exited with return code 0<br>

Sep 24 11:20:32 corosync [TOTEM ] entering GATHER state from 0.<br>

Sep 24 11:20:34 corosync [TOTEM ] The consensus timeout expired.<br>

Sep 24 11:20:34 corosync [TOTEM ] entering GATHER state from 3.<br>

Sep 24 11:20:36 corosync [TOTEM ] The consensus timeout expired.<br>

Sep 24 11:20:36 corosync [TOTEM ] entering GATHER state from 3.<br>

Sep 24 11:20:38 corosync [TOTEM ] The consensus timeout expired.<br>

Sep 24 11:20:38 corosync [TOTEM ] entering GATHER state from 3.<br>

Sep 24 11:20:40 corosync [TOTEM ] The consensus timeout expired.<br>

Sep 24 11:20:40 corosync [TOTEM ] entering GATHER state from 3.<br>

Sep 24 11:20:40 corosync [TOTEM ] Totem is unable to form a cluster<br>

because of an operating system or network fault. The most common cause of<br>

this message is that the local firewall is configured improperly.<br>

Sep 24 11:20:43 corosync [TOTEM ] The consensus timeout expired.<br>

Sep 24 11:20:43 corosync [TOTEM ] entering GATHER state from 3.<br>

Sep 24 11:20:43 corosync [TOTEM ] Totem is unable to form a cluster<br>

because of an operating system or network fault. The most common cause of<br>

this message is that the local firewall is configured improperly.<br>

Sep 24 11:20:45 corosync [TOTEM ] The consensus timeout expired.<br>

Sep 24 11:20:45 corosync [TOTEM ] entering GATHER state from 3.<br>

Sep 24 11:20:45 corosync [TOTEM ] Totem is unable to form a cluster<br>

because of an operating system or network fault. The most common cause of<br>

this message is that the local firewall is configured improperly.<br>

Sep 24 11:20:47 corosync [TOTEM ] The consensus timeout expired.<br>

<br>

Those last 3 messages just repeat over and over; the cluster never<br>

recovers, and the resources never move.  "crm_mon" reports that the<br>

resources are still running on the dead node, and shows no indication that<br>

anything has gone wrong.<br>

<br>

Does anyone know what the issue could be?  My expectation was that the<br>

remaining node would become the sole member of the cluster, take over the<br>

resources, and everything would keep running.<br>

<br>

For reference, my corosync.conf file is below:<br>

<br>

compatibility: whitetank<br>

<br>

totem {<br>

        version: 2<br>

        secauth: off<br>

        interface {<br>

                member {<br>

                        memberaddr: 192.168.25.201<br>

                }<br>

                member {<br>

                        memberaddr: 192.168.25.202<br>

                 }<br>

                ringnumber: 0<br>

                bindnetaddr: 192.168.25.0<br>

                mcastport: 5405<br>

        }<br>

        transport: udpu<br>

}<br>

<br>

logging {<br>

        fileline: off<br>

        to_logfile: yes<br>

        to_syslog: yes<br>

        debug: on<br>

        logfile: /var/log/cluster/corosync.log<br>

        timestamp: on<br>

        logger_subsys {<br>

                subsys: AMF<br>

                debug: on<br>

        }<br>

}<br>

<br>

<br>

Thanks!<br>

Dave<br>

<br>

--<br>

Dave Parker<br>

Systems Administrator<br>

Utica College<br>

Integrated Information Technology Services<br>

<a href="tel:%28315%29%20792-3229" value="+13157923229" target="_blank">(315) 792-3229</a><br>

Registered Linux User #408177<br>

<br>

</blockquote>

<br>

<br>

<br>

</div></div></blockquote>

<br>_______________________________________________<br>

<br>

Pacemaker mailing list: <a href="mailto:Pacemaker@oss.clusterlabs.org">Pacemaker@oss.clusterlabs.org</a><br>

<br>

<a href="http://oss.clusterlabs.org/mailman/listinfo/pacemaker" target="_blank">http://oss.clusterlabs.org/mailman/listinfo/pacemaker</a><br>

<br>

<br>

<br>

Project Home: <a href="http://www.clusterlabs.org" target="_blank">http://www.clusterlabs.org</a><br>

<br>

Getting started: <a href="http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf" target="_blank">http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf</a><br>

<br>

Bugs: <a href="http://bugs.clusterlabs.org" target="_blank">http://bugs.clusterlabs.org</a><br>

<br>


<br></blockquote></div><br><br clear="all"><div><br></div>-- <br><div>Dave Parker</div>Systems Administrator<br>Utica College<br>Integrated Information Technology Services<br>(315) 792-3229<br>Registered Linux User #408177

</div>