<html>
  <head>
    <meta content="text/html; charset=ISO-8859-1"
      http-equiv="Content-Type">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    Shut down Pacemaker and fix your DRBD disks first. Get them both to
    UpToDate/UpToDate and make sure you can manually switch them to
    Primary on each node.<br>
    <br>
    Node2 can't become Primary when it's not connected to a peer with
    an UpToDate disk.<br>
    <br>
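    For example, something along these lines (a rough sketch; it assumes
    the heartbeat init script manages the stack on your Ubuntu 10.04
    nodes and uses the resource names from your configuration):<br>
    <pre># on both nodes: stop the cluster stack so Pacemaker can't interfere
/etc/init.d/heartbeat stop

# check connection state, disk state and role of every DRBD resource
cat /proc/drbd
drbdadm cstate all
drbdadm dstate all

# reconnect the peers so the Outdated side (node1) resyncs from node2;
# run this on whichever node shows StandAlone
drbdadm connect all

# once both sides report UpToDate/UpToDate, verify promotion by hand,
# then switch back before restarting heartbeat
drbdadm primary vmstore
drbdadm secondary vmstore
# repeat for mount1 and mount2</pre>
    <br>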
    On 3/24/12 3:15 PM, Andrew Martin wrote:
    <blockquote cite="mid:9a8f56d9-7a7f-4ed4-ad31-6dded021a697@zimbra"
      type="cite">
      <style type="text/css">p { margin: 0; }</style>
      <div style="font-family: Times New Roman; font-size: 12pt; color:
        #000000"><font size="3">Hi Andreas,</font>
        <div style="color: rgb(0, 0, 0); font-family: 'Times New Roman';
          font-size: 12pt; "><br>
        </div>
        <div><font size="3">My complete cluster configuration is as
            follows:</font><br>
          <div>============</div>
          <div>Last updated: Sat Mar 24 13:51:55 2012</div>
          <div>Last change: Sat Mar 24 13:41:55 2012</div>
          <div>Stack: Heartbeat</div>
          <div>Current DC: node2 (9100538b-7a1f-41fd-9c1a-c6b4b1c32b18)
            - partition with quorum</div>
          <div>Version: 1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c</div>
          <div>3 Nodes configured, unknown expected votes</div>
          <div>19 Resources configured.</div>
          <div>============</div>
          <div><br>
          </div>
          <div>Node quorumnode (c4bf25d7-a6b7-4863-984d-aafd937c0da4):
            OFFLINE (standby)</div>
          <div>Online: [ node2 node1 ]</div>
          <div><br>
          </div>
          <div> Master/Slave Set: ms_drbd_vmstore [p_drbd_vmstore]</div>
          <div>     Masters: [ node2 ]</div>
          <div>     Slaves: [ node1 ]</div>
          <div> Master/Slave Set: ms_drbd_mount1 [p_drbd_mount1]</div>
          <div>     Masters: [ node2 ]</div>
          <div>     Slaves: [ node1 ]</div>
          <div> Master/Slave Set: ms_drbd_mount2 [p_drbd_mount2]</div>
          <div>     Masters: [ node2 ]</div>
          <div>     Slaves: [ node1 ]</div>
          <div> Resource Group: g_vm</div>
          <div>     p_fs_vmstore       (ocf::heartbeat:Filesystem):       Started node2</div>
          <div>     p_vm       (ocf::heartbeat:VirtualDomain):       Started node2</div>
          <div> Clone Set: cl_daemons [g_daemons]</div>
          <div>     Started: [ node2 node1 ]</div>
          <div>     Stopped: [ g_daemons:2 ]</div>
          <div> Clone Set: cl_sysadmin_notify [p_sysadmin_notify]</div>
          <div>     Started: [ node2 node1 ]</div>
          <div>     Stopped: [ p_sysadmin_notify:2 ]</div>
          <div> stonith-node1       (stonith:external/tripplitepdu):       Started node2</div>
          <div> stonith-node2       (stonith:external/tripplitepdu):       Started node1</div>
          <div> Clone Set: cl_ping [p_ping]</div>
          <div>     Started: [ node2 node1 ]</div>
          <div>     Stopped: [ p_ping:2 ]</div>
          <div><br>
          </div>
          <div>
            <div>node $id="6553a515-273e-42fe-ab9e-00f74bd582c3" node1 \</div>
            <div>        attributes standby="off"</div>
            <div>node $id="9100538b-7a1f-41fd-9c1a-c6b4b1c32b18" node2 \</div>
            <div>        attributes standby="off"</div>
            <div>node $id="c4bf25d7-a6b7-4863-984d-aafd937c0da4"
              quorumnode \</div>
            <div>        attributes standby="on"</div>
            <div>primitive p_drbd_mount2 ocf:linbit:drbd \</div>
            <div>        params drbd_resource="mount2" \</div>
            <div>        op monitor interval="15" role="Master" \</div>
            <div>        op monitor interval="30" role="Slave"</div>
            <div>primitive p_drbd_mount1 ocf:linbit:drbd \</div>
            <div>        params drbd_resource="mount1" \</div>
            <div>        op monitor interval="15" role="Master" \</div>
            <div>        op monitor interval="30" role="Slave"</div>
            <div>primitive p_drbd_vmstore ocf:linbit:drbd \</div>
            <div>        params drbd_resource="vmstore" \</div>
            <div>        op monitor interval="15" role="Master" \</div>
            <div>        op monitor interval="30" role="Slave"</div>
            <div>primitive p_fs_vmstore ocf:heartbeat:Filesystem \</div>
            <div>        params device="/dev/drbd0" directory="/vmstore"
              fstype="ext4" \</div>
            <div>        op start interval="0" timeout="60s" \</div>
            <div>        op stop interval="0" timeout="60s" \</div>
            <div>        op monitor interval="20s" timeout="40s"</div>
            <div>primitive p_libvirt-bin upstart:libvirt-bin \</div>
            <div>        op monitor interval="30"</div>
            <div>primitive p_ping ocf:pacemaker:ping \</div>
            <div>        params name="p_ping" host_list="192.168.1.10
              192.168.1.11" multiplier="1000" \</div>
            <div>        op monitor interval="20s"</div>
            <div>primitive p_sysadmin_notify ocf:heartbeat:MailTo \</div>
            <div>        params email=<a class="moz-txt-link-rfc2396E" href="mailto:me@example.com">"me@example.com"</a> \</div>
            <div>        params subject="Pacemaker Change" \</div>
            <div>        op start interval="0" timeout="10" \</div>
            <div>        op stop interval="0" timeout="10" \</div>
            <div>        op monitor interval="10" timeout="10"</div>
            <div>primitive p_vm ocf:heartbeat:VirtualDomain \</div>
            <div>        params config="/vmstore/config/vm.xml" \</div>
            <div>        meta allow-migrate="false" \</div>
            <div>        op start interval="0" timeout="120s" \</div>
            <div>        op stop interval="0" timeout="120s" \</div>
            <div>        op monitor interval="10" timeout="30"</div>
            <div>primitive stonith-node1 stonith:external/tripplitepdu \</div>
            <div>        params pdu_ipaddr="192.168.1.12" pdu_port="1"
              pdu_username="xxx" pdu_password="xxx"
              hostname_to_stonith="node1"</div>
            <div>primitive stonith-node2 stonith:external/tripplitepdu \</div>
            <div>        params pdu_ipaddr="192.168.1.12" pdu_port="2"
              pdu_username="xxx" pdu_password="xxx"
              hostname_to_stonith="node2"</div>
            <div>group g_daemons p_libvirt-bin</div>
            <div>group g_vm p_fs_vmstore p_vm</div>
            <div>ms ms_drbd_mount2 p_drbd_mount2 \</div>
            <div>        meta master-max="1" master-node-max="1"
              clone-max="2" clone-node-max="1" notify="true"</div>
            <div>ms ms_drbd_mount1 p_drbd_mount1 \</div>
            <div>        meta master-max="1" master-node-max="1"
              clone-max="2" clone-node-max="1" notify="true"</div>
            <div>ms ms_drbd_vmstore p_drbd_vmstore \</div>
            <div>        meta master-max="1" master-node-max="1"
              clone-max="2" clone-node-max="1" notify="true"</div>
            <div>clone cl_daemons g_daemons</div>
            <div>clone cl_ping p_ping \</div>
            <div>        meta interleave="true"</div>
            <div>clone cl_sysadmin_notify p_sysadmin_notify</div>
            <div>location l-st-node1 stonith-node1 -inf: node1</div>
            <div>location l-st-node2 stonith-node2 -inf: node2</div>
            <div>location l_run_on_most_connected p_vm \</div>
            <div>        rule $id="l_run_on_most_connected-rule" p_ping:
              defined p_ping</div>
            <div>colocation c_drbd_libvirt_vm inf:
              ms_drbd_vmstore:Master ms_drbd_mount1:Master
              ms_drbd_mount2:Master g_vm</div>
            <div>order o_drbd-fs-vm inf: ms_drbd_vmstore:promote
              ms_drbd_mount1:promote ms_drbd_mount2:promote
              cl_daemons:start g_vm:start</div>
            <div>property $id="cib-bootstrap-options" \</div>
            <div>        dc-version="1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c" \</div>
            <div>        cluster-infrastructure="Heartbeat" \</div>
            <div>        stonith-enabled="false" \</div>
            <div>        no-quorum-policy="stop" \</div>
            <div>        last-lrm-refresh="1332539900" \</div>
            <div>        cluster-recheck-interval="5m" \</div>
            <div>        crmd-integration-timeout="3m" \</div>
            <div>        shutdown-escalation="5m"</div>
          </div>
          <div><br>
          </div>
          <div>The STONITH plugin is a custom plugin I wrote for the
            Tripp-Lite PDUMH20ATNET that I'm using as the STONITH
            device:</div>
          <div><a moz-do-not-send="true"
              href="http://www.tripplite.com/shared/product-pages/en/PDUMH20ATNET.pdf">http://www.tripplite.com/shared/product-pages/en/PDUMH20ATNET.pdf</a></div>
          <div><br>
          </div>
          <div>As you can see, I left the DRBD service to be started by
            the operating system (as an LSB init script at boot time);
            however, Pacemaker controls actually bringing up and taking
            down the individual DRBD devices. The behavior I observe is
            as follows: I issue "crm resource migrate p_vm" on node1 and
            it fails over successfully to node2. During this time, node2
            fences node1's DRBD devices (using dopd) and marks them as
            Outdated, while node2's DRBD devices remain UpToDate. I then
            shut down both nodes and bring them back up. They reconnect
            to the cluster (with quorum); node1's DRBD devices are still
            Outdated and node2's are still UpToDate, as expected. At
            this point DRBD starts on both nodes, but node2 will not
            promote the DRBD resources to master:</div>
          <div>
            <div>Node quorumnode (c4bf25d7-a6b7-4863-984d-aafd937c0da4):
              OFFLINE (standby)</div>
            <div>Online: [ node2 node1 ]</div>
            <div><br>
            </div>
            <div> Master/Slave Set: ms_drbd_vmstore [p_drbd_vmstore]</div>
            <div>     Slaves: [ node1 node2 ]</div>
            <div> Master/Slave Set: ms_drbd_mount1 [p_drbd_mount1]</div>
            <div>     Slaves: [ node1 node2 ]</div>
            <div> Master/Slave Set: ms_drbd_mount2 [p_drbd_mount2]</div>
            <div>     Slaves: [ node1 node2 ]</div>
          </div>
          <div><br>
          </div>
          <div>I am having trouble sorting through the logging
            information because there is so much of it in
            /var/log/daemon.log, but I can't find any error message
            explaining why it will not promote node2. At this point
            the DRBD devices are as follows:</div>
          <div>node2: cstate = WFConnection dstate=UpToDate</div>
          <div>node1: cstate = StandAlone dstate=Outdated</div>
          <div><br>
          </div>
          <div>I don't see any reason why node2 can't become DRBD
            master, or am I missing something? If I do "drbdadm connect
            all" on node1, then the cstate on both nodes changes to
            "Connected" and node2 immediately promotes the DRBD
            resources to master. Any ideas on why I'm observing this
            incorrect behavior?</div>
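          <div>One way to see what the policy engine decides about the
            promotion (a rough sketch; it assumes the ptest and cibadmin
            tools shipped with the pacemaker 1.1.6 packages):</div>
          <pre># show allocation/promotion scores computed from the live CIB
ptest -sL | grep drbd_vmstore     # or: crm_simulate -sL

# the linbit drbd agent advertises its master preference via crm_master;
# a missing or negative master-p_drbd_vmstore attribute for node2 would
# explain why the promote never happens
cibadmin -Q | grep master-p_drbd</pre>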
          <div><br>
          </div>
          <div>Any tips on how I can better filter through the
            pacemaker/heartbeat logs or how to get additional useful
            debug information?</div>
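          <div>One plausible way to narrow the logs down (a rough
            sketch; the crmd/pengine/lrmd daemon names and crm_mon's -f
            flag come from the stock heartbeat/pacemaker packages):</div>
          <pre># keep only the cluster daemons' lines from the combined syslog
grep -E 'crmd|pengine|lrmd' /var/log/daemon.log | less

# promotion decisions are made by the policy engine
grep pengine /var/log/daemon.log | grep -iE 'promote|master'

# fail counts that could be blocking a promote
crm_mon -1 -f</pre>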
          <div><br>
          </div>
          <div>Thanks,</div>
          <div><br>
          </div>
          <div>Andrew</div>
          <br>
          <hr id="zwchr" style="color: rgb(0, 0, 0); font-family: 'Times
            New Roman'; font-size: 12pt; ">
          <div style="color: rgb(0, 0, 0); font-weight: normal;
            font-style: normal; text-decoration: none; font-family:
            Helvetica, Arial, sans-serif; font-size: 12pt; "><b>From: </b>"Andreas
            Kurz" <a class="moz-txt-link-rfc2396E" href="mailto:andreas@hastexo.com"><andreas@hastexo.com></a><br>
            <b>To: </b><a class="moz-txt-link-abbreviated" href="mailto:pacemaker@oss.clusterlabs.org">pacemaker@oss.clusterlabs.org</a><br>
            <b>Sent: </b>Wednesday, 1 February, 2012 4:19:25 PM<br>
            <b>Subject: </b>Re: [Pacemaker] Nodes will not promote DRBD
            resources to master on failover<br>
            <br>
            On 01/25/2012 08:58 PM, Andrew Martin wrote:<br>
            > Hello,<br>
            > <br>
            > Recently I finished configuring a two-node cluster with
            pacemaker 1.1.6<br>
            > and heartbeat 3.0.5 on nodes running Ubuntu 10.04. This
            cluster includes<br>
            > the following resources:<br>
            > - primitives for DRBD storage devices<br>
            > - primitives for mounting the filesystem on the DRBD
            storage<br>
            > - primitives for some mount binds<br>
            > - primitive for starting apache<br>
            > - primitives for starting samba and nfs servers
            (following instructions<br>
            > here
            <a class="moz-txt-link-rfc2396E" href="http://www.linbit.com/fileadmin/tech-guides/ha-nfs.pdf"><http://www.linbit.com/fileadmin/tech-guides/ha-nfs.pdf></a>)<br>
            > - primitives for exporting nfs shares
            (ocf:heartbeat:exportfs)<br>
            <br>
            not enough information ... please share at least your
            complete cluster<br>
            configuration<br>
            <br>
            Regards,<br>
            Andreas<br>
            <br>
            -- <br>
            Need help with Pacemaker?<br>
            <a class="moz-txt-link-freetext" href="http://www.hastexo.com/now">http://www.hastexo.com/now</a><br>
            <br>
            > <br>
            > Perhaps this is best described through the output of
            crm_mon:<br>
            > Online: [ node1 node2 ]<br>
            > <br>
            >  Master/Slave Set: ms_drbd_mount1 [p_drbd_mount1]
            (unmanaged)<br>
            >      p_drbd_mount1:0     (ocf::linbit:drbd):    
            Started node2 (unmanaged)<br>
            >      p_drbd_mount1:1     (ocf::linbit:drbd):    
            Started node1<br>
            > (unmanaged) FAILED<br>
            >  Master/Slave Set: ms_drbd_mount2 [p_drbd_mount2]<br>
            >      p_drbd_mount2:0       (ocf::linbit:drbd):    
            Master node1<br>
            > (unmanaged) FAILED<br>
            >      Slaves: [ node2 ]<br>
            >  Resource Group: g_core<br>
            >      p_fs_mount1 (ocf::heartbeat:Filesystem):  
             Started node1<br>
            >      p_fs_mount2   (ocf::heartbeat:Filesystem):  
             Started node1<br>
            >      p_ip_nfs   (ocf::heartbeat:IPaddr2):       Started
            node1<br>
            >  Resource Group: g_apache<br>
            >      p_fs_mountbind1    (ocf::heartbeat:Filesystem):  
             Started node1<br>
            >      p_fs_mountbind2    (ocf::heartbeat:Filesystem):  
             Started node1<br>
            >      p_fs_mountbind3    (ocf::heartbeat:Filesystem):  
             Started node1<br>
            >      p_fs_varwww        (ocf::heartbeat:Filesystem):  
             Started node1<br>
            >      p_apache   (ocf::heartbeat:apache):        Started
            node1<br>
            >  Resource Group: g_fileservers<br>
            >      p_lsb_smb  (lsb:smbd):     Started node1<br>
            >      p_lsb_nmb  (lsb:nmbd):     Started node1<br>
            >      p_lsb_nfsserver    (lsb:nfs-kernel-server):      
             Started node1<br>
            >      p_exportfs_mount1   (ocf::heartbeat:exportfs):    
             Started node1<br>
            >      p_exportfs_mount2     (ocf::heartbeat:exportfs):  
               Started node1<br>
            > <br>
            > I have read through the Pacemaker Explained<br>
            >
<a class="moz-txt-link-rfc2396E" href="http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html-single/Pacemaker_Explained"><http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html-single/Pacemaker_Explained></a><br>
            > documentation, however could not find a way to further
            debug these<br>
            > problems. First, I put node1 into standby mode to
            attempt failover to<br>
            > the other node (node2). Node2 appeared to start the
            transition to<br>
            > master, however it failed to promote the DRBD resources
            to master (the<br>
            > first step). I have attached a copy of this session in
            commands.log and<br>
            > additional excerpts from /var/log/syslog during
            important steps. I have<br>
            > attempted everything I can think of to try and start
            the DRBD resource<br>
            > (e.g. start/stop/promote/manage/cleanup under crm
            resource, restarting<br>
            > heartbeat) but cannot bring it out of the slave state.
            However, if I set<br>
            > it to unmanaged and then run drbdadm primary all in the
            terminal,<br>
            > pacemaker is satisfied and continues starting the rest
            of the resources.<br>
            > It then failed when attempting to mount the filesystem
            for mount2, the<br>
            > p_fs_mount2 resource. I attempted to mount the
            filesystem myself and was<br>
            > successful. I then unmounted it and ran cleanup on
            p_fs_mount2 and then<br>
            > it mounted. The rest of the resources started as
            expected until the<br>
            > p_exportfs_mount2 resource, which failed as follows:<br>
            > p_exportfs_mount2     (ocf::heartbeat:exportfs):    
             started node2<br>
            > (unmanaged) FAILED<br>
            > <br>
            > I ran cleanup on this and it started, however when
            running this test<br>
            > earlier today no command could successfully start this
            exportfs resource. <br>
            > <br>
            > How can I configure pacemaker to better resolve these
            problems and be<br>
            > able to bring the node up successfully on its own? What
            can I check to<br>
            > determine why these failures are occurring?
            /var/log/syslog did not seem<br>
            > to contain very much useful information regarding why
            the failures occurred.<br>
            > <br>
            > Thanks,<br>
            > <br>
            > Andrew<br>
            > <br>
            > <br>
            > <br>
            > <br>
            <br>
            <br>
            <br>
            <br>
            <br>
          </div>
          <br>
        </div>
      </div>
      <br>
      <br>
      <pre wrap="">_______________________________________________
Pacemaker mailing list: <a class="moz-txt-link-abbreviated" href="mailto:Pacemaker@oss.clusterlabs.org">Pacemaker@oss.clusterlabs.org</a>
<a class="moz-txt-link-freetext" href="http://oss.clusterlabs.org/mailman/listinfo/pacemaker">http://oss.clusterlabs.org/mailman/listinfo/pacemaker</a>

Project Home: <a class="moz-txt-link-freetext" href="http://www.clusterlabs.org">http://www.clusterlabs.org</a>
Getting started: <a class="moz-txt-link-freetext" href="http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf">http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf</a>
Bugs: <a class="moz-txt-link-freetext" href="http://bugs.clusterlabs.org">http://bugs.clusterlabs.org</a>
</pre>
    </blockquote>
  </body>
</html>