Hi all,
I deployed a two-node (physical) RHCS Pacemaker cluster on CentOS 6.5 x86_64 (fully up-to-date) with:

cman-3.0.12.1-59.el6_5.2.x86_64
pacemaker-1.1.10-14.el6_5.3.x86_64
pcs-0.9.90-2.el6.centos.3.noarch
qemu-kvm-0.12.1.2-2.415.el6_5.10.x86_64
qemu-kvm-tools-0.12.1.2-2.415.el6_5.10.x86_64
drbd-utils-8.9.0-1.el6.x86_64
drbd-udev-8.9.0-1.el6.x86_64
drbd-rgmanager-8.9.0-1.el6.x86_64
drbd-bash-completion-8.9.0-1.el6.x86_64
drbd-pacemaker-8.9.0-1.el6.x86_64
drbd-8.9.0-1.el6.x86_64
drbd-km-2.6.32_431.20.3.el6.x86_64-8.4.5-1.x86_64
kernel-2.6.32-431.20.3.el6.x86_64

The aim is to run KVM virtual machines backed by DRBD (8.4.5) in active/passive mode (no dual-primary, and so no live migration).
To err on the side of consistency over availability (and to pave the way for a possible dual-primary, live-migration-capable setup), I configured DRBD with fencing resource-and-stonith, using rhcs_fence as the fence-peer handler (which is why I installed drbd-rgmanager), with the STONITH devices configured in Pacemaker (pcmk-redirect in cluster.conf).

The setup "almost" works: everything looks fine with "pcs status", "crm_mon -Arf1", "corosync-cfgtool -s" and "corosync-objctl | grep member", but every time a resource promotion is needed (to Master, i.e. becoming DRBD primary) it either fails or fences the other node (the one supposed to become Slave, i.e. DRBD secondary) and only then succeeds.
This happens, for example, both on initial resource definition (when the first start is attempted) and when a node enters standby (when the resources are automatically moved by stopping and then starting them).

I collected a full "pcs cluster report" and I can provide a CIB dump, but I will start by pasting an excerpt of my configuration here, in case it is a simple configuration error that someone can spot on the fly ;> (hoping...)

Keep in mind that the setup has separate, redundant network connections for the LAN (1 Gbit/s, LACP to the switches), for Corosync (1 Gbit/s, round-robin, back-to-back) and for DRBD (10 Gbit/s, round-robin, back-to-back), and that all FQDNs are correctly resolved through /etc/hosts.

DRBD:

/etc/drbd.d/global_common.conf:

------------------------------------------------------------------------------------------------------

global {
    usage-count no;
}

common {
    protocol C;
    disk {
        on-io-error             detach;
        fencing                 resource-and-stonith;
        disk-barrier            no;
        disk-flushes            no;
        al-extents              3389;
        c-plan-ahead            200;
        c-fill-target           15M;
        c-max-rate              100M;
        c-min-rate              10M;
    }
    net {
        after-sb-0pri           discard-zero-changes;
        after-sb-1pri           discard-secondary;
        after-sb-2pri           disconnect;
        csums-alg               sha1;
        data-integrity-alg      sha1;
        max-buffers             8000;
        max-epoch-size          8000;
        unplug-watermark        16;
        sndbuf-size             0;
        verify-alg              sha1;
    }
    startup {
        wfc-timeout             300;
        outdated-wfc-timeout    80;
        degr-wfc-timeout        120;
    }
    handlers {
        fence-peer              "/usr/lib/drbd/rhcs_fence";
    }
}

------------------------------------------------------------------------------------------------------
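For reference, this is roughly how the DRBD side can be checked around a promotion attempt (just a sketch against the dc_vm resource shown below, using the standard DRBD 8.4 userland tools):

------------------------------------------------------------------------------------------------------

# Check that the configuration parses cleanly
drbdadm dump dc_vm

# Check connection state and roles as seen by the local node
# (expected before Pacemaker promotes: Connected, Secondary/Secondary, UpToDate/UpToDate)
drbdadm cstate dc_vm
drbdadm role dc_vm
cat /proc/drbd

------------------------------------------------------------------------------------------------------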
Sample DRBD resource (there are others, similar):

/etc/drbd.d/dc_vm.res:

------------------------------------------------------------------------------------------------------

resource dc_vm {
    device      /dev/drbd1;
    disk        /dev/VolGroup00/dc_vm;
    meta-disk   internal;
    on cluster1.verolengo.privatelan {
        address ipv4 172.16.200.1:7790;
    }
    on cluster2.verolengo.privatelan {
        address ipv4 172.16.200.2:7790;
    }
}

------------------------------------------------------------------------------------------------------

RHCS:

/etc/cluster/cluster.conf:

------------------------------------------------------------------------------------------------------

<?xml version="1.0"?>
<cluster name="vclu" config_version="14">
  <cman two_node="1" expected_votes="1" keyfile="/etc/corosync/authkey" transport="udpu" port="5405"/>
  <totem consensus="60000" join="6000" token="100000" token_retransmits_before_loss_const="20" rrp_mode="passive" secauth="on"/>
  <clusternodes>
    <clusternode name="cluster1.verolengo.privatelan" votes="1" nodeid="1">
      <altname name="clusterlan1.verolengo.privatelan" port="6405"/>
      <fence>
        <method name="pcmk-redirect">
          <device name="pcmk" port="cluster1.verolengo.privatelan"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="cluster2.verolengo.privatelan" votes="1" nodeid="2">
      <altname name="clusterlan2.verolengo.privatelan" port="6405"/>
      <fence>
        <method name="pcmk-redirect">
          <device name="pcmk" port="cluster2.verolengo.privatelan"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice name="pcmk" agent="fence_pcmk"/>
  </fencedevices>
  <fence_daemon clean_start="0" post_fail_delay="30" post_join_delay="30"/>
  <logging debug="on"/>
  <rm disabled="1">
    <failoverdomains/>
    <resources/>
  </rm>
</cluster>

------------------------------------------------------------------------------------------------------

Pacemaker:

PROPERTIES:

pcs property set default-resource-stickiness=100
pcs property set no-quorum-policy=ignore

STONITH:

pcs stonith create ilocluster1 fence_ilo2 action="off" delay="10" \
    ipaddr="ilocluster1.verolengo.privatelan" login="cluster2" passwd="test" power_wait="4" \
    pcmk_host_check="static-list" pcmk_host_list="cluster1.verolengo.privatelan" op monitor interval=60s
pcs stonith create ilocluster2 fence_ilo2 action="off" \
    ipaddr="ilocluster2.verolengo.privatelan" login="cluster1" passwd="test" power_wait="4" \
    pcmk_host_check="static-list" pcmk_host_list="cluster2.verolengo.privatelan" op monitor interval=60s
pcs stonith create pdu1 fence_apc action="off" \
    ipaddr="pdu1.verolengo.privatelan" login="cluster" passwd="test" \
    pcmk_host_map="cluster1.verolengo.privatelan:3,cluster1.verolengo.privatelan:4,cluster2.verolengo.privatelan:6,cluster2.verolengo.privatelan:7" \
    pcmk_host_check="static-list" pcmk_host_list="cluster1.verolengo.privatelan,cluster2.verolengo.privatelan" op monitor interval=60s

pcs stonith level add 1 cluster1.verolengo.privatelan ilocluster1
pcs stonith level add 2 cluster1.verolengo.privatelan pdu1
pcs stonith level add 1 cluster2.verolengo.privatelan ilocluster2
pcs stonith level add 2 cluster2.verolengo.privatelan pdu1

pcs property set stonith-enabled=true
pcs property set stonith-action=off
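The fencing path itself can be sanity-checked at the Pacemaker level, for example with stonith_admin (a rough sketch; note that the last command really powers the target node off, so it is only something for a maintenance window):

------------------------------------------------------------------------------------------------------

# List all registered STONITH devices, then those able to fence cluster1
stonith_admin --list-registered
stonith_admin --list cluster1.verolengo.privatelan

# Ask Pacemaker to actually fence (power off) cluster1
stonith_admin --fence cluster1.verolengo.privatelan

------------------------------------------------------------------------------------------------------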
SAMPLE RESOURCE:

pcs cluster cib dc_cfg
pcs -f dc_cfg resource create DCVMDisk ocf:linbit:drbd \
    drbd_resource=dc_vm op monitor interval="31s" role="Master" \
    op monitor interval="29s" role="Slave" \
    op start interval="0" timeout="120s" \
    op stop interval="0" timeout="180s"
pcs -f dc_cfg resource master DCVMDiskClone DCVMDisk \
    master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 \
    notify=true target-role=Started is-managed=true
pcs -f dc_cfg resource create DCVM ocf:heartbeat:VirtualDomain \
    config=/etc/libvirt/qemu/dc.xml migration_transport=tcp migration_network_suffix=-10g \
    hypervisor=qemu:///system meta allow-migrate=false target-role=Started is-managed=true \
    op start interval="0" timeout="120s" \
    op stop interval="0" timeout="120s" \
    op monitor interval="60s" timeout="120s"
pcs -f dc_cfg constraint colocation add DCVM DCVMDiskClone INFINITY with-rsc-role=Master
pcs -f dc_cfg constraint order promote DCVMDiskClone then start DCVM
pcs -f dc_cfg constraint location DCVM prefers cluster2.verolengo.privatelan=50
pcs cluster cib-push dc_cfg
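As mentioned above, putting a node into standby is enough to trigger the problem; roughly like this (a sketch with pcs 0.9 syntax, assuming the VM is currently running on cluster2, the preferred node):

------------------------------------------------------------------------------------------------------

# Move everything off cluster2 and watch the promotion attempt on cluster1
pcs cluster standby cluster2.verolengo.privatelan
crm_mon -Arf1

# Look for fence-peer handler / stonith activity in the system log
# (exact log location depends on the syslog configuration)
grep -E 'rhcs_fence|fence_pcmk|stonith' /var/log/messages

# Bring the node back afterwards
pcs cluster unstandby cluster2.verolengo.privatelan

------------------------------------------------------------------------------------------------------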
Since I know that pcs still has some rough edges, I installed crmsh too, but I have never actually used it.

Many thanks in advance for your attention.

Kind regards,
Giuseppe Ragusa