Hi Michel,<br>Yes, I have tried a simpler configuration, adding resources one at a time and rebooting after each step:<br>1) Master/Slave ocf:linbit:drbd RA + ocf:heartbeat:Filesystem RA -> shutdown -r now -> Ok, no split brain<br>2) ..+ ocf:heartbeat:IPaddr2 RA -> shutdown -r now -> Ok<br>
3) ..+ heartbeat:drbdlinks RA -> shutdown -r now -> Ok<br>4) ..+ ocf:heartbeat:pgsql RA -> shutdown -r now -> Ok<br>5) ..+ ocf:custom:Asterisk RA -> shutdown -r now -> Ok<br>6) ..+ ocf:heartbeat:apache RA -> shutdown -r now -> Ok<br>
7) ..+ lsb:postfix RA -> shutdown -r now -> Ok<br>8) ..+ lsb:dhcp3-server -> shutdown -r now -> Ok<br>9) ..+ lsb:atftpd -> shutdown -r now -> FAIL, Split brain<br>At this point I got the first split brain. After a lot of Google searching I finally added a start-delay of 10 seconds to the start and promote operations of the drbd RA. After that I rebooted a couple of times and everything worked fine, no more split brain.<br>
10) ..+ ocf:custom:JBoss RA -> shutdown -r now -> FAIL, Split brain<br>With this resource enabled I always get a split brain after a "normal" reboot. I tried increasing the start-delay of both the start and promote operations to 40 seconds, which is more than the time JBoss needs to stop.<br>
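Pacemaker starts the members of a group in the listed order and stops them in reverse order, and the ms-drbd-before-group order constraint (promote ms-drbd, then start p-group) is symmetrical by default, so on shutdown ms-drbd can only be demoted after the whole group has stopped. Every resource added to the group therefore lengthens the time before the primary gives up the DRBD Primary role. A minimal Python sketch of that sequence, assuming the nine p-group members listed in the CIB quoted below; the stop durations are hypothetical inputs, not measurements from this cluster:<br>

```python
# Members of the "p-group" group, in the order listed in the CIB quoted below.
P_GROUP = ["fs", "ip", "drbdlinks", "postgresql", "asterisk",
           "postfix", "apache2", "dhcp", "jboss"]


def shutdown_plan(group, stop_seconds):
    """Return ([(resource, seconds elapsed once it has stopped)], total seconds).

    stop_seconds maps a resource id to an assumed (hypothetical) stop duration;
    resources missing from the map are treated as stopping instantly.
    """
    elapsed = 0.0
    timeline = []
    # Group members are stopped one after another, in reverse start order.
    for rsc in reversed(group):
        elapsed += stop_seconds.get(rsc, 0.0)
        timeline.append((rsc, elapsed))
    # Because the promote-then-start order constraint is symmetrical, the
    # demote of ms-drbd on this node can only begin once `elapsed` has passed.
    return timeline, elapsed


if __name__ == "__main__":
    # Hypothetical durations: 1 s per member, 30 s for JBoss.
    durations = {rsc: 1.0 for rsc in P_GROUP}
    durations["jboss"] = 30.0
    timeline, demote_after = shutdown_plan(P_GROUP, durations)
    for rsc, t in timeline:
        print(f"{t:5.1f}s  {rsc} stopped")
    print(f"ms-drbd can be demoted after about {demote_after:.1f}s")
```

With these made-up numbers JBoss, being the last member of the group, is the first to stop, and the demote of ms-drbd cannot begin before about 38 s.<br>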
If I remove the start-delay, I see this in the secondary's logs:<br><br>Dec 21 23:38:02 secondary drbd[17758]: DEBUG: r0: Calling drbdadm -c /etc/drbd.conf primary r0<br>Dec 21 23:38:03 secondary lrmd: [19818]: info: RA output: (drbd:0:promote:stderr) 0: State change failed: (-1) Multiple primaries not allowed by config<br>
Dec 21 23:38:03 secondary lrmd: [19818]: info: RA output: (drbd:0:promote:stderr) Command 'drbdsetup 0 primary' terminated with exit code 11<br>Dec 21 23:38:03 secondary drbd[17758]: ERROR: r0: Called drbdadm -c /etc/drbd.conf primary r0<br>
Dec 21 23:38:03 secondary lrmd: [19818]: info: RA output: (drbd:0:promote:stderr) 2009/12/21_23:38:03 ERROR: r0: Called drbdadm -c /etc/drbd.conf primary r0<br>Dec 21 23:38:03 secondary drbd[17758]: ERROR: r0: Exit code 11<br>
Dec 21 23:38:03 secondary lrmd: [19818]: info: RA output: (drbd:0:promote:stderr) 2009/12/21_23:38:03 ERROR: r0: Exit code 11<br>Dec 21 23:38:03 secondary drbd[17758]: ERROR: r0: Command output:<br>Dec 21 23:38:03 secondary lrmd: [19818]: info: RA output: (drbd:0:promote:stderr) 2009/12/21_23:38:03 ERROR: r0: Command output:<br>
Dec 21 23:38:03 secondary lrmd: [19818]: info: RA output: (drbd:0:promote:stdout)<br>Dec 21 23:38:03 secondary drbd[17758]: DEBUG: r0: Calling drbdadm -c /etc/drbd.conf primary r0<br>Dec 21 23:38:04 secondary kernel: [215495.004740] tg3: eth1: Link is down.<br>
Dec 21 23:38:04 secondary lrmd: [19818]: info: RA output: (drbd:0:promote:stderr) 0: State change failed: (-1) Multiple primaries not allowed by config<br>Dec 21 23:38:04 secondary lrmd: [19818]: info: RA output: (drbd:0:promote:stderr) Command 'drbdsetup 0 primary' terminated with exit code 11<br>
Dec 21 23:38:04 secondary drbd[17758]: ERROR: r0: Called drbdadm -c /etc/drbd.conf primary r0<br>Dec 21 23:38:04 secondary lrmd: [19818]: info: RA output: (drbd:0:promote:stderr) 2009/12/21_23:38:04 ERROR: r0: Called drbdadm -c /etc/drbd.conf primary r0<br>
Dec 21 23:38:04 secondary drbd[17758]: ERROR: r0: Exit code 11<br>Dec 21 23:38:04 secondary lrmd: [19818]: info: RA output: (drbd:0:promote:stderr) 2009/12/21_23:38:04 ERROR: r0: Exit code 11<br>Dec 21 23:38:04 secondary drbd[17758]: ERROR: r0: Command output:<br>
Dec 21 23:38:04 secondary lrmd: [19818]: info: RA output: (drbd:0:promote:stderr) 2009/12/21_23:38:04 ERROR: r0: Command output:<br>Dec 21 23:38:04 secondary lrmd: [19818]: info: RA output: (drbd:0:promote:stdout)<br>Dec 21 23:38:05 secondary drbd[17758]: DEBUG: r0: Calling drbdadm -c /etc/drbd.conf primary r0<br>
Dec 21 23:38:06 secondary kernel: [215496.595684] block drbd0: PingAck did not arrive in time.<br>Dec 21 23:38:06 secondary kernel: [215496.602119] tg3: eth1: Link is up at 100 Mbps, full duplex.<br>Dec 21 23:38:06 secondary kernel: [215496.602122] tg3: eth1: Flow control is off for TX and off for RX.<br>
Dec 21 23:38:06 secondary kernel: [215496.638538] block drbd0: peer( Primary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )<br>Dec 21 23:38:06 secondary kernel: [215496.638547] block drbd0: asender terminated<br>
Dec 21 23:38:06 secondary kernel: [215496.638550] block drbd0: Terminating asender thread<br>Dec 21 23:38:06 secondary kernel: [215496.638589] block drbd0: short read expecting header on sock: r=-512<br>Dec 21 23:38:06 secondary kernel: [215496.697734] block drbd0: Connection closed<br>
Dec 21 23:38:06 secondary kernel: [215496.697734] block drbd0: conn( NetworkFailure -> Unconnected )<br>Dec 21 23:38:06 secondary kernel: [215496.697734] block drbd0: receiver terminated<br>Dec 21 23:38:06 secondary kernel: [215496.697734] block drbd0: Restarting receiver thread<br>
Dec 21 23:38:06 secondary kernel: [215496.697734] block drbd0: receiver (re)started<br>Dec 21 23:38:06 secondary kernel: [215496.697734] block drbd0: helper command: /sbin/drbdadm fence-peer minor-0<br>Dec 21 23:38:07 secondary crm-fence-peer.sh[17839]: invoked for r0<br>
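To help see which came first, the excerpt above can be reduced to a timeline of key events. A minimal Python sketch; the event labels are illustrative, and the matched substrings are copied from the log lines above:<br>

```python
import re

# Syslog line layout: "MON DD HH:MM:SS HOST TAG: MESSAGE".
LINE_RE = re.compile(r"^(\w{3} +\d+ \d\d:\d\d:\d\d) (\S+) (.*)$")

# (illustrative label, substring copied from the log excerpt above)
EVENTS = [
    ("promote attempt", "Calling drbdadm -c /etc/drbd.conf primary"),
    ("promote refused, peer still Primary", "Multiple primaries not allowed"),
    ("eth1 link down", "eth1: Link is down"),
    ("DRBD ping timeout", "PingAck did not arrive in time"),
    ("DRBD lost the peer", "peer( Primary -> Unknown )"),
    ("fence-peer handler called", "helper command: /sbin/drbdadm fence-peer"),
]

# A few lines copied verbatim from the excerpt above.
SAMPLE = """\
Dec 21 23:38:02 secondary drbd[17758]: DEBUG: r0: Calling drbdadm -c /etc/drbd.conf primary r0
Dec 21 23:38:03 secondary lrmd: [19818]: info: RA output: (drbd:0:promote:stderr) 0: State change failed: (-1) Multiple primaries not allowed by config
Dec 21 23:38:04 secondary kernel: [215495.004740] tg3: eth1: Link is down.
Dec 21 23:38:06 secondary kernel: [215496.595684] block drbd0: PingAck did not arrive in time.
Dec 21 23:38:06 secondary kernel: [215496.638538] block drbd0: peer( Primary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
Dec 21 23:38:06 secondary kernel: [215496.697734] block drbd0: helper command: /sbin/drbdadm fence-peer minor-0
"""


def timeline(lines):
    """Yield (timestamp, label) for every line that matches a known event."""
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # not a syslog-formatted line
        stamp, _host, message = m.groups()
        for label, needle in EVENTS:
            if needle in message:
                yield stamp, label
                break  # one label per line is enough


if __name__ == "__main__":
    for stamp, label in timeline(SAMPLE.splitlines()):
        print(stamp, label)
```

In the excerpt, the secondary's first promote is refused at 23:38:03 because its peer is still Primary, one second before eth1 goes down (23:38:04) and three seconds before DRBD reports the peer as lost (23:38:06).<br>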
<br>Does this mean that I'm getting a communication failure because the network goes down too fast, or that the secondary wants to become Master before the primary has become Slave? ...or both? :-)<br><br>Thanks for your help!!<br>Andres<br><br>

<br><br><div class="gmail_quote">2009/12/21 <a href="mailto:andschais@gmail.com" target="_blank">andschais@gmail.com</a> <span dir="ltr"><<a href="mailto:andschais@gmail.com" target="_blank">andschais@gmail.com</a>></span><br>
<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
Hi all, <br><br>I'm having trouble with a two-node Pacemaker+DRBD cluster. I have been trying to solve it for about a week and I really need help!!! <br>If I disconnect the power cord, failover works great: resources migrate to the secondary node, and back to the primary when I turn it on again.<br>


But when I turn off the primary node with a "shutdown -r now" command, I always end up with a split brain. That's not all: if I configure just a few resources (for example: virtual IP, DRBD, Apache and PostgreSQL), split brain does not occur, but as soon as I have 8 or 9 resources (especially when one of them is JBoss AS), I always get a split brain...<br>


Can someone give me some hints?<br><br>My systems are:<br><br>OS: Debian Lenny (kernel 2.6.26-2-686)<br>
Corosync 1.1.2<br>DRBD 8.3.6<br><br>And my configuration files are:<br><br>/etc/corosync/corosync.conf<br><br># Please read the openais.conf.5 manual page<br>totem {<br>        version: 2<br>        # How long before declaring a token lost (ms)<br>


        token: 3000<br>        # How many token retransmits before forming a new configuration<br>        token_retransmits_before_loss_const: 10<br>        # How long to wait for join messages in the membership protocol (ms)<br>


        join: 60<br>        # How long to wait for consensus to be achieved before starting a new round of membership configuration (ms)<br>        consensus: 1500<br>        # Turn off the virtual synchrony filter<br>        vsftype: none<br>


        # Number of messages that may be sent by one processor on receipt of the token<br>        max_messages: 20<br>        # Limit generated nodeids to 31-bits (positive signed integers)<br>        clear_node_high_bit: yes<br>


        # Enable encryption<br>        secauth: on<br>        # How many threads to use for encryption/decryption<br>        threads: 0<br>        # Optionally assign a fixed node id (integer)<br>        # nodeid: 1234<br>


        # This specifies the mode of redundant ring, which may be none, active, or passive.<br>        rrp_mode: passive<br>        interface {<br>                # The following values need to be set based on your environment<br>


                ringnumber: 0<br>                bindnetaddr: 172.16.1.0<br>                mcastaddr: 226.94.1.1<br>                mcastport: 5405<br>        }<br>        interface {<br>                # The following values need to be set based on your environment<br>


                ringnumber: 1<br>                bindnetaddr: 10.186.68.0<br>                mcastaddr: 226.94.2.1<br>                mcastport: 5405<br>        }<br>}<br>amf {<br>        mode: disabled<br>}<br>service {<br>


        # Load the Pacemaker Cluster Resource Manager<br>        ver:       0<br>        name:      pacemaker<br>}<br>aisexec {<br>        user:   root<br>        group:  root<br>}<br>logging {<br>    to_stderr: yes<br>    debug: on<br>


    timestamp: on<br>    to_file: yes<br>    logfile: /var/log/corosync.log<br>    to_syslog: no<br>    syslog_facility: daemon<br>}<br>}<br><br><br>/etc/drbd.conf<br><br>global {<br>    usage-count yes;<br>}<br>common {<br>


    syncer { rate 33M; }<br>}<br>resource r0 {<br>    protocol C;<br>    handlers {<br>       pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";<br>


       pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";<br>       local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";<br>


       fence-peer "/usr/lib/drbd/crm-fence-peer.sh";<br>       after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";<br>       outdate-peer "/usr/lib/drbd/outdate-peer.sh";<br>       split-brain "/usr/lib/drbd/notify-split-brain.sh root@localhost";<br>


    }<br>    startup {<br>        degr-wfc-timeout 30;<br>        wfc-timeout 30;<br>    }<br>    disk {<br>        fencing resource-only;<br>        on-io-error   detach;<br>    }<br>    net {<br>        after-sb-0pri disconnect;<br>


        after-sb-1pri disconnect;<br>        after-sb-2pri disconnect;<br>        rr-conflict disconnect;<br>    }<br><br>    on primary {<br>        device     /dev/drbd0;<br>        disk       /dev/vg00/drbd;<br>        address    172.16.1.1:7788;<br>


        meta-disk  internal;<br>    }<br>    on secondary {<br>        device     /dev/drbd0;<br>        disk       /dev/vg00/drbd;<br>        address    172.16.1.2:7788;<br>

        meta-disk  internal;<br>
    }<br>}<br><br><br>and my crm config<br><br><configuration><br>    <crm_config><br>      <cluster_property_set id="cib-bootstrap-options"><br>        <nvpair id="cib-bootstrap-options-no-quorum-policy" name="no-quorum-policy" value="ignore"/><br>


        <nvpair id="cib-bootstrap-options-stonith-enabled" name="stonith-enabled" value="false"/><br>        <nvpair id="cib-bootstrap-options-expected-quorum-votes" name="expected-quorum-votes" value="2"/><br>


        <nvpair id="cib-bootstrap-options-last-lrm-refresh" name="last-lrm-refresh" value="1261424411"/><br>        <nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="1.0.6-cebe2b6ff49b36b29a3bd7ada1c4701c7470febe"/><br>


        <nvpair id="cib-bootstrap-options-cluster-infrastructure" name="cluster-infrastructure" value="openais"/><br>      </cluster_property_set><br>    </crm_config><br>    <nodes><br>


      <node uname="primary" type="normal" id="primary"><br>        <instance_attributes id="nodes-primary"><br>          <nvpair name="standby" id="nodes-primary-standby" value="off"/><br>


        </instance_attributes><br>      </node><br>      <node uname="secondary" type="normal" id="secondary"><br>        <instance_attributes id="nodes-secondary"><br>


          <nvpair name="standby" id="nodes-secondary-standby" value="off"/><br>        </instance_attributes><br>      </node><br>    </nodes><br>    <resources><br>


      <master id="ms-drbd"><br>        <meta_attributes id="ms-drbd-meta_attributes"><br>          <nvpair id="ms-drbd-meta_attributes-master-max" name="master-max" value="1"/><br>


          <nvpair id="ms-drbd-meta_attributes-master-node-max" name="master-node-max" value="1"/><br>          <nvpair id="ms-drbd-meta_attributes-clone-max" name="clone-max" value="2"/><br>


          <nvpair id="ms-drbd-meta_attributes-clone-node-max" name="clone-node-max" value="1"/><br>          <nvpair id="ms-drbd-meta_attributes-notify" name="notify" value="true"/><br>


          <nvpair id="ms-drbd-meta_attributes-globally-unique" name="globally-unique" value="false"/><br>          <nvpair name="target-role" id="ms-drbd-meta_attributes-target-role" value="Started"/><br>


        </meta_attributes><br>        <primitive class="ocf" id="drbd" provider="linbit" type="drbd"><br>          <instance_attributes id="drbd-instance_attributes"><br>


            <nvpair id="drbd-instance_attributes-drbd_resource" name="drbd_resource" value="r0"/><br>          </instance_attributes><br>          <operations><br>            <op id="drbd-monitor-59s" interval="59s" name="monitor" role="Master" timeout="30s"/><br>


            <op id="drbd-monitor-60s" interval="60s" name="monitor" role="Slave" timeout="30s"/><br>            <op id="drbd-start-0s" interval="0s" name="start" start-delay="10s"/><br>


            <op id="drbd-promote-0s" interval="0s" name="promote" start-delay="10s"/><br>          </operations><br>        </primitive><br>      </master><br>


      <group id="p-group"><br>        <primitive class="ocf" id="fs" provider="heartbeat" type="Filesystem"><br>          <instance_attributes id="fs-instance_attributes"><br>


            <nvpair id="fs-instance_attributes-fstype" name="fstype" value="ext3"/><br>            <nvpair id="fs-instance_attributes-directory" name="directory" value="/drbd"/><br>


            <nvpair id="fs-instance_attributes-device" name="device" value="/dev/drbd0"/><br>          </instance_attributes><br>          <meta_attributes id="fs-meta_attributes"><br>


            <nvpair id="fs-meta_attributes-is-managed" name="is-managed" value="true"/><br>          </meta_attributes><br>        </primitive><br>        <primitive class="ocf" id="ip" provider="heartbeat" type="IPaddr2"><br>


          <instance_attributes id="ip-instance_attributes"><br>            <nvpair id="ip-instance_attributes-ip" name="ip" value="10.186.68.1"/><br>            <nvpair id="ip-instance_attributes-broadcast" name="broadcast" value="10.186.68.127"/><br>


            <nvpair id="ip-instance_attributes-cidr_netmask" name="cidr_netmask" value="25"/><br>          </instance_attributes><br>          <operations><br>            <op id="ip-monitor-10s" interval="10s" name="monitor"/><br>


          </operations><br>        </primitive><br>        <primitive class="heartbeat" id="drbdlinks" type="drbdlinks"><br>          <operations><br>            <op id="drbdlinks-monitor-60s" interval="60s" name="monitor"/><br>


          </operations><br>        </primitive><br>        <primitive class="ocf" id="postgresql" provider="heartbeat" type="pgsql"><br>          <instance_attributes id="postgresql-instance_attributes"><br>


            <nvpair id="postgresql-instance_attributes-pgctl" name="pgctl" value="/usr/lib/postgresql/8.3/bin/pg_ctl"/><br>            <nvpair id="postgresql-instance_attributes-psql" name="psql" value="/usr/bin/psql"/><br>


            <nvpair id="postgresql-instance_attributes-pgdata" name="pgdata" value="/var/lib/postgresql/8.3/main"/><br>            <nvpair id="postgresql-instance_attributes-pgdba" name="pgdba" value="postgres"/><br>


            <nvpair id="postgresql-instance_attributes-pgdb" name="pgdb" value="postgres"/><br>            <nvpair id="postgresql-instance_attributes-logfile" name="logfile" value="/var/log/postgresql/postgresql-8.3-main.log"/><br>


          </instance_attributes><br>          <operations><br>            <op id="postgresql-monitor-60s" interval="60s" name="monitor" timeout="30s"/><br>          </operations><br>


        </primitive><br>        <primitive class="ocf" id="asterisk" provider="custom" type="Asterisk"><br>          <operations><br>            <op id="asterisk-monitor-60s" interval="60s" name="monitor" start-delay="30s" timeout="30s"/><br>


          </operations><br>        </primitive><br>        <primitive class="lsb" id="postfix" type="postfix"/><br>        <primitive class="ocf" id="apache2" provider="heartbeat" type="apache"><br>


          <instance_attributes id="apache2-instance_attributes"><br>            <nvpair id="apache2-instance_attributes-configfile" name="configfile" value="/etc/apache2/apache2.conf"/><br>


          </instance_attributes><br>          <operations><br>            <op id="apache2-monitor-60s" interval="60s" name="monitor"/><br>          </operations><br>        </primitive><br>


        <primitive class="lsb" id="dhcp" type="dhcp3-server"/><br>        <primitive class="ocf" id="jboss" provider="custom" type="JBoss"><br>


          <instance_attributes id="jboss-instance_attributes"><br>            <nvpair id="jboss-instance_attributes-java_home" name="java_home" value="/opt/java/"/><br>

            <nvpair id="jboss-instance_attributes-jboss_home" name="jboss_home" value="/opt/jboss"/><br>
          </instance_attributes><br>          <operations><br>            <op id="jboss-monitor-60s" interval="60s" name="monitor" start-delay="100s" timeout="30s"/><br>


            <op id="jboss-start-0s" interval="0s" name="start" timeout="99s"/><br>          </operations><br>        </primitive><br>      </group><br>    </resources><br>


    <constraints><br>      <rsc_colocation id="p-group-on-ms-drbd" rsc="p-group" score="INFINITY" with-rsc="ms-drbd" with-rsc-role="Master"/><br>      <rsc_location id="ms-drbd-master-on-primary" rsc="ms-drbd"><br>


        <rule id="ms-drbd-master-on-primary-rule" role="Master" score="100"><br>          <expression attribute="#uname" id="ms-drbd-master-on-primary-expression" operation="eq" value="primary"/><br>


        </rule><br>      </rsc_location><br>      <rsc_order first="ms-drbd" first-action="promote" id="ms-drbd-before-group" score="INFINITY" then="p-group" then-action="start"/><br>


    </constraints><br>    <rsc_defaults/><br>    <op_defaults/><br>  </configuration><br><br>Thanks in advance.<br>Andres.<br><br>
</blockquote></div><br>
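Two things in the corosync.conf quoted above can be checked mechanically: the pasted file has one more closing brace than opening braces (an extra "}" after the logging block, possibly just a copy/paste artifact), and consensus (1500) is lower than token (3000). The corosync.conf(5) man page of later 1.x releases states that consensus must be at least 1.2 * token; whether 1.1.2 documents or enforces the same rule is an assumption to verify. A rough Python sketch of both checks (it does not skip comments or understand section nesting):<br>

```python
import re

# Excerpt of the corosync.conf quoted above (totem timings and the file's end).
SAMPLE = """\
totem {
        token: 3000
        token_retransmits_before_loss_const: 10
        consensus: 1500
}
logging {
    to_syslog: no
}
}
"""


def brace_balance(text):
    """Count of '{' minus count of '}'; 0 means balanced (comments not skipped)."""
    return text.count("{") - text.count("}")


def option_value(text, key):
    """Integer value of a 'key: N' line, or None if the key is absent."""
    m = re.search(rf"^\s*{re.escape(key)}:\s*(\d+)\s*$", text, re.MULTILINE)
    return int(m.group(1)) if m else None


def consensus_ok(text):
    """True if consensus >= 1.2 * token (rule assumed from corosync.conf(5)).

    Returns None when either value is missing from the text.
    """
    token = option_value(text, "token")
    consensus = option_value(text, "consensus")
    if token is None or consensus is None:
        return None
    # Integer form of consensus >= 1.2 * token, avoiding float rounding.
    return consensus * 5 >= token * 6


if __name__ == "__main__":
    print("brace balance:", brace_balance(SAMPLE))
    print("consensus >= 1.2 * token:", consensus_ok(SAMPLE))
```

On the quoted values this reports a brace balance of -1 and False for the consensus check.<br>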