[Pacemaker] Two node DRBD cluster will not automatically failover to the secondary

Tom Pride tom.pride at gmail.com
Thu Dec 17 06:29:04 EST 2009


Hi there,

I have set up a two-node DRBD cluster with Pacemaker using the instructions
provided on the drbd.org website:
http://www.drbd.org/users-guide-emb/ch-pacemaker.html  The cluster works
perfectly and I can migrate the resources back and forth between the two
nodes without a problem.  However, if I simulate a complete server failure
of the master node by powering it off, Pacemaker does not automatically
promote the remaining node to master.  I need some help to find out what
configuration changes I need to make for the cluster to fail over
automatically.
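
One thing I am unsure about: I have not configured any STONITH devices,
and as far as I know Pacemaker 1.0 defaults stonith-enabled to true, which
can prevent the cluster from recovering resources after an unclean node
failure that it cannot fence.  This is only a guess on my part, but for
testing I was considering disabling it:

root@mq001:~# crm configure property stonith-enabled="false"

If that turns out to be the problem, I assume the proper fix for
production would be to configure real STONITH devices rather than leave
this disabled.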

The cluster is built on two Red Hat EL 5.3 servers running the following
software versions:
drbd-8.3.6-1
pacemaker-1.0.5-4.1
openais-0.80.5-15.1

Below I have listed drbd.conf, openais.conf and the output of "crm
configure show".  If someone could take a look at these for me and
suggest any changes, I would be most grateful.

Thanks,
Tom

/etc/drbd.conf

global {
  usage-count no;
}
common {
  protocol C;
}
resource r0 {
  disk {
    fencing resource-only;
  }
  handlers {
    fence-peer "/usr/lib/drbd/crm-fence-peer.
sh";
    after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
  }
  syncer {
    rate 40M;
  }
  on mq001.back.live.cwwtf.local {
    device    /dev/drbd1;
    disk      /dev/cciss/c0d0p1;
    address   172.23.8.69:7789;
    meta-disk internal;
  }
  on mq002.back.live.cwwtf.local {
    device    /dev/drbd1;
    disk      /dev/cciss/c0d0p1;
    address   172.23.8.70:7789;
    meta-disk internal;
  }
}
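
As I understand it (and I may well be wrong here), with "fencing
resource-only" the crm-fence-peer.sh handler works by inserting a location
constraint into the CIB that pins the Master role to the surviving node,
and a constraint left over from an earlier fencing event could itself
block promotion.  So it seems worth checking for one; the constraint id
below is just what I would expect the handler to generate for r0, not
something I have actually captured:

root@mq001:~# crm configure show | grep drbd-fence
location drbd-fence-by-handler-r0 ms_drbd_activemq \
    rule $role="Master" -inf: #uname ne mq001.back.live.cwwtf.local

If a constraint like that is still present after the failed node is gone,
deleting it (crm configure delete drbd-fence-by-handler-r0) should allow
the secondary to promote.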


root@mq001:~# cat /etc/ais/openais.conf
totem {
  version: 2
  token: 3000
  token_retransmits_before_loss_const: 10
  join: 60
  consensus: 1500
  vsftype: none
  max_messages: 20
  clear_node_high_bit: yes
  secauth: on
  threads: 0
  rrp_mode: passive
  interface {
    ringnumber: 0
    bindnetaddr: 172.59.60.0
    mcastaddr: 239.94.1.1
    mcastport: 5405
  }
  interface {
    ringnumber: 1
    bindnetaddr: 172.23.8.0
    mcastaddr: 239.94.2.1
    mcastport: 5405
  }
}
logging {
  to_stderr: yes
  debug: on
  timestamp: on
  to_file: no
  to_syslog: yes
  syslog_facility: daemon
}
amf {
  mode: disabled
}
service {
  ver:       0
  name:      pacemaker
  use_mgmtd: yes
}
aisexec {
  user:   root
  group:  root
}


root@mq001:~# crm configure show
node mq001.back.live.cwwtf.local
node mq002.back.live.cwwtf.local
primitive activemq-emp lsb:bbc-activemq-emp
primitive activemq-forge-services lsb:bbc-activemq-forge-services
primitive activemq-social lsb:activemq-social
primitive drbd_activemq ocf:linbit:drbd \
    params drbd_resource="r0" \
    op monitor interval="15s"
primitive fs_activemq ocf:heartbeat:Filesystem \
    params device="/dev/drbd1" directory="/drbd" fstype="ext3"
primitive ip_activemq ocf:heartbeat:IPaddr2 \
    params ip="172.23.8.71" nic="eth0"
group activemq fs_activemq ip_activemq activemq-forge-services activemq-emp activemq-social
ms ms_drbd_activemq drbd_activemq \
    meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1"
notify="true"
colocation activemq_on_drbd inf: activemq ms_drbd_activemq:Master
order activemq_after_drbd inf: ms_drbd_activemq:promote activemq:start
property $id="cib-bootstrap-options" \
    dc-version="1.0.5-462f1569a43740667daf7b0f6b521742e9eb8fa7" \
    cluster-infrastructure="openais" \
    expected-quorum-votes="2" \
    no-quorum-policy="ignore" \
    last-lrm-refresh="1260809203"
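
In case it matters, I am also wondering whether the single monitor op on
drbd_activemq is enough.  I have seen example configurations that monitor
the Master and Slave roles at different intervals so that a failed master
is actually detected; my guess at the equivalent for my setup would be:

primitive drbd_activemq ocf:linbit:drbd \
    params drbd_resource="r0" \
    op monitor interval="15s" role="Master" \
    op monitor interval="30s" role="Slave"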

