[Pacemaker] Two node DRBD cluster will not automatically failover to the secondary

Tom Pride tom.pride at gmail.com
Wed Feb 3 15:03:12 UTC 2010

Hi Shravan,

Thank you very much for your reply.  I know it was quite a while ago that I
posted my question to the mailing list, but I've been working on other
things and have only just had the chance to come back to this.

You say that I need to set up stonith resources as well as setting
"stonith-enabled" = true.  I know how to change the stonith-enabled
setting, but I have no idea how to set up the appropriate
stonith resources to prevent DRBD from getting into a split-brain
situation.  The documentation on the DRBD website about setting up
a 2 node cluster with Pacemaker doesn't tell you to enable stonith or
configure stonith resources.  It does talk about the resource fencing
options in /etc/drbd.conf, which I have configured as follows:

resource r0 {
  disk {
    fencing resource-only;
  }
  handlers {
    fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
    after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
  }
  # syncer and "on" host sections as in the full config quoted below
}
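
For comparison, the DRBD documentation also lists a stricter fencing
policy, resource-and-stonith, which is described as freezing I/O on a
disconnected primary until the fence-peer handler returns, and which is
meant to be paired with working STONITH in the cluster.  An untested
sketch of that variant, assuming the same handler scripts still apply,
would presumably be:

```
resource r0 {
  disk {
    # Assumption: stricter policy than resource-only; intended for
    # clusters that also have STONITH configured in Pacemaker.
    fencing resource-and-stonith;
  }
  handlers {
    fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
    after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
  }
  # remaining sections unchanged
}
```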

I've searched the internet high and low for example pacemaker configs that
show you how to configure stonith resources for DRBD, but I can't find
anything useful.

One howto that I found spells out how to configure a cluster and even states:
"STONITH is disabled in this configuration though it is highly-recommended
in any production environment to eliminate the risk of divergent data." but
infuriatingly it doesn't tell you how.

Could you please give me some pointers or some helpful examples or perhaps
point me to someone or something that can give me a hand in this area?
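
To make the question concrete: from what I can piece together, a STONITH
setup in the crm shell would presumably consist of one fencing primitive
per node, a location constraint keeping each device off the node it is
meant to fence, and the stonith-enabled property.  The sketch below
assumes the external/ipmi plugin and IPMI-capable management boards,
neither of which has been confirmed for these servers; the resource
names, addresses and credentials are placeholders.  Is something along
these lines what is needed?

```
# Hypothetical sketch: plugin choice, resource names, addresses and
# credentials are placeholders, not values from this cluster.
primitive st_mq001 stonith:external/ipmi \
    params hostname="mq001.back.live.cwwtf.local" \
        ipaddr="<management address of mq001>" \
        userid="<user>" passwd="<password>" interface="lan" \
    op monitor interval="60s"
primitive st_mq002 stonith:external/ipmi \
    params hostname="mq002.back.live.cwwtf.local" \
        ipaddr="<management address of mq002>" \
        userid="<user>" passwd="<password>" interface="lan" \
    op monitor interval="60s"
# Keep each fencing device off the node it is supposed to power off.
location l_st_mq001 st_mq001 -inf: mq001.back.live.cwwtf.local
location l_st_mq002 st_mq002 -inf: mq002.back.live.cwwtf.local
property stonith-enabled="true"
```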

Many Thanks

On Thu, Dec 17, 2009 at 2:14 PM, Shravan Mishra <shravan.mishra at gmail.com> wrote:

> Hi,
> For stateful resources like drbd you will have to set up stonith resources
> for them to function properly or at all.
> "stonith-enabled" is true by default.
> Sincerely
> Shravan
> On Thu, Dec 17, 2009 at 6:29 AM, Tom Pride <tom.pride at gmail.com> wrote:
>> Hi there,
>> I have set up a two node DRBD cluster with pacemaker using the instructions
>> provided on the drbd.org website:
>> http://www.drbd.org/users-guide-emb/ch-pacemaker.html  The cluster works
>> perfectly and I can migrate the resources back and forth between the two
>> nodes without a problem.  However, if I try simulating a complete server
>> failure of the master node by powering off the server, pacemaker does not
>> then automatically bring up the remaining node as the master.  I need some
>> help to find out what configuration changes I need to make in order for my
>> cluster to fail over automatically.
>> The cluster is built on 2 Red Hat EL 5.3 servers running the following
>> software versions:
>> drbd-8.3.6-1
>> pacemaker-1.0.5-4.1
>> openais-0.80.5-15.1
>> Below I have listed the drbd.conf, openais.conf and the output of "crm
>> configuration show".  If someone could take a look at these for me and
>> provide any suggestions/modifications I would be most grateful.
>> Thanks,
>> Tom
>> /etc/drbd.conf
>> global {
>>   usage-count no;
>> }
>> common {
>>   protocol C;
>> }
>> resource r0 {
>>   disk {
>>     fencing resource-only;
>>   }
>>   handlers {
>>     fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
>>     after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
>>   }
>>   syncer {
>>     rate 40M;
>>   }
>>   on mq001.back.live.cwwtf.local {
>>     device    /dev/drbd1;
>>     disk      /dev/cciss/c0d0p1;
>>     address;
>>     meta-disk internal;
>>   }
>>   on mq002.back.live.cwwtf.local {
>>     device    /dev/drbd1;
>>     disk      /dev/cciss/c0d0p1;
>>     address;
>>     meta-disk internal;
>>   }
>> }
>> root@mq001:~# cat /etc/ais/openais.conf
>> totem {
>>   version: 2
>>   token: 3000
>>   token_retransmits_before_loss_const: 10
>>   join: 60
>>   consensus: 1500
>>   vsftype: none
>>   max_messages: 20
>>   clear_node_high_bit: yes
>>   secauth: on
>>   threads: 0
>>   rrp_mode: passive
>>   interface {
>>     ringnumber: 0
>>     bindnetaddr:
>>     mcastaddr:
>>     mcastport: 5405
>>   }
>>   interface {
>>     ringnumber: 1
>>     bindnetaddr:
>>     mcastaddr:
>>     mcastport: 5405
>>   }
>> }
>> logging {
>>   to_stderr: yes
>>   debug: on
>>   timestamp: on
>>   to_file: no
>>   to_syslog: yes
>>   syslog_facility: daemon
>> }
>> amf {
>>   mode: disabled
>> }
>> service {
>>   ver:       0
>>   name:      pacemaker
>>   use_mgmtd: yes
>> }
>> aisexec {
>>   user:   root
>>   group:  root
>> }
>> root@mq001:~# crm configure show
>> node mq001.back.live.cwwtf.local
>> node mq002.back.live.cwwtf.local
>> primitive activemq-emp lsb:bbc-activemq-emp
>> primitive activemq-forge-services lsb:bbc-activemq-forge-services
>> primitive activemq-social lsb:activemq-social
>> primitive drbd_activemq ocf:linbit:drbd \
>>     params drbd_resource="r0" \
>>     op monitor interval="15s"
>> primitive fs_activemq ocf:heartbeat:Filesystem \
>>     params device="/dev/drbd1" directory="/drbd" fstype="ext3"
>> primitive ip_activemq ocf:heartbeat:IPaddr2 \
>>     params ip="" nic="eth0"
>> group activemq fs_activemq ip_activemq activemq-forge-services activemq-emp activemq-social
>> ms ms_drbd_activemq drbd_activemq \
>>     meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
>> colocation activemq_on_drbd inf: activemq ms_drbd_activemq:Master
>> order activemq_after_drbd inf: ms_drbd_activemq:promote activemq:start
>> property $id="cib-bootstrap-options" \
>>     dc-version="1.0.5-462f1569a43740667daf7b0f6b521742e9eb8fa7" \
>>     cluster-infrastructure="openais" \
>>     expected-quorum-votes="2" \
>>     no-quorum-policy="ignore" \
>>     last-lrm-refresh="1260809203"
>> _______________________________________________
>> Pacemaker mailing list
>> Pacemaker at oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker