[Pacemaker] DRBD Master/Slave in a 3 node cluster

Stefan Botter listreader at jsj.dyndns.org
Tue Oct 1 03:26:14 EDT 2013


Hi James,

On Mon, 30 Sep 2013 12:31:52 -0700
James Oakley <jfunk at funktronics.ca> wrote:

> I am having some trouble with DRBD Master/Slave resources in a 3-node
> cluster.
> 
> I am using the Pacemaker packages from ha-clustering:Stable on
> openSUSE 12.3. I was going to try the packages from Unstable to see
> if they work better, but it seems the openais package is missing
> there.

I have a quite similar setup, currently running on stock 12.2. I have a
test system just updated to 12.3 with the ha-clustering:Stable repo, and
it fails almost instantly with STONITH enabled, due to segfaults in the
stonith resources.
With 12.2 it works flawlessly; with 12.3 and the Stable repo it also
works, but only without STONITH.
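
If you want to rule STONITH out while testing, it can be switched off
cluster-wide; a minimal sketch with the crm shell (for testing only, of
course):

=====
# temporarily disable fencing while chasing the stonith problems
crm configure property stonith-enabled=false
# turn it back on once the resources behave again
crm configure property stonith-enabled=true
=====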
 
> So I have 3 nodes, called arthur, jonas, and rusty. The jonas and
> rusty nodes have 4 DRBD master/slave resources, which are used to
> back a series of filesystems, while the arthur node is included
> mainly to avoid split-brain, but I intend to run some resources on it
> as well, and possibly add some more nodes.
:
> Is there anything obvious I am missing?

I don't know, but my configuration is - as said - quite similar, just a
_lot_ shorter, due to the use of groups and thus far fewer constraints
and location definitions. My nodes are virtual machines in VMware, hence
the vcenter stonith resources. The nodes hermes1 and hermes2 carry the
drbd resources, with hermes1 being the preferred node, while hermes3 is
there for quorum (and logs):

=====
node hermes1
node hermes2
node hermes3
primitive apache2 lsb:apache2 \
        meta failure-timeout="90" \
        operations $id="apache2-operations" \
        op monitor interval="15" timeout="15"
primitive drbdr0 ocf:linbit:drbd \
        params drbd_resource="r0" \
        op start interval="0" timeout="240" \
        op stop interval="0" timeout="100" \
        op monitor interval="30"
primitive drbdr1 ocf:linbit:drbd \
        params drbd_resource="r1" \
        op start interval="0" timeout="240" \
        op stop interval="0" timeout="100" \
        op monitor interval="30" \
        meta target-role="Started"
primitive firewall_rules lsb:firewall_rules \
        meta failure-timeout="90" \
        operations $id="firewall_rules-operations" \
        op monitor interval="60" timeout="60"
primitive fs_0 ocf:heartbeat:Filesystem \
        params device="/dev/drbd/by-res/r0" directory="/conf" fstype="ext4" options="defaults" \
        op start interval="0" timeout="60" \
        op stop interval="0" timeout="60" \
        op monitor interval="60" timeout="40" depth="0" \
        meta target-role="Started"
primitive fs_1 ocf:heartbeat:Filesystem \
        params device="/dev/drbd/by-res/r1" directory="/var/spool/postfix" fstype="ext4" options="defaults" \
        op start interval="0" timeout="60" \
        op stop interval="0" timeout="60" \
        op monitor interval="60" timeout="40" depth="0" \
        meta target-role="Started"
primitive getrecipientaccess lsb:getrecipientaccess \
        meta failure-timeout="90" \
        operations $id="getrecipientaccess-operations" \
        op monitor interval="15" timeout="15"
primitive mailgraph lsb:mailgraph \
        meta failure-timeout="90" \
        operations $id="mailgraph-operations" \
        op monitor interval="15" timeout="15"
primitive policyd-weight lsb:policyd-weight \
        meta failure-timeout="90" \
        operations $id="policyd-weight-operations" \
        op monitor interval="15" timeout="15"
primitive postfix lsb:postfix \
        meta failure-timeout="90" \
        operations $id="postfix-operations" \
        op monitor interval="15" timeout="15"
primitive postgrey lsb:postgrey \
        meta failure-timeout="90" \
        operations $id="postgrey-operations" \
        op monitor interval="15" timeout="15"
primitive queuegraph lsb:queuegraph \
        meta failure-timeout="90" \
        operations $id="queuegraph-operations" \
        op monitor interval="15" timeout="15"
primitive saslauthd lsb:saslauthd \
        meta failure-timeout="90" \
        operations $id="saslauthd-operations" \
        op monitor interval="15" timeout="15"
primitive spammailgraph lsb:spammailgraph \
        meta failure-timeout="90" \
        operations $id="spammailgraph-operations" \
        op monitor interval="15" timeout="15"
primitive updateispwhitelist lsb:updateispwhitelist \
        meta failure-timeout="90" \
        operations $id="updateispwhitelist-operations" \
        op monitor interval="15" timeout="15"
primitive vfencing stonith:external/vcenter \
        params VI_SERVER="svirtctr.it.ctr.internal" \
        VI_CREDSTORE="/root/.vmware/credstore/vicredentials.xml" \
        HOSTLIST="hermes1=SHERMES1;hermes2=SHERMES2;hermes3=SHERMES3" \
        RESETPOWERON="0" \
        op monitor start-delay="15s" interval="3600s"
primitive vip_1 ocf:heartbeat:IPaddr2 \
        params ip="10.183.75.23" nic="eth0" iflabel="0" cidr_netmask="26" \
        op monitor interval="10" timeout="20"
group apps vip_1 firewall_rules postgrey policyd-weight saslauthd postfix apache2 mailgraph queuegraph spammailgraph getrecipientaccess updateispwhitelist \
        meta target-role="Started"
group fs fs_0 fs_1
group g-drbd drbdr0 drbdr1
ms ms_drbd g-drbd \
        meta master-max="1" master-node-max="1" clone-max="2" \
        clone-node-max="1" notify="true" target-role="Started"
clone Fencing vfencing
location l-Fencing_hermes1 Fencing 0: hermes1
location l-Fencing_hermes2 Fencing 0: hermes2
location l-Fencing_hermes3 Fencing 0: hermes3
location l-apache2-hermes3 apache2 -inf: hermes3
location l-apps-hermes1 apps 50: hermes1
location l-apps-hermes2 apps 0: hermes2
location l-fs-hermes1 fs 50: hermes1
location l-fs-hermes2 fs 0: hermes2
location l-mailgraph-hermes3 mailgraph -inf: hermes3
location l-ms_drbd_hermes1 ms_drbd 50: hermes1
location l-ms_drbd_hermes2 ms_drbd 0: hermes2
location l-postfix-hermes3 postfix -inf: hermes3
location l-queuegraph-hermes3 queuegraph -inf: hermes3
location l-spammailgraph-hermes3 spammailgraph -inf: hermes3
colocation cl-apps_on_fs inf: fs:Started apps:Started
colocation cl-fs_on_drbd_r0 inf: ms_drbd:Master fs:Started
order o-apps_after_fs inf: fs:start apps:start
order o-fs_after_drbd inf: ms_drbd:promote fs:start
property $id="cib-bootstrap-options" \
        dc-version="1.1.6-b988976485d15cb702c9307df55512d323831a5e" \
        cluster-infrastructure="openais" \
        expected-quorum-votes="3" \
        symmetric-cluster="false" \
        no-quorum-policy="ignore" \
        last-lrm-refresh="1380199456" \
        stonith-action="poweroff"
=====
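
A configuration like this can be sanity-checked with the standard tools
before you rely on it, for example:

=====
# check the live CIB for errors and warnings
crm_verify -L -V
# show the resulting configuration
crm configure show
=====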

Note that there are only two colocation and two order statements, and I believe I could get rid of some of the location statements, too.
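
The groups are what keep it short: a group implies ordering and
colocation among its members, so the single cl-fs_on_drbd_r0 /
o-fs_after_drbd pair covers both filesystems at once. Without the fs
group it would expand to something like this (just a sketch with made-up
constraint names, not from my actual config):

=====
colocation cl-fs_0_on_drbd inf: ms_drbd:Master fs_0:Started
colocation cl-fs_1_with_fs_0 inf: fs_1 fs_0
order o-fs_0_after_drbd inf: ms_drbd:promote fs_0:start
order o-fs_1_after_fs_0 inf: fs_0:start fs_1:start
=====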

As said, this setup currently runs on openSUSE 12.2.
I know 13.1 is near, but I fear the state of ha-clustering in 13.1 will not be that great, so maybe give it a try with a 12.2 installation first.

Greetings,

Stefan
-- 
Stefan Botter zu Hause
Bremen



