[Pacemaker] DRBD Master/Slave in a 3 node cluster

Dejan Muhamedagic dejanmm at fastmail.fm
Tue Oct 1 07:57:12 EDT 2013


Hi Stefan,

On Tue, Oct 01, 2013 at 09:26:14AM +0200, Stefan Botter wrote:
> Hi James,
> 
> On Mon, 30 Sep 2013 12:31:52 -0700
> James Oakley <jfunk at funktronics.ca> wrote:
> 
> > I am having some trouble with DRBD Master/Slave resources in a 3-node
> > cluster.
> > 
> > I am using the Pacemaker packages from ha-clustering:Stable on
> > openSUSE 12.3. I was going to try the packages from Unstable to see
> > if they work better, but it seems the openais package is missing
> > there.
> 
> I have a very similar setup, currently running on stock 12.2. I have a test
> system just updated to 12.3 with ha-clustering:Stable, and it fails
> almost instantly with STONITH enabled, due to segfaults in the
> stonith resources.

What exactly segfaults? Is it related to a particular stonith
agent? Did you open a bugzilla for that?

> With 12.2 it works flawlessly, as does 12.3 with the Stable repo when
> STONITH is disabled.

That's a critical issue which needs to be fixed.

Thanks,

Dejan

> > So I have 3 nodes, called arthur, jonas, and rusty. The jonas and
> > rusty nodes have 4 DRBD master/slave resources, which are used to
> > back a series of filesystems, while the arthur node is included
> > mainly to avoid split-brain, but I intend to run some resources on it
> > as well, and possibly add some more nodes.
> :
> > Is there anything obvious I am missing?
> 
> I don't know, but my configuration is, as said, very similar, yet a
> _lot_ shorter, due to the use of groups and thus far fewer constraints and
> location definitions. My nodes are virtual machines in VMware, hence the
> vcenter stonith resources. The nodes hermes1 and hermes2 have the
> drbd resources, hermes1 being the preferred node, and hermes3 is
> there for quorum (and logs):
> 
> =====
> node hermes1
> node hermes2
> node hermes3
> primitive apache2 lsb:apache2 \
>         meta failure-timeout="90" \
>         operations $id="apache2-operations" \
>         op monitor interval="15" timeout="15"
> primitive drbdr0 ocf:linbit:drbd \
>         params drbd_resource="r0" \
>         op start interval="0" timeout="240" \
>         op stop interval="0" timeout="100" \
>         op monitor interval="30"
> primitive drbdr1 ocf:linbit:drbd \
>         params drbd_resource="r1" \
>         op start interval="0" timeout="240" \
>         op stop interval="0" timeout="100" \
>         op monitor interval="30" \
>         meta target-role="Started"
> primitive firewall_rules lsb:firewall_rules \
>         meta failure-timeout="90" \
>         operations $id="firewall_rules-operations" \
>         op monitor interval="60" timeout="60"
> primitive fs_0 ocf:heartbeat:Filesystem \
>         params device="/dev/drbd/by-res/r0" directory="/conf" fstype="ext4" options="defaults" \
>         op start interval="0" timeout="60" \
>         op stop interval="0" timeout="60" \
>         op monitor interval="60" timeout="40" depth="0" \
>         meta target-role="Started"
> primitive fs_1 ocf:heartbeat:Filesystem \
>         params device="/dev/drbd/by-res/r1" directory="/var/spool/postfix" fstype="ext4" options="defaults" \
>         op start interval="0" timeout="60" \
>         op stop interval="0" timeout="60" \
>         op monitor interval="60" timeout="40" depth="0" \
>         meta target-role="Started"
> primitive getrecipientaccess lsb:getrecipientaccess \
>         meta failure-timeout="90" \
>         operations $id="getrecipientaccess-operations" \
>         op monitor interval="15" timeout="15"
> primitive mailgraph lsb:mailgraph \
>         meta failure-timeout="90" \
>         operations $id="mailgraph-operations" \
>         op monitor interval="15" timeout="15"
> primitive policyd-weight lsb:policyd-weight \
>         meta failure-timeout="90" \
>         operations $id="policyd-weight-operations" \
>         op monitor interval="15" timeout="15"
> primitive postfix lsb:postfix \
>         meta failure-timeout="90" \
>         operations $id="postfix-operations" \
>         op monitor interval="15" timeout="15"
> primitive postgrey lsb:postgrey \
>         meta failure-timeout="90" \
>         operations $id="postgrey-operations" \
>         op monitor interval="15" timeout="15"
> primitive queuegraph lsb:queuegraph \
>         meta failure-timeout="90" \
>         operations $id="queuegraph-operations" \
>         op monitor interval="15" timeout="15"
> primitive saslauthd lsb:saslauthd \
>         meta failure-timeout="90" \
>         operations $id="saslauthd-operations" \
>         op monitor interval="15" timeout="15"
> primitive spammailgraph lsb:spammailgraph \
>         meta failure-timeout="90" \
>         operations $id="spammailgraph-operations" \
>         op monitor interval="15" timeout="15"
> primitive updateispwhitelist lsb:updateispwhitelist \
>         meta failure-timeout="90" \
>         operations $id="updateispwhitelist-operations" \
>         op monitor interval="15" timeout="15"
> primitive vfencing stonith:external/vcenter \
>         params VI_SERVER="svirtctr.it.ctr.internal" \
>         VI_CREDSTORE="/root/.vmware/credstore/vicredentials.xml" \
>         HOSTLIST="hermes1=SHERMES1;hermes2=SHERMES2;shermes3=SHERMES3" \
>         RESETPOWERON="0" \
>         op monitor start-delay="15s" interval="3600s"
> primitive vip_1 ocf:heartbeat:IPaddr2 \
>         params ip="10.183.75.23" nic="eth0" iflabel="0" cidr_netmask="26" \
>         op monitor interval="10" timeout="20"
> group apps vip_1 firewall_rules postgrey policyd-weight saslauthd postfix apache2 mailgraph queuegraph spammailgraph getrecipientaccess updateispwhitelist \
>         meta target-role="Started"
> group fs fs_0 fs_1
> group g-drbd drbdr0 drbdr1
> ms ms_drbd g-drbd \
>         meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" target-role="Started"
> clone Fencing vfencing
> location l-Fencing_hermes1 Fencing 0: hermes1
> location l-Fencing_hermes2 Fencing 0: hermes2
> location l-Fencing_hermes3 Fencing 0: hermes3
> location l-apache2-hermes3 apache2 -inf: hermes3
> location l-apps-hermes1 apps 50: hermes1
> location l-apps-hermes2 apps 0: hermes2
> location l-fs-hermes1 fs 50: hermes1
> location l-fs-hermes2 fs 0: hermes2
> location l-mailgraph-hermes3 mailgraph -inf: hermes3
> location l-ms_drbd_hermes1 ms_drbd 50: hermes1
> location l-ms_drbd_hermes2 ms_drbd 0: hermes2
> location l-postfix-hermes3 postfix -inf: hermes3
> location l-queuegraph-hermes3 queuegraph -inf: hermes3
> location l-spammailgraph-hermes3 spammailgraph -inf: hermes3
> colocation cl-apps_on_fs inf: fs:Started apps:Started
> colocation cl-fs_on_drbd_r0 inf: ms_drbd:Master fs:Started
> order o-apps_after_fs inf: fs:start apps:start
> order o-fs_after_drbd inf: ms_drbd:promote fs:start
> property $id="cib-bootstrap-options" \
>         dc-version="1.1.6-b988976485d15cb702c9307df55512d323831a5e" \
>         cluster-infrastructure="openais" \
>         expected-quorum-votes="3" \
>         symmetric-cluster="false" \
>         no-quorum-policy="ignore" \
>         last-lrm-refresh="1380199456" \
>         stonith-action="poweroff"
> =====
> 
> Note that there are only two colocation and two order statements, and I believe I could get rid of some of the location statements, too.
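> 
> As an untested sketch of that reduction: with symmetric-cluster="false"
> the cluster is opt-in, so a resource should only run on nodes that a
> location constraint allows. Assuming that already keeps the apps group
> (and therefore its members) off hermes3, the per-resource -inf
> constraints for hermes3 might be redundant, which would leave only the
> following location statements. This is an assumption, not something
> verified on this cluster:
> 
> =====
> location l-Fencing_hermes1 Fencing 0: hermes1
> location l-Fencing_hermes2 Fencing 0: hermes2
> location l-Fencing_hermes3 Fencing 0: hermes3
> location l-apps-hermes1 apps 50: hermes1
> location l-apps-hermes2 apps 0: hermes2
> location l-fs-hermes1 fs 50: hermes1
> location l-fs-hermes2 fs 0: hermes2
> location l-ms_drbd_hermes1 ms_drbd 50: hermes1
> location l-ms_drbd_hermes2 ms_drbd 0: hermes2
> =====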
> 
> As said, this setup currently runs on openSUSE 12.2.
> I know 13.1 is near, but I fear the status of ha-clustering in 13.1 will not be that great, so maybe give it a try with a 12.2 installation first.
> 
> Greetings,
> 
> Stefan
> -- 
> Stefan Botter zu Hause
> Bremen
> 



