[Pacemaker] Master/Slave DRBD switch caused some problems

Lars Ellenberg lars.ellenberg at linbit.com
Wed Jan 19 04:25:36 EST 2011


On Tue, Dec 21, 2010 at 06:35:04PM +0100, Marc Wilmots wrote:
> Hi,
> 
> I have two nodes rspa and rspa2 (both CentOS 5.3, 32-bit) with the following
> packages:
> 
> drbd83-8.3.8-1.el5.centos

Not sure what exactly that package is, but if it corresponds to "8.3.8"
rather than "8.3.8.1" as tagged in git, that may be the reason for some
of the strangeness.

When reporting DRBD-related issues, the output of cat /proc/drbd
and the kernel logs are usually useful as well.
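
As a rough sketch of gathering that information in one go, something like
the following could be run on each node. The function name and the default
output path are made up for illustration, and grepping dmesg for "drbd" is
only an assumption about where the relevant kernel messages end up; on some
systems /var/log/messages is the better source.

```shell
#!/bin/sh
# Collect DRBD state and DRBD-related kernel messages into one report file.
# collect_drbd_report and the default path are illustrative names, not part
# of DRBD or Pacemaker.
collect_drbd_report() {
    out="${1:-/tmp/drbd-report.txt}"
    {
        echo "== /proc/drbd =="
        if [ -r /proc/drbd ]; then
            cat /proc/drbd
        else
            echo "(not readable: is the drbd module loaded?)"
        fi
        echo "== kernel messages mentioning drbd =="
        # Falls back to a placeholder if dmesg is unavailable or has no matches.
        dmesg 2>/dev/null | grep -i drbd || echo "(none found)"
    } > "$out"
}
```

Calling collect_drbd_report with no argument writes to the default path; a
different file name can be passed as the first argument. Running it on both
nodes, ideally right after the problem occurs, gives both views of the
connection state.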

> heartbeat-3.0.3-2.3.el5
> pacemaker-1.0.10-1.4.el5
> 
> rspa is stopped, and rspa2 has all the resources (IP, FileSystem, Mysql,
> Apache and DRBD Master)
> When I start heartbeat on rspa, for some reason (I don't have any
> resource_location specified) it tries to move all resources to that node,
> but when trying to demote drbd on rspa2 (node2) and promote drbd on rspa
> (node1) something must go wrong as my DRBD partition (being used by MySQL)
> gets unresponsive.
> 
> Next it stops Apache (works), and tries to stop MySQL which fails because it
> uses the unresponsive partition.
> As a result, my high availability cluster ends up in limbo; it
> migrates neither to node1 nor to node2.
> 
> Any help is welcome here...
> 
>  [root@rspa2 ~]# crm status
> ============
> Last updated: Tue Dec 21 18:12:47 2010
> Stack: Heartbeat
> Current DC: rspa2.sadiel.es (2680c85b-7e6c-4610-88b2-510feb60c4b4) - partition with quorum
> Version: 1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3
> 2 Nodes configured, 2 expected votes
> 2 Resources configured.
> ============
> 
> Online: [ rspa2.domain rspa.domain ]
> 
>  Resource Group: mysql
>      fs_mysql    (ocf::heartbeat:Filesystem):    Started rspa2.domain
>      ip_mysql    (ocf::heartbeat:IPaddr2):    Started rspa2.domain
>      mysqld    (lsb:mysqld):    Started rspa2.domain (unmanaged) FAILED
>      apache    (lsb:httpd):    Stopped
>  Master/Slave Set: ms_drbd_mysql
>      Masters: [ rspa2.domain ]
>      Slaves: [ rspa.domain ]
> 
> Failed actions:
>     mysqld_stop_0 (node=rspa2.domain, call=18, rc=-2, status=Timed Out): unknown exec error
> 
> Please see my Pacemaker config:
> 
> node $id="2680c85b-7e6c-4610-88b2-510feb60c4b4" rspa2.domain \
>     attributes standby="off"
> node $id="f9be4a80-ec2a-42e3-8d86-62dd050b437b" rspa.domain \
>     attributes standby="off"
> primitive apache lsb:httpd
> primitive drbd_mysql ocf:linbit:drbd \
>     params drbd_resource="r0" \
>     op monitor interval="15s" \
>     op monitor interval="16s" role="Master"
> primitive fs_mysql ocf:heartbeat:Filesystem \
>     params device="/dev/drbd0" directory="/opt/drbd/" fstype="xfs"
> primitive ip_mysql ocf:heartbeat:IPaddr2 \
>     params ip="172.18.2.150" nic="eth0:1"
> primitive mysqld lsb:mysqld
> group mysql fs_mysql ip_mysql mysqld apache
> ms ms_drbd_mysql drbd_mysql \
>     meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" target-role="Started" is-managed="true"
> colocation mysql_on_drbd inf: mysql ms_drbd_mysql:Master
> order mysql_after_drbd inf: ms_drbd_mysql:promote mysql:start
> property $id="cib-bootstrap-options" \
>     no-quorum-policy="ignore" \
>     stonith-enabled="false" \
>     expected-quorum-votes="2" \
>     dc-version="1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3" \
>     cluster-infrastructure="Heartbeat"
> 
> This is what's printed in /var/log/messages: http://pastebin.com/W68jPQKJ
> And /var/log/ha.log : http://pastebin.com/SBQz1gU3

Sorry, I don't have time to go through those right now.

> My DRBD partition (/dev/drbd0) is mounted on /opt/drbd and when I do "ls" it
> just hangs.
> In case it's useful, please see here lsof output:

No, that does not help at all.

> Heartbeat configuration file:
> [root@rspa2 ~]# cat /etc/ha.d/ha.cf
> use_logd no
> logfile /var/log/ha.log
> autojoin none
> warntime 5
> deadtime 15
> initdead 30
> ucast eth0 172.18.2.137
> node rspa.domain rspa2.domain
> crm yes
> 
> And last but not least, my DRBD configuration on both nodes:
> 
> global {
>   usage-count yes;
> }
> common {
>   protocol C;
>   syncer {
>     rate 10M;
>   }
> }
> resource r0 {
>   net {
>         data-integrity-alg md5;
>   }
>   on rspa.domain {
>     device    /dev/drbd0;
>     disk      /dev/sda4;
>     address   IP:7789;
>     meta-disk internal;
>   }
>   on rspa2.domain {
>     device    /dev/drbd0;
>     disk      /dev/sda4;
>     address   IP:7789;
>     meta-disk internal;
>   }
> }


-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.



