[Pacemaker] Master/Slave DRBD switch caused some problems

Tue Dec 21 12:35:04 EST 2010

Hi,

I have two nodes, rspa and rspa2 (both CentOS 5.3, 32-bit), with the following
packages:

drbd83-8.3.8-1.el5.centos
heartbeat-3.0.3-2.3.el5
pacemaker-1.0.10-1.4.el5

rspa is stopped, and rspa2 holds all the resources (IP, Filesystem, MySQL,
Apache and DRBD Master).
When I start heartbeat on rspa, for some reason (I don't have any location
constraints specified) the cluster tries to move all resources to that node.
While demoting DRBD on rspa2 (node2) and promoting it on rspa (node1),
something goes wrong and my DRBD partition (used by MySQL) becomes
unresponsive.

Next it stops Apache (which works) and tries to stop MySQL, which fails
because MySQL uses the unresponsive partition.
As a result, my high-availability cluster ends up in limbo: it migrates
neither to node1 nor to node2.

Any help is welcome here...

[root@rspa2 ~]# crm status
============
Last updated: Tue Dec 21 18:12:47 2010
Stack: Heartbeat
Current DC: rspa2.sadiel.es (2680c85b-7e6c-4610-88b2-510feb60c4b4) - partition with quorum
Version: 1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3
2 Nodes configured, 2 expected votes
2 Resources configured.
============

Online: [ rspa2.domain rspa.domain ]

 Resource Group: mysql
     fs_mysql    (ocf::heartbeat:Filesystem):    Started rspa2.domain
     ip_mysql    (ocf::heartbeat:IPaddr2):    Started rspa2.domain
     mysqld    (lsb:mysqld):    Started rspa2.domain (unmanaged) FAILED
     apache    (lsb:httpd):    Stopped
 Master/Slave Set: ms_drbd_mysql
     Masters: [ rspa2.domain ]
     Slaves: [ rspa.domain ]

Failed actions:
    mysqld_stop_0 (node=rspa2.domain, call=18, rc=-2, status=Timed Out): unknown exec error
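
The mysqld primitive in my configuration further down defines no operations,
so the stop ran with the cluster's default action timeout. Here is a minimal
sketch of how an explicit stop timeout could be declared in crm syntax. The
120s value is an assumption I have not tested, and a longer timeout would
only help if the partition recovers on its own:

```
# Hypothetical variant of the mysqld primitive with an explicit stop timeout.
# timeout="120s" is an illustrative value, not a tested recommendation.
primitive mysqld lsb:mysqld \
    op stop interval="0" timeout="120s"
```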

Please see my Pacemaker config:

node $id="2680c85b-7e6c-4610-88b2-510feb60c4b4" rspa2.domain \
    attributes standby="off"
node $id="f9be4a80-ec2a-42e3-8d86-62dd050b437b" rspa.domain \
    attributes standby="off"
primitive apache lsb:httpd
primitive drbd_mysql ocf:linbit:drbd \
    params drbd_resource="r0" \
    op monitor interval="15s" \
    op monitor interval="16s" role="Master"
primitive fs_mysql ocf:heartbeat:Filesystem \
    params device="/dev/drbd0" directory="/opt/drbd/" fstype="xfs"
primitive ip_mysql ocf:heartbeat:IPaddr2 \
    params ip="172.18.2.150" nic="eth0:1"
primitive mysqld lsb:mysqld
group mysql fs_mysql ip_mysql mysqld apache
ms ms_drbd_mysql drbd_mysql \
    meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" \
    notify="true" target-role="Started" is-managed="true"
colocation mysql_on_drbd inf: mysql ms_drbd_mysql:Master
order mysql_after_drbd inf: ms_drbd_mysql:promote mysql:start
property $id="cib-bootstrap-options" \
    no-quorum-policy="ignore" \
    stonith-enabled="false" \
    expected-quorum-votes="2" \
    dc-version="1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3" \
    cluster-infrastructure="Heartbeat"
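
Note that the configuration above sets no resource-stickiness, so the default
of 0 applies. That may be part of why the cluster feels free to relocate
resources when rspa joins, although I have not confirmed it. A minimal sketch
of what adding a default stickiness could look like (the id "rsc-options" and
the value 100 are assumptions, untested on this cluster):

```
# Hypothetical addition: make resources prefer to stay where they are running.
# The score of 100 is an arbitrary illustrative value.
rsc_defaults $id="rsc-options" \
    resource-stickiness="100"
```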

This is what's printed in /var/log/messages: http://pastebin.com/W68jPQKJ
And in /var/log/ha.log: http://pastebin.com/SBQz1gU3

My DRBD partition (/dev/drbd0) is mounted on /opt/drbd, and when I run "ls"
there it just hangs.
In case it's useful, here is the lsof output:

[root@rspa2 ~]# lsof | grep drbd
drbd0_wor  3422      root  cwd       DIR        8,2     4096          2 /
drbd0_wor  3422      root  rtd       DIR        8,2     4096          2 /
drbd0_wor  3422      root  txt   unknown                                /proc/3422/exe
drbd0_rec  3425      root  cwd       DIR        8,2     4096          2 /
drbd0_rec  3425      root  rtd       DIR        8,2     4096          2 /
drbd0_rec  3425      root  txt   unknown                                /proc/3425/exe
drbd0_ase  4876      root  cwd       DIR        8,2     4096          2 /
drbd0_ase  4876      root  rtd       DIR        8,2     4096          2 /
drbd0_ase  4876      root  txt   unknown                                /proc/4876/exe
mysqld    12322     mysql  cwd       DIR      147,0       96        131 /opt/drbd/mysql
mysqld    12322     mysql    3uW     REG      147,0 10485760        135 /opt/drbd/mysql/ibdata1
mysqld    12322     mysql    8uW     REG      147,0  5242880        133 /opt/drbd/mysql/ib_logfile0
mysqld    12322     mysql    9uW     REG      147,0  5242880        134 /opt/drbd/mysql/ib_logfile1
ls        12729      root    3r      DIR      147,0       51        128 /opt/drbd
bash      12889      root    3r      DIR      147,0       51        128 /opt/drbd
ls        13117      root    3r      DIR      147,0       51        128 /opt/drbd

Heartbeat configuration file:
[root@rspa2 ~]# cat /etc/ha.d/ha.cf
use_logd no
logfile /var/log/ha.log
autojoin none
warntime 5
deadtime 15
initdead 30
ucast eth0 172.18.2.137
node rspa.domain rspa2.domain
crm yes

And last but not least, my DRBD configuration on both nodes:

global {
  usage-count yes;
}
common {
  protocol C;
  syncer {
    rate 10M;
  }
}
resource r0 {
  net {
        data-integrity-alg md5;
  }
  on rspa.domain {
    device    /dev/drbd0;
    disk      /dev/sda4;
    address   IP:7789;
    meta-disk internal;
  }
  on rspa2.domain {
    device    /dev/drbd0;
    disk      /dev/sda4;
    address   IP:7789;
    meta-disk internal;
  }
}