[Pacemaker] Lots of Issues with Live Pacemaker Cluster

Mon Mar 14 10:57:27 UTC 2011

Hello everyone.


I built a 2-node cluster running Ubuntu 10.04 LTS with Pacemaker and
associated packages from the Ubuntu-HA-maintainers repo, and put it into
production without adequate testing.


I've always had many problems with my build, mainly because it was
over-complicated and I didn't have adequate time to test it and tweak it
before putting it live. If I list my problems below, could anyone have a
look and see if there is anything obvious? Thanks.


1.       DRBD doesn't promote/demote correctly. Whenever I have a
failover, the DRBD resource just sits on the wrong node, holding up all
other operations, as if the demote never happens. Nothing is logged when
this happens; it sits indefinitely with half of the resources stopped
and the DRBD master on the wrong node. I'm using the Linbit RA with the
following config:


primitive DRBD_MySQL ocf:linbit:drbd \
       params drbd_resource="DRBD_MySQL" \
       meta failure-timeout="60" migration-threshold="10" \
       op monitor interval="30s"
primitive fs_DRBD_MySQL ocf:heartbeat:Filesystem \
       params device="/dev/drbd/by-res/DRBD_MySQL" directory="/var/lib/mysql" fstype="ext4" \
       meta failure-timeout="60" migration-threshold="10" target-role="Started" \
       op monitor interval="30s"
primitive MySQL lsb:mysql \
       meta failure-timeout="60" migration-threshold="10" target-role="Started" \
       op monitor interval="30s"


ms ms_DRBD_MySQL DRBD_MySQL \
       meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" target-role="Master"

colocation MySQL_on_ms_DRBD_MySQL inf: MySQL ms_DRBD_MySQL:Master
colocation fs_on_DRBD_MySQL inf: fs_DRBD_MySQL ms_DRBD_MySQL:Master
order MySQL_after_DRBD inf: fs_DRBD_MySQL:start MySQL:start
order fs_after_DRBD_MySQL inf: ms_DRBD_MySQL:promote fs_DRBD_MySQL:start



global {
       usage-count   yes;
}

common {
       protocol C;
}

resource DRBD_MySQL {
       syncer {
              rate   100M;
       }

       net {
              after-sb-0pri discard-zero-changes;
              after-sb-1pri discard-secondary;
       }

       on OGW-HOSTING-01 {
              device /dev/drbd2;
              disk   /dev/vg1/MySQL;
              flexible-meta-disk   internal;
       }

       on OGW-HOSTING-02 {
              device /dev/drbd2;
              disk   /dev/vg1/MySQL;
              flexible-meta-disk   internal;
       }
}
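
For anyone trying to reproduce problem 1, the stuck state could be captured with something like the sketch below. It is only an illustration: the function names and output layout are made up here, the resource names are taken from the config above, and each command is skipped if its binary is not installed.

```shell
#!/bin/sh
# Sketch: snapshot DRBD and Pacemaker state while a failover is stuck.
# RES and MS default to the names used in the config above; the helper
# names (run_if_present, collect_state) are hypothetical.
RES="${RES:-DRBD_MySQL}"
MS="${MS:-ms_DRBD_MySQL}"

# Print a header, then run the command only if its binary is installed.
run_if_present() {
    echo "== $* =="
    if command -v "$1" >/dev/null 2>&1; then
        "$@" 2>&1 || echo "(exit status $?)"
    else
        echo "(not installed: $1)"
    fi
}

collect_state() {
    run_if_present drbdadm role "$RES"        # Primary/Secondary as DRBD sees it
    run_if_present drbdadm cstate "$RES"      # connection state
    run_if_present drbdadm dstate "$RES"      # disk state
    run_if_present crm resource status "$MS"  # where Pacemaker thinks it runs
    run_if_present crm_mon -1 -f              # one-shot status with fail counts
    run_if_present crm_verify -L -V           # check the live CIB for errors
}

collect_state
```

Running it on both nodes at the moment of the hang and comparing the DRBD role with what Pacemaker reports should at least show whether the demote was ever attempted.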




2.       Crm shell won't load from a text file. When I run crm configure
< crm.txt, it runs through the file, complaining about the default
timeout being less than 240, but doesn't load anything. So I go into the
crm shell, set default-action-timeout to 240, commit, exit and run the
same command again. This time it just exits silently, without loading
the config. If I go into the crm shell and use load replace crm.txt, it
works.
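
As a possible workaround for problem 2, the interactive steps that do work can presumably be passed to crm as arguments instead of redirecting the file on stdin. The sketch below assumes that; CFG, DRY_RUN and the helper names are hypothetical, and by default it only prints what it would run.

```shell
#!/bin/sh
# Sketch: load a crm configuration file without entering the shell.
# CFG and DRY_RUN are illustrative variables; set DRY_RUN=0 to execute.
CFG="${CFG:-crm.txt}"
DRY_RUN="${DRY_RUN:-1}"

# Echo the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

load_config() {
    # Mirror the manual step: raise the default action timeout first.
    run crm configure property default-action-timeout=240
    # Same subcommand that works interactively, given as arguments.
    run crm configure load replace "$CFG"
}

load_config
```

Note that load replace overwrites the whole configuration, so it is worth checking the dry-run output before setting DRY_RUN=0 on a live cluster.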


3.       Crm shell tab completion doesn't work unless you put an
incorrect entry in first. I'm sure this is a python readline problem, as
it also happens in SLE 11 HAE SP1 (but not in pre-SP1). I assume
everyone associated (Dejan?) is aware of the problem, but I'm
highlighting it just in case.


I've attached my crm config, cib XML and /etc/drbd.conf for reference.
Please forgive my SSH STONITH; I've not had a chance to get the IBM RSA
configured on it yet.


Thanks all!

Best regards,

Darren Mansell

