[Pacemaker] Lots of Issues with Live Pacemaker Cluster
    Andrew Beekhof 
    andrew at beekhof.net
       
    Mon Mar 14 11:19:20 UTC 2011
    
    
  
On Mon, Mar 14, 2011 at 11:57 AM,  <Darren.Mansell at opengi.co.uk> wrote:
> Hello everyone.
>
> I built and put into production without adequate testing a 2 node cluster
> running Ubuntu 10.04 LTS with Pacemaker and associated packages from the
> Ubuntu-HA-maintainers repo
> (https://launchpad.net/~ubuntu-ha-maintainers/+archive/ppa).
>
> I’ve always had many problems with my build, mainly because it was
> over-complicated and I didn’t have adequate time to test it and tweak it
> before putting it live. If I list my problems below, could anyone have a
> look and see if there is anything obvious? Thanks.
>
> 1. DRBD doesn’t promote/demote correctly. Whenever I have a failover,
> the DRBD resource just sits on the wrong node, holding up all other
> operations, as if the demote never happens. Nothing is logged when this
> happens; the cluster sits indefinitely with half of the resources stopped
> and the DRBD master on the wrong node.

For this at least, I'd encourage filing a bug report with an hb_report
archive attached. Without the logs, the configuration alone won't tell
us much.

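Something along these lines should produce the archive, as a rough sketch:
the time window and destination name below are made up, so adjust them to
bracket the failover you actually saw, and run it on one of the cluster
nodes.

```shell
# Hypothetical time window around the failed failover; adjust to the incident.
FROM="2011/03/14 10:30"
TO="2011/03/14 11:30"
# Hypothetical destination; hb_report names the resulting tarball after it.
DEST="/tmp/drbd-failover-report"

# hb_report gathers logs and cluster state for the given window.
# Only call it if the tool is actually installed on this machine.
if command -v hb_report >/dev/null 2>&1; then
    hb_report -f "$FROM" -t "$TO" "$DEST"
else
    echo "hb_report not found; run this on a cluster node" >&2
fi
```

The window is easier to analyse if it starts shortly before the failover
and ends once the cluster has clearly stopped making progress.
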
> I’m using the Linbit RA with the following config:
>
> primitive DRBD_MySQL ocf:linbit:drbd \
>        params drbd_resource="DRBD_MySQL" \
>        meta failure-timeout="60" migration-threshold="10" \
>        op monitor interval="30s"
> primitive fs_DRBD_MySQL ocf:heartbeat:Filesystem \
>        params device="/dev/drbd/by-res/DRBD_MySQL" directory="/var/lib/mysql" fstype="ext4" \
>        meta failure-timeout="60" migration-threshold="10" target-role="Started" \
>        op monitor interval="30s"
> primitive MySQL lsb:mysql \
>        meta failure-timeout="60" migration-threshold="10" target-role="Started" \
>        op monitor interval="30s"
> ms ms_DRBD_MySQL DRBD_MySQL \
>        meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" target-role="Master"
> colocation MySQL_on_ms_DRBD_MySQL inf: MySQL ms_DRBD_MySQL:Master
> colocation fs_on_DRBD_MySQL inf: fs_DRBD_MySQL ms_DRBD_MySQL:Master
> order MySQL_after_DRBD inf: fs_DRBD_MySQL:start MySQL:start
> order fs_after_DRBD_MySQL inf: ms_DRBD_MySQL:promote fs_DRBD_MySQL:start
>
> /etc/drbd.conf:
>
> global {
>        usage-count yes;
> }
> common {
>        protocol C;
> }
> resource DRBD_MySQL {
>        syncer {
>               rate 100M;
>        }
>        net {
>               after-sb-0pri discard-zero-changes;
>               after-sb-1pri discard-secondary;
>        }
>        on OGW-HOSTING-01 {
>               device /dev/drbd2;
>               disk /dev/vg1/MySQL;
>               address 10.0.0.1:7790;
>               flexible-meta-disk internal;
>        }
>        on OGW-HOSTING-02 {
>               device /dev/drbd2;
>               disk /dev/vg1/MySQL;
>               address 10.0.0.2:7790;
>               flexible-meta-disk internal;
>        }
> }
>
> 2. The crm shell won’t load from a text file. When I run crm configure <
> crm.txt, it runs through the file, complaining about the default timeout
> being less than 240, but doesn’t load anything. If I then go into the crm
> shell, set default-action-timeout to 240, commit, exit and retry, it just
> exits silently without loading the config. If I go into the crm shell and
> use load replace crm.txt, it works.
>
> 3. Tab completion in the crm shell doesn’t work unless you type an
> incorrect entry first. I’m sure this is a Python readline problem, as it
> also happens in SLE 11 HAE SP1 (but not in pre-SP1). I assume everyone
> involved (Dejan?) is aware of the problem, but I’m highlighting it just in
> case.
>
> I’ve attached my crm config, cib XML, /etc/drbd.conf for reference. Please
> forgive my SSH STONITH, I’ve not had chance to get the IBM RSA configured on
> it yet.
>
> Thanks all!
>
> Best regards,
>
> Darren Mansell
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs:
> http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>
>
    
    