[Pacemaker] Slave does not start after failover: Mysql circular replication and master-slave resources

Thu Dec 15 08:42:30 EST 2011

Hi All,

Some time ago I exchanged a couple of posts with you here regarding MySQL active-active HA.
The best solution I have found so far is MySQL multi-master replication, also referred to as circular replication.

Basically, I set up two nodes, both capable of the master role, and changes were immediately propagated to the other node.

But I still wanted an M/S approach, with a RW master and a RO slave, mainly because I prefer to have a single master VIP that my apps can connect to.

(In the first approach I configured a two-node clone, and the master IP was always bound to one of the nodes.)

I applied the following configuration:

node db1 \
        attributes IP="10.100.1.31" \
        attributes standby="off" db2-log-file-db-mysql="mysql-bin.000021" db2-log-pos-db-mysql="40730"
node db2 \
        attributes IP="10.100.1.32" \
        attributes standby="off"
primitive db-ip-master ocf:heartbeat:IPaddr2 \
        params lvs_support="true" ip="10.100.1.30" cidr_netmask="8" broadcast="10.255.255.255" \
        op monitor interval="20s" timeout="20s" \
        meta target-role="Started"
primitive db-mysql ocf:heartbeat:mysql \
        params binary="/usr/bin/mysqld_safe" config="/etc/mysql/my.cnf" datadir="/var/lib/mysql" user="mysql" pid="/var/run/mysqld/mysqld.pid" socket="/var/run/mysqld/mysqld.sock" test_passwd="XXXXX" \
        test_table="replicatest.connectioncheck" test_user="slave_user" replication_user="slave_user" replication_passwd="XXXXX" additional_parameters="--skip-slave-start" \
        op start interval="0" timeout="120s" \
        op stop interval="0" timeout="120s" \
        op monitor interval="30" timeout="30s" OCF_CHECK_LEVEL="1" \
        op promote interval="0" timeout="120" \
        op demote interval="0" timeout="120"
ms db-ms-mysql db-mysql \
        meta notify="true" master-max="1" clone-max="2" target-role="Started"
colocation db-ip-with-master inf: db-ip-master db-ms-mysql:Master
property $id="cib-bootstrap-options" \
        dc-version="1.1.5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f" \
        cluster-infrastructure="openais" \
        expected-quorum-votes="2" \
        stonith-enabled="false" \
        no-quorum-policy="ignore"
rsc_defaults $id="rsc-options" \
        resource-stickiness="0"

The setup works under basic conditions:

* After the "first" startup, both nodes start up as slaves, and shortly after, one of them is promoted to master.
* Updates to the master are replicated properly to the slave.
* The slave accepts updates, which is wrong, but I can live with this - I will allow connections to the master VIP only.
* If I stop the slave for some time and restart it, it catches up with the master shortly and gets back in sync.
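
As a side note on the slave accepting updates: MySQL has a global read_only variable that rejects writes from clients without the SUPER privilege (replication threads are not affected). A minimal sketch of how it could be toggled by hand follows; read_only_sql is a hypothetical helper name, not part of the resource agent, and a value set this way does not survive a mysqld restart unless it is also in my.cnf.

```shell
# Hypothetical helper (not part of ocf:heartbeat:mysql): print the statement
# that sets the global read_only flag for the given role.
read_only_sql() {
    case "$1" in
        master) echo "SET GLOBAL read_only = OFF;" ;;
        slave)  echo "SET GLOBAL read_only = ON;" ;;
        *)      echo "usage: read_only_sql master|slave" >&2; return 1 ;;
    esac
}

# Example, run on the slave as an account allowed to set global variables
# (assumes the mysql client can log in without extra options):
#   read_only_sql slave | mysql
```

Application accounts would need to be created without SUPER for this to have any effect on them.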

I have, however, a serious issue:

* If I stop the current master, the slave is promoted, accepts RW queries, and the master IP is bound to it - ALL fine.
* BUT when I bring the other node back online, its resource simply shows: Stopped (not installed)

Online: [ db1 db2 ]

db-ip-master    (ocf::heartbeat:IPaddr2):       Started db1
Master/Slave Set: db-ms-mysql [db-mysql]
     Masters: [ db1 ]
     Stopped: [ db-mysql:1 ]

Node Attributes:
* Node db1:
    + IP                                : 10.100.1.31
    + db2-log-file-db-mysql             : mysql-bin.000021
    + db2-log-pos-db-mysql              : 40730
    + master-db-mysql:0                 : 3601
* Node db2:
    + IP                                : 10.100.1.32

Failed actions:
    db-mysql:0_monitor_30000 (node=db2, call=58, rc=5, status=complete): not installed
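
For reading the rc value in the failed action above, the numeric codes can be mapped to the standard OCF return code names. The helper below is only a lookup sketch; ocf_rc_name is a hypothetical name. rc=5 corresponds to OCF_ERR_INSTALLED, which in general means the agent considers something it needs to be missing on that node.

```shell
# Hypothetical helper: translate an OCF resource agent return code to its name.
ocf_rc_name() {
    case "$1" in
        0) echo "OCF_SUCCESS" ;;
        1) echo "OCF_ERR_GENERIC" ;;
        2) echo "OCF_ERR_ARGS" ;;
        3) echo "OCF_ERR_UNIMPLEMENTED" ;;
        4) echo "OCF_ERR_PERM" ;;
        5) echo "OCF_ERR_INSTALLED" ;;
        6) echo "OCF_ERR_CONFIGURED" ;;
        7) echo "OCF_NOT_RUNNING" ;;
        8) echo "OCF_RUNNING_MASTER" ;;
        9) echo "OCF_FAILED_MASTER" ;;
        *) echo "unknown" ;;
    esac
}

ocf_rc_name 5   # prints OCF_ERR_INSTALLED
```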

I checked the logs and could not find a reason why the slave on db2 does not start.
Any ideas, anyone?
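
One check that can be run on db2 is whether the paths named in the db-mysql params actually exist there, on the assumption (not confirmed from the logs) that "not installed" is reported when one of them is absent. This is only a sketch: report_missing is a hypothetical helper, and /var/run/mysqld is simply the directory part of the pid and socket paths in the configuration above.

```shell
# Hypothetical helper: print every given path that does not exist on this
# node and return non-zero if at least one is missing.
report_missing() {
    rc=0
    for p in "$@"; do
        if [ ! -e "$p" ]; then
            echo "missing: $p"
            rc=1
        fi
    done
    return "$rc"
}

# Example, run on db2 with the paths from the db-mysql primitive:
#   report_missing /usr/bin/mysqld_safe /etc/mysql/my.cnf /var/lib/mysql /var/run/mysqld
```

If something turns out to be missing and is restored, the failed action still has to be cleared (for example with crm resource cleanup db-ms-mysql) before the cluster retries the resource on that node.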

Thanks,
Attila