[Pacemaker] Issue with clusterlab mysql ocf script

Michael Szilagyi mszilagyi at gmail.com
Mon Aug 29 17:08:00 EDT 2011

Did some more testing and figured I would add that even Slave resources
rejoin the cluster as a Master role briefly before switching back to Slave.
 Of course, since the mysql RA uses event notification this still has the
effect of unsetting all masters whenever a new node joins.  Since a master
role is possibly configured already, the pre-premote notification event
doesn't get fired again and replication remains broken.  It seems likely
that I must be doing something wrong since this would be a pretty normal use
case and completely breaks the mysql replication cluster.

Thoughts anyone?

On Fri, Aug 26, 2011 at 10:19 AM, Michael Szilagyi <mszilagyi at gmail.com>wrote:

> I'm having a problem with master/slave promotion using the most recent
> version of the mysql ocf script hosted off the clusterLabs/resource-agents
> github repo.
> The script works well failing over to a slave if a master looses connection
> with the cluster.  However, when the master rejoins the cluster the script
> is doing some undesirable things.  Basically, if the master looses
> connection (say I pull the network cable) then a new slave is promoted and
> the old master is just orphaned (which is fine, I don't have STONITH setup
> yet or anything).  If i plug that machine's cable back in then the node
> rejoins the cluster and initially there are now two masters (the old,
> orphaned one and the newly promoted one).  Pacemaker properly sees this and
> demotes the old master to a slave.
> After some time debugging the ocf I think what is happening is that the
> script sees the old master join and fires off a post-demote notification
> event for the returning master which causes a unset_master command to be
> executed.  This causes all the slaves to remove their master connection
> info.  However, since the other master server has already been promoted and
> is (to its mind) already replicating to the other slaves in the cluster, a
> new pre-promote is never fired which means that the slaves do not get a new
> CHANGE MASTER TO issued so I wind up with a broken replication setup.
> I'm not sure if I'm missing something in how this is supposed to be working
> or if this is a limitation of the script.  It seems like there must be
> either a bug or something I've got setup wrong, however, since it's not all
> that unlikely that such a scenario could occur.  If anyone has any ideas or
> suggestions on how the script is supposed to work (or what I may be doing
> wrong) I'd appreciate some ideas.
> I'll include the output of my crm configure show in case it'll be useful:
> node $id="a1a3266d-24e2-4d1b-bfd7-de3bac929661" seven \
> attributes"mysql-bin.000005"
>"607" four-log-file-p_mysql="mysql-bin.000040"
> four-log-pos-p_mysql="2150"
> node $id="cc0227a2-a7bc-4a0d-ba1b-f6ecb7e7d845" four \
> attributes"mysql-bin.000005"
>"865" three-log-file-p_mysql="mysql-bin.000022"
> three-log-pos-p_mysql="106"
> node $id="d9d3c6cb-bf60-4468-926f-d9716e56fb0f" three \
> attributes"mysql-bin.000038"
>"607" three-log-pos-p_mysql="4"
> primitive p_mysql ocf:heartbeat:mysql \
> params binary="/usr/sbin/mysqld" config="/etc/mysql/my.cnf" \
> params pid="/var/lib/mysql/mySQL.pid" socket="/var/run/mysqld/mysqld.sock"
> \
>  params replication_user="sqlSlave" replication_passwd="slave" \
> params additional_parameters="--skip-slave-start" \
>  op start interval="0" timeout="120" \
> op stop interval="0" timeout="120" \
>  op promote interval="0" timeout="120" \
> op demote interval="0" timeout="120" \
>  op monitor interval="5" role="Master" timeout="30" \
> op monitor interval="10" role="Slave" timeout="30"
> ms ms_mysql p_mysql \
> meta master-max="1" clone-max="3" target-role="Started" is-managed="true"
> notify="true" \
>  meta target-role="Started"
> property $id="cib-bootstrap-options" \
> dc-version="1.0.9-da7075976b5ff0bee71074385f8fd02f296ec8a3" \
>  cluster-infrastructure="Heartbeat" \
> stonith-enabled="false" \
>  last-lrm-refresh="1314307995"
> Thanks!
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.clusterlabs.org/pipermail/pacemaker/attachments/20110829/7d96b26c/attachment-0003.html>

More information about the Pacemaker mailing list