I did some more testing and figured I would add that even Slave resources rejoin the cluster in the Master role briefly before switching back to Slave.  Of course, since the mysql RA uses event notifications, this still has the effect of unsetting the master info on all slaves whenever a new node joins.  Since a master role may already be configured, the pre-promote notification event doesn't get fired again and replication remains broken.  It seems likely that I am doing something wrong, since this is a pretty normal use case and it completely breaks the mysql replication cluster.<div>


<br></div><div>Thoughts anyone?</div><div><br><br><div class="gmail_quote">On Fri, Aug 26, 2011 at 10:19 AM, Michael Szilagyi <span dir="ltr"><<a href="mailto:mszilagyi@gmail.com" target="_blank">mszilagyi@gmail.com</a>></span> wrote:<br>


<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">I'm having a problem with master/slave promotion using the most recent version of the mysql OCF script hosted in the ClusterLabs/resource-agents GitHub repo.<div>


<br></div><div>The script works well failing over to a slave if a master loses connection with the cluster.  However, when the master rejoins the cluster the script does some undesirable things.  Basically, if the master loses connection (say I pull the network cable) then a slave is promoted to master and the old master is just orphaned (which is fine, I don't have STONITH set up yet or anything).  If I plug that machine's cable back in then the node rejoins the cluster and initially there are two masters (the old, orphaned one and the newly promoted one).  Pacemaker properly sees this and demotes the old master to a slave.  </div>


<div><br></div><div>After some time debugging the OCF script, I think what is happening is that the script sees the old master join and fires off a post-demote notification event for the returning master, which causes an unset_master command to be executed.  This causes all the slaves to remove their master connection info.  However, since the other master server has already been promoted and is (to its mind) already replicating to the other slaves in the cluster, a new pre-promote is never fired, which means that the slaves never get a new CHANGE MASTER TO issued, so I wind up with a broken replication setup.</div>
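<div><br></div><div>To make the sequence above concrete, here is a rough toy model of it in Python.  This is not the resource agent's code: the Node class and the pre_promote, promote, post_demote and simulate_rejoin functions are made up for illustration, and I'm assuming "three" was the original master and "four" the node that got promoted.  It only mimics the notification ordering as I understand it.</div>

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    name: str
    role: str = "Slave"
    master_host: Optional[str] = None  # node this one replicates from


def pre_promote(nodes, new_master):
    # Stand-in for the pre-promote notification: every other node is
    # pointed at the new master (the real RA issues CHANGE MASTER TO).
    for node in nodes:
        if node is not new_master:
            node.master_host = new_master.name


def promote(nodes, new_master):
    # A promotion is always preceded by a pre-promote notification.
    pre_promote(nodes, new_master)
    new_master.role = "Master"
    new_master.master_host = None


def post_demote(nodes):
    # Stand-in for the post-demote notification: every slave drops its
    # master connection info (what unset_master appears to do).
    for node in nodes:
        if node.role == "Slave":
            node.master_host = None


def simulate_rejoin():
    three, four, seven = Node("three"), Node("four"), Node("seven")

    # 1. Normal start: "three" becomes master (assumed).
    promote([three, four, seven], three)

    # 2. "three" loses its network; the rest of the cluster promotes "four".
    promote([four, seven], four)

    # 3. "three" rejoins still thinking it is master and gets demoted.
    #    The post-demote notification reaches every node, but no new
    #    pre-promote follows because "four" is already master.
    three.role = "Slave"
    post_demote([three, four, seven])

    return {n.name: (n.role, n.master_host) for n in (three, four, seven)}


if __name__ == "__main__":
    print(simulate_rejoin())
```

<div>In this model both slaves end up with no master host even though "four" is still in the Master role, which matches the broken replication I'm seeing.</div>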


<div><br></div><div>I'm not sure if I'm missing something in how this is supposed to work or if this is a limitation of the script.  It seems like there must be either a bug or something I've got set up wrong, however, since it's not all that unlikely that such a scenario could occur.  If anyone has any ideas or suggestions on how the script is supposed to work (or what I may be doing wrong) I'd appreciate them.</div>


<div><br></div><div>I'll include the output of my crm configure show in case it'll be useful:</div><div><br></div><div><div>node $id="a1a3266d-24e2-4d1b-bfd7-de3bac929661" seven \</div><div><span style="white-space:pre-wrap">     </span>attributes 172.17.0.130-log-file-p_mysql="mysql-bin.000005" 172.17.0.130-log-pos-p_mysql="865" 172.17.0.131-log-file-p_mysql="mysql-bin.000038" 172.17.0.131-log-pos-p_mysql="607" four-log-file-p_mysql="mysql-bin.000040" four-log-pos-p_mysql="2150"</div>


<div>node $id="cc0227a2-a7bc-4a0d-ba1b-f6ecb7e7d845" four \</div><div><span style="white-space:pre-wrap">   </span>attributes 172.17.0.130-log-file-p_mysql="mysql-bin.000005" 172.17.0.130-log-pos-p_mysql="865" three-log-file-p_mysql="mysql-bin.000022" three-log-pos-p_mysql="106"</div>


<div>node $id="d9d3c6cb-bf60-4468-926f-d9716e56fb0f" three \</div><div><span style="white-space:pre-wrap">  </span>attributes 172.17.0.131-log-file-p_mysql="mysql-bin.000038" 172.17.0.131-log-pos-p_mysql="607" three-log-pos-p_mysql="4"</div>


<div>primitive p_mysql ocf:heartbeat:mysql \</div><div><span style="white-space:pre-wrap">    </span>params binary="/usr/sbin/mysqld" config="/etc/mysql/my.cnf" \</div><div><span style="white-space:pre-wrap">        </span>params pid="/var/lib/mysql/mySQL.pid" socket="/var/run/mysqld/mysqld.sock" \</div>


<div><span style="white-space:pre-wrap">  </span>params replication_user="sqlSlave" replication_passwd="slave" \</div><div><span style="white-space:pre-wrap">   </span>params additional_parameters="--skip-slave-start" \</div>


<div><span style="white-space:pre-wrap">  </span>op start interval="0" timeout="120" \</div><div><span style="white-space:pre-wrap">        </span>op stop interval="0" timeout="120" \</div>


<div><span style="white-space:pre-wrap">  </span>op promote interval="0" timeout="120" \</div><div><span style="white-space:pre-wrap">      </span>op demote interval="0" timeout="120" \</div>


<div><span style="white-space:pre-wrap">  </span>op monitor interval="5" role="Master" timeout="30" \</div><div><span style="white-space:pre-wrap">       </span>op monitor interval="10" role="Slave" timeout="30"</div>


<div>ms ms_mysql p_mysql \</div><div><span style="white-space:pre-wrap">      </span>meta master-max="1" clone-max="3" target-role="Started" is-managed="true" notify="true" \</div>


<div><span style="white-space:pre-wrap">  </span>meta target-role="Started"</div><div>property $id="cib-bootstrap-options" \</div><div><span style="white-space:pre-wrap">      </span>dc-version="1.0.9-da7075976b5ff0bee71074385f8fd02f296ec8a3" \</div>


<div><span style="white-space:pre-wrap">  </span>cluster-infrastructure="Heartbeat" \</div><div><span style="white-space:pre-wrap">   </span>stonith-enabled="false" \</div>

<div><span style="white-space:pre-wrap">  </span>last-lrm-refresh="1314307995"</div></div><div><br></div><div>Thanks!</div>

</blockquote></div><br></div>