<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40"><head><meta http-equiv=Content-Type content="text/html; charset=us-ascii"><meta name=Generator content="Microsoft Word 15 (filtered medium)"><style><!--
/* Font Definitions */
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0cm;
margin-bottom:.0001pt;
font-size:11.0pt;
font-family:"Calibri",sans-serif;
mso-fareast-language:EN-US;}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:#0563C1;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:#954F72;
text-decoration:underline;}
span.EmailStyle17
{mso-style-type:personal-compose;
font-family:"Calibri",sans-serif;
color:windowtext;}
.MsoChpDefault
{mso-style-type:export-only;
font-family:"Calibri",sans-serif;
mso-fareast-language:EN-US;}
@page WordSection1
{size:612.0pt 792.0pt;
margin:70.85pt 70.85pt 70.85pt 70.85pt;}
div.WordSection1
{page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]--></head><body lang=HU link="#0563C1" vlink="#954F72"><div class=WordSection1><p class=MsoNormal><span lang=EN-US>Dear community,<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US><o:p> </o:p></span></p><p class=MsoNormal><span lang=EN-US>A few days ago we had an issue in our MySQL M/S replication cluster.<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US>We have a setup with one R/W master and one RO slave. The RO VIP is supposed to run on the slave as long as the slave is not too far behind the master; if any error occurs, the RO VIP is moved to the master.<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US><o:p> </o:p></span></p><p class=MsoNormal><span lang=EN-US>Something happened to MySQL on the slave (some disk issue, still under investigation), but the problem is that the slave VIP remained on the slave node even though the slave process was not running and the server was badly out of date.<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US><o:p> </o:p></span></p><p class=MsoNormal><span lang=EN-US>During the issue the following log entries appeared (only an extract, as the full log would be too long):<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US><o:p> </o:p></span></p><p class=MsoNormal><span lang=EN-US><o:p> </o:p></span></p><p class=MsoNormal><span lang=EN-US>Aug 20 02:04:07 ctdb1 corosync[1056]: [MAIN ] Corosync main process was not scheduled for 14088.5488 ms (threshold is 4000.0000 ms). Consider token timeout increase.<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US>Aug 20 02:04:07 ctdb1 corosync[1056]: [TOTEM ] A processor failed, forming new configuration.<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US>Aug 20 02:04:34 ctdb1 corosync[1056]: [MAIN ] Corosync main process was not scheduled for 27065.2559 ms (threshold is 4000.0000 ms). 
Consider token timeout increase.<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US>Aug 20 02:04:34 ctdb1 corosync[1056]: [TOTEM ] A new membership (xxx:6720) was formed. Members left: 168362243 168362281 168362282 168362301 168362302 168362311 168362312 1<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US>Aug 20 02:04:34 ctdb1 corosync[1056]: [TOTEM ] A new membership (xxx:6724) was formed. Members<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US>..<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US>Aug 20 02:13:28 ctdb1 corosync[1056]: [MAIN ] Completed service synchronization, ready to provide service.<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US>..<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US>Aug 20 02:13:29 ctdb1 attrd[1584]: notice: attrd_trigger_update: Sending flush op to all hosts for: readable (1)<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US>…<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US>Aug 20 02:13:32 ctdb1 mysql(db-mysql)[10492]: INFO: post-demote notification for ctdb1<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US>Aug 20 02:13:32 ctdb1 IPaddr2(db-ip-master)[10490]: INFO: IP status = ok, IP_CIP=<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US>Aug 20 02:13:32 ctdb1 crmd[1586]: notice: process_lrm_event: LRM operation db-ip-master_stop_0 (call=371, rc=0, cib-update=179, confirmed=true) ok<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US>Aug 20 02:13:32 ctdb1 IPaddr2(db-ip-slave)[10620]: INFO: Adding inet address xxx/24 with broadcast address xxxx to device eth0<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US>Aug 20 02:13:32 ctdb1 IPaddr2(db-ip-slave)[10620]: INFO: Bringing device eth0 up<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US>Aug 20 02:13:32 ctdb1 IPaddr2(db-ip-slave)[10620]: INFO: /usr/lib/heartbeat/send_arp -i 200 -r 5 -p /usr/var/run/resource-agents/send_arp-xxx eth0 xxx auto not_used not_used<o:p></o:p></span></p><p 
class=MsoNormal><span lang=EN-US>Aug 20 02:13:32 ctdb1 crmd[1586]: notice: process_lrm_event: LRM operation db-ip-slave_start_0 (call=377, rc=0, cib-update=180, confirmed=true) ok<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US>Aug 20 02:13:32 ctdb1 crmd[1586]: notice: process_lrm_event: LRM operation db-ip-slave_monitor_20000 (call=380, rc=0, cib-update=181, confirmed=false) ok<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US>Aug 20 02:13:32 ctdb1 crmd[1586]: notice: process_lrm_event: LRM operation db-mysql_notify_0 (call=374, rc=0, cib-update=0, confirmed=true) ok<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US>Aug 20 02:13:32 ctdb1 attrd[1584]: notice: attrd_trigger_update: Sending flush op to all hosts for: master-db-mysql (1)<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US>Aug 20 02:13:32 ctdb1 attrd[1584]: notice: attrd_perform_update: Sent update 1622: master-db-mysql=1<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US>Aug 20 02:13:32 ctdb1 crmd[1586]: notice: process_lrm_event: LRM operation db-mysql_demote_0 (call=384, rc=0, cib-update=182, confirmed=true) ok<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US>Aug 20 02:13:33 ctdb1 mysql(db-mysql)[11160]: INFO: Ignoring post-demote notification for my own demotion.<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US>Aug 20 02:13:33 ctdb1 crmd[1586]: notice: process_lrm_event: LRM operation db-mysql_notify_0 (call=387, rc=0, cib-update=0, confirmed=true) ok<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US>Aug 20 02:13:33 ctdb1 mysql(db-mysql)[11185]: ERROR: check_slave invoked on an instance that is not a replication slave.<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US>Aug 20 02:13:33 ctdb1 crmd[1586]: notice: process_lrm_event: LRM operation db-mysql_monitor_7000 (call=390, rc=0, cib-update=183, confirmed=false) ok<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US>Aug 20 02:13:33 ctdb1 ntpd[1560]: Listen normally on 16 eth0 
xxxx. UDP 123<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US>Aug 20 02:13:33 ctdb1 ntpd[1560]: Deleting interface #12 eth0, xxx#123, interface stats: received=0, sent=0, dropped=0, active_time=2637334 secs<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US>Aug 20 02:13:33 ctdb1 ntpd[1560]: peers refreshed<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US>Aug 20 02:13:33 ctdb1 ntpd[1560]: new interface(s) found: waking up resolver<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US>Aug 20 02:13:40 ctdb1 mysql(db-mysql)[11224]: ERROR: check_slave invoked on an instance that is not a replication slave.<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US>Aug 20 02:13:47 ctdb1 mysql(db-mysql)[11263]: ERROR: check_slave invoked on an instance that is not a replication slave.<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US><o:p> </o:p></span></p><p class=MsoNormal><span lang=EN-US>From this point on, the last two lines repeat every 7 seconds (the mysql monitor interval).<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US><o:p> </o:p></span></p><p class=MsoNormal><span lang=EN-US><o:p> </o:p></span></p><p class=MsoNormal><span lang=EN-US>The expected behavior was that the slave (RO) VIP would be moved to the master, as the secondary DB was out of date.<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US>Unfortunately I cannot recall what crm_mon was showing while the issue was present, but I am sure that the RA did not handle the situation properly.<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US><o:p> </o:p></span></p><p class=MsoNormal><span lang=EN-US>Placing the slave node into standby and then back online resolved the issue immediately (the slave started to sync, and within a few minutes it caught up with the master).<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US><o:p> </o:p></span></p><p class=MsoNormal><span lang=EN-US><o:p> </o:p></span></p><p class=MsoNormal><span lang=EN-US>Here is the relevant part of 
the configuration:<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US><o:p> </o:p></span></p><p class=MsoNormal><span lang=EN-US><o:p> </o:p></span></p><p class=MsoNormal><span lang=EN-US>primitive db-ip-master ocf:heartbeat:IPaddr2 \<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US> params lvs_support="true" ip="XXX" cidr_netmask="24" broadcast="XXX" \<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US> op start interval="0" timeout="20s" on-fail="restart" \<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US> op monitor interval="20s" timeout="20s" \<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US> op stop interval="0" timeout="20s" on-fail="block"<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US><o:p> </o:p></span></p><p class=MsoNormal><span lang=EN-US>primitive db-ip-slave ocf:heartbeat:IPaddr2 \<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US> params lvs_support="true" ip="XXX" cidr_netmask="24" broadcast="XXX" \<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US> op start interval="0" timeout="20s" on-fail="restart" \<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US> op monitor interval="20s" timeout="20s" \<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US> op stop interval="0" timeout="20s" on-fail="block" \<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US> meta target-role="Started"<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US><o:p> </o:p></span></p><p class=MsoNormal><span lang=EN-US>primitive db-mysql ocf:heartbeat:mysql \<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US> params binary="/usr/bin/mysqld_safe" config="/etc/mysql/my.cnf" datadir="/var/lib/mysql" user="mysql" pid="/var/run/mysqld/mysqld.pid" socket="/var/run/mysqld/mysqld.sock" test_passwd="XXX" test_table="XXX" test_user="XXX" replication_user="XXX" replication_passwd="XXX" additional_parameters="--skip-slave-start" \<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US> op start 
interval="0" timeout="240s" on-fail="restart" \<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US> op stop interval="0" timeout="120s" on-fail="block" \<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US> op monitor interval="7" timeout="30s" on-fail="restart" OCF_CHECK_LEVEL="1" \<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US> op promote interval="0" timeout="120" on-fail="restart" \<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US> op demote interval="0" timeout="120" on-fail="block"<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US><o:p> </o:p></span></p><p class=MsoNormal><span lang=EN-US>ms mysql db-mysql \<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US> meta notify="true" master-max="1" clone-max="2" target-role="Started" is-managed="true"<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US><o:p> </o:p></span></p><p class=MsoNormal><span lang=EN-US>location db-ip-m-1 db-ip-master 0: ctdb1<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US>location db-ip-m-2 db-ip-master 0: ctdb2<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US>location db-ip-s-1 db-ip-slave 0: ctdb1<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US>location db-ip-s-2 db-ip-slave 0: ctdb2<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US>location db-ip-s-readable db-ip-slave \<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US> rule $id="rule-no-reader-slave" -inf: readable lt 1<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US>location db-mysql-loc-1 mysql 100: ctdb1<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US>location db-mysql-loc-2 mysql 100: ctdb2<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US><o:p> </o:p></span></p><p class=MsoNormal><span lang=EN-US>colocation db-ip-slave-master -50: db-ip-slave db-ip-master<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US>colocation db-ip-with-master inf: db-ip-master mysql:Master<o:p></o:p></span></p><p class=MsoNormal><span 
lang=EN-US>colocation db-slave-on-db inf: db-ip-slave mysql<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US>order master-after-db inf: mysql db-ip-master<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US>order slave-after-db inf: mysql db-ip-slave<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US><o:p> </o:p></span></p><p class=MsoNormal><span lang=EN-US>property $id="cib-bootstrap-options" \<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US> dc-version="1.1.10-42f2063" \<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US> cluster-infrastructure="corosync" \<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US> symmetric-cluster="false" \<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US> cluster-recheck-interval="2m" \<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US> no-quorum-policy="stop" \<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US> stop-orphan-resources="false" \<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US> start-failure-is-fatal="false" \<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US> maintenance-mode="false"<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US>property $id="mysql_replication" \<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US> db-mysql_REPL_INFO="ctdb2|mysql-bin.002928|107"<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US>rsc_defaults $id="rsc-options" \<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US> resource-stickiness="0"<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US><o:p> </o:p></span></p><p class=MsoNormal><span lang=EN-US><o:p> </o:p></span></p><p class=MsoNormal><span lang=EN-US><o:p> </o:p></span></p><p class=MsoNormal><span lang=EN-US>Do you have any hints as to what could have gone wrong, and how we could avoid such issues in the future?<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US><o:p> </o:p></span></p><p class=MsoNormal><span lang=EN-US>Versions:<o:p></o:p></span></p><p 
class=MsoNormal><span lang=EN-US>Ubuntu Trusty Tahr<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US>Pacemaker 1.1.10<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US>Corosync 2.3.3<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US>Resource agents 3.9.3<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US><o:p> </o:p></span></p><p class=MsoNormal><span lang=EN-US><o:p> </o:p></span></p><p class=MsoNormal><span lang=EN-US>Thanks a lot in advance,<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US><o:p> </o:p></span></p><p class=MsoNormal><span lang=EN-US>Attila<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US><o:p> </o:p></span></p></div></body></html>