[ClusterLabs] Questions about pacemaker/ mysql resource agent behaviour when network fail
Andrei Borzenkov
arvidjaar at gmail.com
Sat Oct 6 00:13:10 EDT 2018
05.10.2018 15:00, Simon Bomm wrote:
> Hi all,
>
> Using pacemaker 1.1.18-11 and the mysql resource agent (
> https://github.com/ClusterLabs/resource-agents/blob/RHEL6/heartbeat/mysql),
> I ran into unwanted behaviour. Unwanted from my point of view, of course;
> maybe it is expected, which is why I ask.
>
> # My test case is the following :
>
> Everything is OK on my cluster, crm_mon output is as below (no failed
> actions)
>
> Master/Slave Set: ms_mysql-master [ms_mysql]
> Masters: [ db-master ]
> Slaves: [ db-slave ]
>
> 1. I insert into a table on the master; no issue, the data is replicated.
> 2. I shut down the network interface on the master (vm),
What exactly does that mean? How do you shut down the network interface?
> pacemaker correctly takes over on the
> other node. The master is seen as offline, and db-slave is now master
>
> Master/Slave Set: ms_mysql-master [ms_mysql]
> Masters: [ db-slave ]
>
> 3. I bring my network interface back up, pacemaker sees the node online
> and sets the old master as the new slave:
>
> Master/Slave Set: ms_mysql-master [ms_mysql]
> Masters: [ db-slave ]
> Slaves: [ db-master ]
>
> 4. From this point, my external monitoring bash script shows that the SQL
> and IO threads are not running, but I can't see any error in the pcs
> status/crm_mon outputs.
Pacemaker only shows what the resource agent reports. If the resource
agent claims the resource is started, there is nothing Pacemaker can do.
You need to debug what the resource agent does.
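
One way to check this from outside the agent is to look at the replication
threads directly. Below is a minimal Python sketch, not taken from the
resource agent, that parses the vertical output of SHOW SLAVE STATUS and
reports whether both threads are running. The function names and the sample
text are made up for illustration; Slave_IO_Running and Slave_SQL_Running
are the field names MySQL uses in that output.

```python
def parse_slave_status(text):
    """Parse the vertical output of SHOW SLAVE STATUS into a dict."""
    fields = {}
    for line in text.splitlines():
        # Skip the "*** 1. row ***" separator and lines without "key: value".
        if ":" not in line or line.lstrip().startswith("*"):
            continue
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields


def replication_healthy(fields):
    """Healthy only if both the IO and the SQL thread report 'Yes'."""
    return (fields.get("Slave_IO_Running") == "Yes"
            and fields.get("Slave_SQL_Running") == "Yes")


# Illustrative sample, not real output from the cluster in this thread.
SAMPLE = """\
*************************** 1. row ***************************
               Slave_IO_State:
             Slave_IO_Running: No
            Slave_SQL_Running: No
"""

print(replication_healthy(parse_slave_status(SAMPLE)))
```

In practice the text would come from something like
mysql -e "SHOW SLAVE STATUS\G". An empty result, which is what I would
expect when no master is configured on the node, is treated as not healthy.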
> The consequence is that I continue inserting on my newly promoted
> master, but the data is never consumed by my former master.
>
> # Questions :
>
> - Is this some kind of safety behaviour to avoid data corruption when a
> node is back online?
> - When I try to start replication manually, as the OCF agent does, it
> returns this error:
>
> mysql -h localhost -u user-repl -pmysqlreplpw -e "START SLAVE"
> ERROR 1200 (HY000) at line 1: Misconfigured slave: MASTER_HOST was not set;
> Fix in config file or with CHANGE MASTER TO
>
> - I would expect the cluster to stop the slave and show a failed action, am
> I wrong here ?
>
I am not familiar with this specific application and its structure. From
a quick look, the monitor action mostly checks for a running process. Is
the MySQL process running?
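
To illustrate why a process-level check can pass while replication is
broken, here is a rough Python sketch of a pid-file liveness test. It is
not the agent's code and the function names are hypothetical; the pid file
path is the one from the resource configuration quoted below. It assumes a
POSIX system.

```python
import os


def pid_is_running(pid):
    """Return True if a process with this pid exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 only checks existence, sends nothing
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # process exists but belongs to another user
    return True


def process_from_pidfile(path):
    """Return True if the pid file is readable and its process is alive."""
    try:
        with open(path) as f:
            pid = int(f.read().strip())
    except (OSError, ValueError):
        return False
    return pid_is_running(pid)


print(process_from_pidfile("/var/lib/mysql/mysql.pid"))
```

A check like this says nothing about the replication threads, so it can
report the resource as running even when START SLAVE fails.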
> # Other details (not sure it matters a lot)
>
> No stonith enabled, no fencing or auto-failback.
How are you going to resolve split-brain without stonith? "Stopping net"
sounds exactly like split-brain, in which case further investigation is
rather pointless.
Anyway, to give a non-hypothetical answer, the full configuration and
logs from both systems are needed.
> Symmetric cluster
> configured.
>
> Details of my pacemaker resource configuration:
>
> Master: ms_mysql-master
>   Meta Attrs: master-node-max=1 clone_max=2 globally-unique=false
>     clone-node-max=1 notify=true
>   Resource: ms_mysql (class=ocf provider=heartbeat type=mysql)
>     Attributes: binary=/usr/bin/mysqld_safe config=/etc/my.cnf.d/server.cnf
>       datadir=/var/lib/mysql evict_outdated_slaves=false max_slave_lag=15
>       pid=/var/lib/mysql/mysql.pid replication_passwd=mysqlreplpw
>       replication_user=user-repl socket=/var/lib/mysql/mysql.sock
>       test_passwd=mysqlrootpw test_user=root
>     Operations: demote interval=0s timeout=120 (ms_mysql-demote-interval-0s)
>       monitor interval=20 timeout=30 (ms_mysql-monitor-interval-20)
>       monitor interval=10 role=Master timeout=30 (ms_mysql-monitor-interval-10)
>       monitor interval=30 role=Slave timeout=30 (ms_mysql-monitor-interval-30)
>       notify interval=0s timeout=90 (ms_mysql-notify-interval-0s)
>       promote interval=0s timeout=120 (ms_mysql-promote-interval-0s)
>       start interval=0s timeout=120 (ms_mysql-start-interval-0s)
>       stop interval=0s timeout=120 (ms_mysql-stop-interval-0s)
>
> Is there anything I'm missing here? I did not find a clearly similar use
> case when googling for network outages and pacemaker.
>
> Thanks