[ClusterLabs] Why Do All The Services Go Down When Just One Fails?
Andrei Borzenkov
arvidjaar at gmail.com
Tue Feb 19 23:13:23 EST 2019
On 19.02.2019 23:06, Eric Robinson wrote:
...
> Bottom line is, how do we configure the cluster in such a way that
> there are no cascading circumstances when a MySQL resource fails?
> Basically, if a MySQL resource fails, it fails. We'll deal with that
> on an ad-hoc basis. I don't want the whole cluster to barf.
...
> This is probably a dumb question, but can we remove just the monitor
> operation while leaving the resource configured in the cluster? If a
> node fails over, we do want the resources to start automatically on
> the new primary node.
While you can do that, the problem discussed in this thread was caused
by a failure to stop a resource, not by a resource failing during
normal operation.
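For the record, removing just the monitor operation would be something
like the following with pcs, assuming the operation action is literally
named "monitor" (check your actual operations with "crm configure show"
first):

    # pcs resource op remove p_mysql_002 monitor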
The logs you provided started with:
Feb 16 14:06:24 [3908] 001db01a cib: info: cib_perform_op: + /cib: @epoch=346, @num_updates=0
Feb 16 14:06:24 [3908] 001db01a cib: info: cib_perform_op: ++ /cib/configuration/resources/primitive[@id='p_mysql_002']: <meta_attributes id="p_mysql_002-meta_attributes"/>
Feb 16 14:06:24 [3908] 001db01a cib: info: cib_perform_op: ++ <nvpair id="p_mysql_002-meta_attributes-target-role" name="target-role" value="Stopped"/>
Feb 16 14:06:24 [3908] 001db01a cib: info: cib_perform_op: ++ </meta_attributes>
so apparently an administrator decided to stop this MySQL instance (I
am not sure whether Pacemaker keeps or logs the origin of a CIB change,
or whether it is even possible to determine that).
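For what it's worth, a target-role=Stopped meta attribute like the one
above is what an explicit stop/disable request writes into the CIB;
either of the following (using the resource name from the log) would
produce such a change:

    # crm resource stop p_mysql_002
    # pcs resource disable p_mysql_002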
So removing the monitor operation would not help with this. You
probably still need to set on-fail=ignore on each operation of the
MySQL resources to get the desired behavior.
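As a rough sketch (the operation names, intervals and timeouts below
are illustrative; match them to the operations actually defined on your
resources, e.g. as shown by "crm configure show"), with pcs this could
look something like:

    # pcs resource update p_mysql_002 \
          op monitor interval=30s on-fail=ignore \
          op stop interval=0s timeout=120s on-fail=ignore

which corresponds to an on-fail attribute on each operation in the CIB
XML, e.g.:

    <op id="p_mysql_002-monitor-30s" name="monitor" interval="30s"
        on-fail="ignore"/>

Keep in mind that on-fail=ignore means pacemaker records the failure
but takes no recovery action, so a failed MySQL instance will simply
stay failed until you deal with it by hand.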