[ClusterLabs] Why Do All The Services Go Down When Just One Fails?

Sat Feb 16 21:28:27 UTC 2019

On Sat, Feb 16, 2019 at 09:03:43PM +0000, Eric Robinson wrote:
> Here are the relevant corosync logs.
> 
> It appears that the stop action for resource p_mysql_002 failed, and
> that caused a cascading series of service changes. However, I don't
> understand why, since no other resources are dependent on p_mysql_002.

The stop failed because it hit its 15s timeout, so you could try
increasing the timeout on the stop operation. This is the relevant line:

  Result of stop operation for p_mysql_002 on 001db01a: Timed Out | call=1094 key=p_mysql_002_stop_0 timeout=15000ms
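If you want to check other stop/start failures for the same cause, a small
parser can pull the operation, resource, node, result and timeout out of
lines like the one above. This is only a sketch: the pattern is built from
that single sample line, and other Pacemaker/corosync versions may log in a
different format. The function name parse_op_result is hypothetical.

```python
import re

# Pattern derived from the one result line quoted above (assumed format;
# other Pacemaker versions may log differently).
LOG_RE = re.compile(
    r"Result of (?P<op>\w+) operation for (?P<rsc>\S+) on (?P<node>\S+): "
    r"(?P<result>[^|]+?) \|.*?timeout=(?P<timeout_ms>\d+)ms"
)


def parse_op_result(line):
    """Return op, resource, node, result and timeout in seconds, or None."""
    m = LOG_RE.search(line)
    if m is None:
        return None
    return {
        "op": m["op"],
        "resource": m["rsc"],
        "node": m["node"],
        "result": m["result"],
        # The log reports milliseconds; 15000ms is the 15s stop timeout.
        "timeout_s": int(m["timeout_ms"]) / 1000,
    }


line = ("Result of stop operation for p_mysql_002 on 001db01a: Timed Out "
        "| call=1094 key=p_mysql_002_stop_0 timeout=15000ms")
print(parse_op_result(line))
```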

After the stop failed, the cluster should have fenced that node, but
since you don't have fencing configured, it instead tries to move
p_mysql_002 and all the other resources related to it (vip, fs, drbd)
to the other node. Because the other mysql resources depend on those
same vip, fs and drbd resources, they have to be stopped first.
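The cascade can be illustrated with a toy dependency model. This is a
sketch under assumptions, not how Pacemaker computes its transitions: the
names p_mysql_001, p_mysql_003, p_vip, p_fs and p_drbd are hypothetical, and
the layout (every mysql instance depending on one shared vip, filesystem and
DRBD resource) is assumed from the description above. Moving p_mysql_002
drags along everything it depends on, and every other resource that depends
on any of those has to stop before they can move.

```python
from collections import deque

# Hypothetical layout: resource -> resources it depends on.
DEPENDS_ON = {
    "p_mysql_001": ["p_vip", "p_fs", "p_drbd"],
    "p_mysql_002": ["p_vip", "p_fs", "p_drbd"],
    "p_mysql_003": ["p_vip", "p_fs", "p_drbd"],
    "p_fs": ["p_drbd"],
    "p_vip": [],
    "p_drbd": [],
}


def dependencies(resource, depends_on):
    """Everything `resource` transitively depends on (these move with it)."""
    seen = set()
    queue = deque(depends_on.get(resource, []))
    while queue:
        r = queue.popleft()
        if r not in seen:
            seen.add(r)
            queue.extend(depends_on.get(r, []))
    return seen


def must_stop_before_move(failed, depends_on):
    """Resources outside the moving set that transitively depend on it."""
    moving = {failed} | dependencies(failed, depends_on)
    to_stop = set()
    changed = True
    while changed:
        changed = False
        for res, deps in depends_on.items():
            if res in moving or res in to_stop:
                continue
            # Depends on something that is moving or already being stopped.
            if set(deps) & (moving | to_stop):
                to_stop.add(res)
                changed = True
    return to_stop


print(sorted(must_stop_before_move("p_mysql_002", DEPENDS_ON)))
```

In this model, p_mysql_001 and p_mysql_003 are stopped even though nothing
depends on p_mysql_002 itself, which matches the behavior in the logs.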

-- 
Valentin