[ClusterLabs] Debugging problems with resource timeout without any actions from cluster

Thu Oct 12 07:13:59 EDT 2017

Hello,
I am seeing a strange problem with the MySQL resource agent from Percona:
sometimes its monitor operation is killed by lrmd due to a timeout, like
this:

Oct 12 12:26:46 sde1 lrmd[14812]:  warning: p_mysql_monitor_5000 process (PID 28991) timed out
Oct 12 12:27:15 sde1 lrmd[14812]:  warning: p_mysql_monitor_5000:28991 - timed out after 20000ms
Oct 12 12:27:15 sde1 crmd[14815]:    error: Result of monitor operation for p_mysql on sde1: Timed Out
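
To see how often this happens, events like the ones above can be pulled
out of the system log. Below is a minimal sketch, assuming the lrmd
message format is exactly as in the excerpt; the function name
list_timeouts is made up for illustration, and the log file location
depends on the distribution.

```shell
# Print "date time operation timeout" for each lrmd "timed out after" line
# read from stdin. Assumes the syslog layout shown in the excerpt above.
list_timeouts() {
    awk '/ lrmd\[/ && /timed out after [0-9]+ms$/ {
        split($7, op, ":")          # "p_mysql_monitor_5000:28991" -> op[1]
        print $1, $2, $3, op[1], $NF
    }'
}
```

It would be fed from whatever file syslog writes to, for example
`list_timeouts < /var/log/syslog` (path is an assumption).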

I am investigating the problem, but the trouble is that no unusual DB
load or anything else like that was detected at those times. Still, when
these timeouts happen, Pacemaker tries to move MySQL (and all resources
colocated with it) to the other node (I have a two-node cluster). For
various reasons the other node is in standby mode right now, so Pacemaker
moves the resources back, restarting them. All this moving and restarting
makes our services unavailable for some time, which we want to avoid.

So my goal is to keep the cluster up with MySQL and the other colocated
resources, but with resource monitoring only: no starting, stopping,
promoting, or demoting of resources, etc.

I found several ways to achieve that:

1. Put the cluster in maintenance mode (as described here:
   https://www.hastexo.com/resources/hints-and-kinks/maintenance-active-pacemaker-clusters/)

   As far as I understand, services will still be monitored, all logs
   will be written, etc., but no action will be taken in case of
   failures. Is that right?

2. Put the particular resource into unmanaged mode, as described here:
   http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html-single/Pacemaker_Explained/#s-monitoring-unmanaged

3. Start all resources and remove start and stop operations from them.
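
For ways 1 and 2, the crm shell commands would look roughly like the
sketch below. The function names and the CRM variable are made up for
illustration; only the crm subcommands themselves (configure property,
resource unmanage, resource manage) are real. Way 3 is a change to the
primitive definition rather than a command, so it is not shown.

```shell
# CRM can be overridden (e.g. CRM=echo) to preview the commands
# without touching the cluster.
CRM=${CRM:-crm}

# Way 1: set / clear the cluster-wide maintenance-mode property.
maintenance_on()  { $CRM configure property maintenance-mode=true; }
maintenance_off() { $CRM configure property maintenance-mode=false; }

# Way 2: mark only p_mysql as unmanaged (is-managed=false), and undo it.
unmanage_mysql() { $CRM resource unmanage p_mysql; }
manage_mysql()   { $CRM resource manage p_mysql; }
```

Nothing runs until one of the functions is called, e.g. `unmanage_mysql`
before the investigation and `manage_mysql` afterwards.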

Which is the best way to achieve this? I would like the cluster to run
as usual (with logging as usual, or with tracing on the problematic
resource), but with no action taken when a monitor fails.

Here is the configuration of the MySQL resource:

primitive p_mysql ocf:percona:mysql \
        params config="/etc/mysql/my.cnf" pid="/var/run/mysqld/mysqld.pid" socket="/var/run/mysqld/mysqld.sock" replication_user=slave_user replication_passwd=password max_slave_lag=180 evict_outdated_slaves=false binary="/usr/sbin/mysqld" test_user=test test_passwd=test \
        op start interval=0 timeout=60s \
        op stop interval=0 timeout=60s \
        op monitor interval=5s role=Master OCF_CHECK_LEVEL=1 \
        op monitor interval=2s role=Slave OCF_CHECK_LEVEL=1
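
Note that neither monitor operation sets a timeout of its own, so the
20000ms in the log is presumably Pacemaker's default operation timeout
of 20s (unless op_defaults says otherwise). Purely as a sketch, the same
primitive with an explicit monitor timeout would look like this; the 30s
value is an arbitrary assumption, not a tested recommendation:

```
primitive p_mysql ocf:percona:mysql \
        params config="/etc/mysql/my.cnf" pid="/var/run/mysqld/mysqld.pid" socket="/var/run/mysqld/mysqld.sock" replication_user=slave_user replication_passwd=password max_slave_lag=180 evict_outdated_slaves=false binary="/usr/sbin/mysqld" test_user=test test_passwd=test \
        op start interval=0 timeout=60s \
        op stop interval=0 timeout=60s \
        op monitor interval=5s timeout=30s role=Master OCF_CHECK_LEVEL=1 \
        op monitor interval=2s timeout=30s role=Slave OCF_CHECK_LEVEL=1
```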

-- 
Bright regards, Sergey Korobitsin,
Arta Software, http://arta.kz/
xmpp:undertaker at jabber.arta.kz

--
Re: Lazarus 0.9.28 released
> the fpc compiler supports the variant type, so you can rob caravans.
which is terribly inconvenient and gives the feeling that the caravans are robbing you.
	-- anonymous @ linux.org.ru