[Pacemaker] Long failover

Andrei Borzenkov arvidjaar at gmail.com
Fri Nov 14 13:12:28 UTC 2014

On Fri, Nov 14, 2014 at 2:57 PM, Dmitry Matveichev
<d.matveichev at mfisoft.ru> wrote:
> Hello,
> We have a cluster configured via pacemaker+corosync+crm. The configuration
> is:
> node master
> node slave
> primitive HA-VIP1 IPaddr2 \
>         params ip= nic=bond0 \
>         op monitor interval=1s
> primitive HA-variator lsb:variator \
>         op monitor interval=1s \
>         meta migration-threshold=1 failure-timeout=1s
> group HA-Group HA-VIP1 HA-variator
> property cib-bootstrap-options: \
>         dc-version=1.1.10-14.el6-368c726 \
>         cluster-infrastructure="classic openais (with plugin)" \
>         expected-quorum-votes=2 \
>         stonith-enabled=false \
>         no-quorum-policy=ignore \
>         last-lrm-refresh=1383871087
> rsc_defaults rsc-options: \
>         resource-stickiness=100
> First I bring the variator service down on the master node (actually I
> delete the service binary and kill the variator process, so the variator
> fails to restart). Resources move to the slave node very quickly, as
> expected. Then I restore the binary on the master and restart the variator
> service. Now I do the same with the binary and service on the slave node.
> The crm status command quickly shows HA-variator   (lsb:variator):
> Stopped. But it takes too much time (for us) before the resources are
> switched to the master node (around 1 min). Then the lines
> Failed actions:
>     HA-variator_monitor_1000 on slave 'unknown error' (1): call=-1,
> status=Timed Out, last-rc-change='Sat Dec 21 03:59:45 2013', queued=0ms,
> exec=0ms
> appear in the crm status and the resources are switched.
> What is that timeout? Where can I change it?

This is the operation timeout. You can change it in the operation definition:
op monitor interval=1s timeout=5s
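
As a rough sketch, this is how the HA-variator primitive from the quoted
configuration might look with the timeout set. The 5s value is only the
example figure used above, not a recommendation. A suitable value depends on
how long the variator init script's status action really takes on your nodes.

    # timeout=5s is an example value only; adjust it for your service
    primitive HA-variator lsb:variator \
            op monitor interval=1s timeout=5s \
            meta migration-threshold=1 failure-timeout=1s

One way to apply a change like this is "crm configure edit", which opens the
configuration in an editor.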
