[ClusterLabs] node went to stand-by after one single resource-failure

Oscar Salvador osalvador.vilardaga at gmail.com
Mon Jun 8 12:58:01 UTC 2015


2015-06-08 14:23 GMT+02:00 Andrei Borzenkov <arvidjaar at gmail.com>:

> On Mon, Jun 8, 2015 at 3:05 PM, Oscar Salvador
> <osalvador.vilardaga at gmail.com> wrote:
> > Hi guys!
> >
> > I've configured two nodes with the stack pacemaker + corosync, with only one
> > resource (just for test purposes), and I'm having a strange result.
> >
> > First a little bit of information:
> >
> > pacemaker version: 1.1.12-1
> > corosync version: 2.3.4-1
> >
> >
> > # crm configure show
> > node 1053402612: server1 \
> > node 1053402613: server2
> > primitive IP-rsc_apache IPaddr2 \
> >         params ip=xx.xx.xx.xy nic=eth0 cidr_netmask=255.255.255.192 \
> >         meta migration-threshold=2 \
> >         op monitor interval=20 timeout=60 on-fail=standby
> > property cib-bootstrap-options: \
> >         last-lrm-refresh=1433763004 \
> >         stonith-enabled=false \
> >         no-quorum-policy=ignore
> >
> ...
> >
> >
> > It seems like pacemaker is assuming that the monitor-operation failed, and
> > because of this, decides to mark the node as a standby. But should not be,
> > no?
> >
>
> You told it to do exactly that (on-fail=standby).



Yes, I told it that: if the monitor operation fails, put the node in standby.
But from my point of view, it is not the monitor operation that fails, but the
resource itself.
I find this very strange because, as I said, I tested this with an old
version of pacemaker, and it didn't behave this way.
Maybe that is what confused me.

So it is somehow redundant to configure something like this:

meta migration-threshold=2
op monitor interval=20 timeout=60 on-fail=standby

since it will never reach the failcount of 2, no?
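
For comparison, here is a sketch of the variant where migration-threshold
could actually be reached. It assumes the monitor operation uses
on-fail=restart, which as far as I know is the default for monitor; that
choice is my assumption and not something confirmed in this thread. The ip
value stays elided as in my configuration above.

primitive IP-rsc_apache IPaddr2 \
        params ip=xx.xx.xx.xy nic=eth0 cidr_netmask=255.255.255.192 \
        meta migration-threshold=2 \
        op monitor interval=20 timeout=60 on-fail=restart

If I understand it correctly, a failed monitor would then restart the
resource on the same node and raise its failcount, and only after the second
failure would the resource move to the other node. With on-fail=standby the
node already goes to standby on the first failure, so the threshold has no
chance to matter.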