[ClusterLabs] IPaddr2 Unknown interface caused a failover that didn't work
Luc Paulin
paulinster at gmail.com
Thu Oct 1 18:21:19 UTC 2015
2015-10-01 9:30 GMT-04:00 Dejan Muhamedagic <dejanmm at fastmail.fm>:
> Hi,
>
> On Wed, Sep 30, 2015 at 02:24:32PM -0400, Luc Paulin wrote:
> > Hi Everyone,
> > I experienced a weird issue last night where our cluster tried to
> > fail over due to an "Unknown interface".
> >
> > It looks like when the IPaddr2 monitor tried to perform a status check
> > on eth0, it didn't find the device. Both nodes are VMs. I haven't found
> > any reason why eth0 would have "disappeared".
> >
> > <LOG NODE1>
> > [...]
> > Sep 29 21:25:06 node-02 pengine[3240]: error: unpack_rsc_op: Preventing
> > vip_v207_174 from re-starting anywhere: operation monitor failed 'not
> > configured' (6)
>
> The RA exits with the error code which says that the resource
> configuration is invalid. Hence PE won't try to start that
> resource again. Normally, we don't expect network interfaces to
> disappear, but this should probably be the "not installed" error,
> so that the resource can be started on another node. Or even the
> "generic" error in case it may be expected that interfaces can
> come and go. Did you figure out why the interface disappeared?
>
>
No, we haven't been able to figure out why the interface disappeared.
Actually it doesn't seem to have disappeared at all: there is no evidence
in the kernel log that the interface was ever gone. As you say, this
should probably have been the "not installed" or "generic" error so that
the resource is tried on another node, but obviously a network interface
that disappears is not something we expect to see.
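
For anyone else hitting this, here is a rough sketch (illustrative only,
not the actual IPaddr2 code) of what the difference between those exit
codes means inside a resource agent's monitor action, using the standard
OCF return codes: OCF_ERR_INSTALLED lets the cluster recover the resource
on another node, while OCF_ERR_CONFIGURED (the 'not configured' (6) we
got) blocks it everywhere:

    # Illustrative sketch only -- not the shipped IPaddr2 agent code.
    : ${OCF_ROOT=/usr/lib/ocf}
    : ${OCF_FUNCTIONS_DIR=${OCF_ROOT}/lib/heartbeat}
    . ${OCF_FUNCTIONS_DIR}/ocf-shellfuncs

    # Hypothetical helper called from the monitor action.
    monitor_nic() {
        nic="$1"
        if ! ip link show dev "$nic" >/dev/null 2>&1; then
            ocf_log err "Interface $nic not found"
            # OCF_ERR_INSTALLED (5): node-local problem, failover allowed.
            # OCF_ERR_CONFIGURED (6): broken definition, blocked everywhere.
            return $OCF_ERR_INSTALLED
        fi
        return $OCF_SUCCESS
    }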
> Thanks,
>
> Dejan
>
> > I know I found some posts that say to run "sysctl -w
> > net.ipv4.conf.all.promote_secondaries=1" to avoid secondary IP
> > addresses being removed when the primary is gone, but in this case
> > eth0 only carries the single address that is managed through IPaddr2
> > in the crm configuration.
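
(Just for the record, if that sysctl were actually needed, it would
normally be applied live and also persisted across reboots, for example:

    # generic example only -- not something this single-address setup needs
    sysctl -w net.ipv4.conf.all.promote_secondaries=1
    echo 'net.ipv4.conf.all.promote_secondaries = 1' >> /etc/sysctl.conf

but as noted it doesn't apply here, since eth0 only carries the one
IPaddr2-managed address.)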
> >
> > Here's the configuration of the node:
> >
> > <CONFIGURATION>
> > Cluster Name: nodecluster1
> > Corosync Nodes:
> > node-01 node-02
> > Pacemaker Nodes:
> > node-01 node-02
> >
> > Resources:
> > Group: lbpcivip
> > Resource: vip_v207_174 (class=ocf provider=heartbeat type=IPaddr2)
> > Attributes: ip=x.x.x.174 cidr_netmask=27 broadcast=x.x.x.191 nic=eth0
> > Operations: monitor interval=10s (vip_v207_174-monitor-interval-10s)
> > Resource: vip_v26_1 (class=ocf provider=heartbeat type=IPaddr2)
> > Attributes: ip=x.x.26.1
> > Operations: monitor interval=10s (vip_v26_1-monitor-interval-10s)
> > Resource: vip_v27_1 (class=ocf provider=heartbeat type=IPaddr2)
> > Attributes: ip=x.x.27.1
> > Operations: monitor interval=10s (vip_v27_1-monitor-interval-10s)
> > Resource: vip_v254_230 (class=ocf provider=heartbeat type=IPaddr2)
> > Attributes: ip=x.x.254.230
> > Operations: monitor interval=10s (vip_v254_230-monitor-interval-10s)
> > Resource: change-default-fw (class=lsb type=fwdefaultgw)
> > Operations: monitor interval=60s
> (change-default-fw-monitor-interval-60s)
> > Resource: fwcorp-mailto-sysadmin (class=ocf provider=heartbeat
> > type=MailTo)
> > Attributes: email=its at touchtunes.com subject="[node - Clustered
> > services]"
> > Operations: monitor interval=60s
> > (fwcorp-mailto-sysadmin-monitor-interval-60s)
> >
> > Stonith Devices:
> > Fencing Levels:
> >
> > Location Constraints:
> > Ordering Constraints:
> > Colocation Constraints:
> >
> > Cluster Properties:
> > cluster-infrastructure: cman
> > dc-version: 1.1.11-97629de
> > last-lrm-refresh: 1412269491
> > no-quorum-policy: ignore
> > stonith-enabled: false
> > </CONFIGURATION>
> >
> > Does anyone have a suggestion on how I can solve this issue? Why
> > didn't the failover from node1 to node2 work?
> >
> > If more information is required, let me know; any suggestions would be
> > appreciated!
> >
> > Thanx!
> >
> >
> > --
> > !!!!!
> > ( o o )
> > --------------oOO----(_)----OOo--------------
> > Luc Paulin
> > email: paulinster(at)gmail.com
> > Skype: paulinster
>
> _______________________________________________
> Users mailing list: Users at clusterlabs.org
> http://clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>