[ClusterLabs] IPaddr2 "Unknown interface" caused a failover that didn't work

Thu Oct 1 13:30:48 UTC 2015

Hi,

On Wed, Sep 30, 2015 at 02:24:32PM -0400, Luc Paulin wrote:
> Hi everyone,
> I experienced a weird issue last night where our cluster tried to
> fail over due to an "Unknown interface" error.
> 
> It looks like when the IPaddr2 monitor tried to check the status of eth0,
> it didn't find the device. Both nodes are VMs. I haven't found any reason
> why eth0 would have "disappeared".
> 
> <LOG NODE1>
> [...]
> Sep 29 21:25:06 node-02 pengine[3240]:    error: unpack_rsc_op: Preventing
> vip_v207_174 from re-starting anywhere: operation monitor failed 'not
> configured' (6)

The RA exits with the error code which says that the resource
configuration is invalid ('not configured', i.e. OCF_ERR_CONFIGURED, 6).
Hence the PE won't try to start that resource again anywhere.
Normally, we don't expect network interfaces to disappear, but this
should probably be the "not installed" error (OCF_ERR_INSTALLED), so
that the resource can be started on another node. Or even the
"generic" error (OCF_ERR_GENERIC), in case interfaces can be expected
to come and go. Did you figure out why the interface disappeared?
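The exit-code choice above can be illustrated with a minimal shell sketch. It is not the actual IPaddr2 code: the function nic_check, the SYSFS_NET override and the MISSING_NIC_POLICY switch are hypothetical names made up for illustration. Only the numeric values are the standard OCF return codes; a real agent gets them from ocf-shellfuncs rather than defining them itself.

```shell
#!/bin/sh
# Illustrative sketch only, NOT the real IPaddr2 code: pick the OCF exit
# code a monitor could return when the configured NIC is missing.

# Standard OCF return codes (a real agent sources them from ocf-shellfuncs).
OCF_SUCCESS=0
OCF_ERR_GENERIC=1
OCF_ERR_INSTALLED=5
OCF_ERR_CONFIGURED=6   # what the monitor returned in the log above; unused here

# Hypothetical: directory listing network devices, overridable for testing.
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"

# Hypothetical policy switch: "installed" (default) or "generic".
MISSING_NIC_POLICY="${MISSING_NIC_POLICY:-installed}"

# nic_check NIC
# Returns OCF_SUCCESS if the device exists; otherwise a node-local error
# code instead of the cluster-wide OCF_ERR_CONFIGURED.
nic_check() {
    _nic="$1"
    if [ -n "$_nic" ] && [ -e "$SYSFS_NET/$_nic" ]; then
        return "$OCF_SUCCESS"
    fi
    echo "ERROR: Unknown interface [$_nic] on $(uname -n)" >&2
    case "$MISSING_NIC_POLICY" in
        generic)
            # Soft error: the cluster attempts recovery of the resource.
            return "$OCF_ERR_GENERIC" ;;
        *)
            # Hard error: the resource is kept off this node only,
            # so it can still be started on the other node.
            return "$OCF_ERR_INSTALLED" ;;
    esac
}
```

With the default policy a vanished eth0 becomes a hard error on that node only, rather than the fatal "not configured" result that stopped the group from starting anywhere; the "generic" policy only makes sense if interfaces are expected to come and go.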

Thanks,

Dejan

> I found some posts that say to run sysctl -w
> net.ipv4.conf.all.promote_secondaries=1 to prevent secondary addresses from
> being removed when the primary one is gone, but in this case eth0 has a
> single address, which is managed through IPaddr2 within the crm configuration.
> 
> Here's the configuration of the nodes:
> 
> <CONFIGURATION>
> Cluster Name: nodecluster1
> Corosync Nodes:
>  node-01 node-02
> Pacemaker Nodes:
>  node-01 node-02
> 
> Resources:
>  Group: lbpcivip
>   Resource: vip_v207_174 (class=ocf provider=heartbeat type=IPaddr2)
>    Attributes: ip=x.x.x.174 cidr_netmask=27 broadcast=x.x.x.191 nic=eth0
>    Operations: monitor interval=10s (vip_v207_174-monitor-interval-10s)
>   Resource: vip_v26_1 (class=ocf provider=heartbeat type=IPaddr2)
>    Attributes: ip=x.x.26.1
>    Operations: monitor interval=10s (vip_v26_1-monitor-interval-10s)
>   Resource: vip_v27_1 (class=ocf provider=heartbeat type=IPaddr2)
>    Attributes: ip=x.x.27.1
>    Operations: monitor interval=10s (vip_v27_1-monitor-interval-10s)
>   Resource: vip_v254_230 (class=ocf provider=heartbeat type=IPaddr2)
>    Attributes: ip=x.x.254.230
>    Operations: monitor interval=10s (vip_v254_230-monitor-interval-10s)
>   Resource: change-default-fw (class=lsb type=fwdefaultgw)
>    Operations: monitor interval=60s (change-default-fw-monitor-interval-60s)
>   Resource: fwcorp-mailto-sysadmin (class=ocf provider=heartbeat type=MailTo)
>    Attributes: email=its at touchtunes.com subject="[node - Clustered services]"
>    Operations: monitor interval=60s (fwcorp-mailto-sysadmin-monitor-interval-60s)
> 
> Stonith Devices:
> Fencing Levels:
> 
> Location Constraints:
> Ordering Constraints:
> Colocation Constraints:
> 
> Cluster Properties:
>  cluster-infrastructure: cman
>  dc-version: 1.1.11-97629de
>  last-lrm-refresh: 1412269491
>  no-quorum-policy: ignore
>  stonith-enabled: false
> </CONFIGURATION>
> 
> Does anyone have a suggestion on how I can solve this issue? Why didn't
> the failover from node1 to node2 work?
> 
> If more information is required, let me know; any suggestion would be
> appreciated!
> 
> Thanx!
> 
> 
> --
>                          !!!!!
>                        ( o o )
>  --------------oOO----(_)----OOo--------------
>    Luc Paulin
>    email: paulinster(at)gmail.com
>    Skype: paulinster

> _______________________________________________
> Users mailing list: Users at clusterlabs.org
> http://clusterlabs.org/mailman/listinfo/users
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org