[ClusterLabs] Virtual ip resource restarted on node with down network device

Lars Ellenberg lars.ellenberg at linbit.com
Mon Sep 19 10:20:05 EDT 2016

On Mon, Sep 19, 2016 at 02:57:57PM +0200, Jan Pokorný wrote:
> On 19/09/16 09:15 +0000, Auer, Jens wrote:
> > After the restart ifconfig still shows the device bond0 to be not RUNNING:
> > MDA1PFP-S01 09:07:54 2127 0 ~ # ifconfig
> > bond0: flags=5123<UP,BROADCAST,MASTER,MULTICAST>  mtu 1500
> >         inet  netmask  broadcast
> >         ether a6:17:2c:2a:72:fc  txqueuelen 30000  (Ethernet)
> >         RX packets 2034  bytes 286728 (280.0 KiB)
> >         RX errors 0  dropped 29  overruns 0  frame 0
> >         TX packets 2284  bytes 355975 (347.6 KiB)
> >         TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
> This seems to suggest bond0 interface is up and address-assigned
> (well, the netmask is strange).  So there would be nothing
> contradictory to what I said on the address of IPaddr2.
> Anyway, you should rather be using the "ip" command from the iproute
> suite than the various if* tools, which fall short in some cases:
> http://inai.de/2008/02/19
> This would also be consistent with what IPaddr2 uses under the hood.

The resource agent only controls and checks
the presence of a certain IP on a certain NIC
(and some parameters).
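
To make the "UP but not RUNNING" state in the quoted ifconfig output
concrete, here is a minimal sketch that decodes the flags=5123 value.
The bit values are the IFF_* constants from the Linux <linux/if.h>
header; the function name decode_flags is just a hypothetical helper
for illustration, not part of any resource agent.

```python
# Subset of the Linux IFF_* interface flag bits (from <linux/if.h>).
IFF_FLAGS = {
    0x1: "UP",
    0x2: "BROADCAST",
    0x40: "RUNNING",
    0x400: "MASTER",
    0x800: "SLAVE",
    0x1000: "MULTICAST",
}


def decode_flags(value):
    """Return the names of the known flag bits set in `value`, lowest bit first."""
    return [name for bit, name in sorted(IFF_FLAGS.items()) if value & bit]


flags = decode_flags(5123)  # the value shown for bond0 above
print(flags)                # ['UP', 'BROADCAST', 'MASTER', 'MULTICAST']
print("RUNNING" in flags)   # False
```

So the device is administratively up, but it does not report a
running link, and nothing about the IP address itself changes that.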

What you likely ended up with after the "restart"
is an "empty" bonding device with that IP assigned,
but without any "slave" devices, or at least
with the slave devices still set to link down.
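
If you want to check for that situation yourself, a rough sketch
along these lines may help. It assumes the usual sysfs layout of the
Linux bonding driver (a "bonding/slaves" file under the bond, and an
"operstate" file per interface). The function names and the
sysfs_root parameter are hypothetical, chosen only so the logic can
be tried against a scratch directory.

```python
from pathlib import Path


def bond_slave_states(bond, sysfs_root="/sys/class/net"):
    """Return {slave_name: operstate} for a bonding device.

    An empty dict means the bond currently has no slaves at all.
    """
    root = Path(sysfs_root)
    # The bonding driver lists enslaved interfaces, space separated.
    slaves = (root / bond / "bonding" / "slaves").read_text().split()
    return {s: (root / s / "operstate").read_text().strip() for s in slaves}


def bond_has_working_slave(bond, sysfs_root="/sys/class/net"):
    """True if at least one slave of the bond reports operstate 'up'."""
    states = bond_slave_states(bond, sysfs_root)
    return any(state == "up" for state in states.values())
```

On a real node you would call bond_has_working_slave("bond0"); a
False result with the IP still assigned matches the "empty" bond
described above.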

If you really wanted the RA to also know about the slaves,
and be able to properly and fully configure a bonding device,
you'd have to enhance that resource agent.

If you want the IP to move to some other node
when its current node has connectivity problems, use a "ping"
and/or "ethmonitor" resource in addition to the IP.

If you wanted to test-drive cluster response against a
failing network device, your test was wrong.

If you wanted to test-drive cluster response against
a "fat fingered" (or even evil) operator or admin:
give up right there...
You'll never be able to cover it all :-)

: Lars Ellenberg
: LINBIT | Keeping the Digital World Running
: DRBD -- Heartbeat -- Corosync -- Pacemaker
: R&D, Integration, Ops, Consulting, Support

DRBD® and LINBIT® are registered trademarks of LINBIT
