[ClusterLabs] Virtual ip resource restarted on node with down network device

Fri Sep 16 11:43:00 EDT 2016

Hi,

thanks for the help.

> I'm not sure what you mean by "the device the virtual ip is attached
> to", but a separate question is why the resource agent reported that
> restarting the IP was successful, even though that device was
> unavailable. If the monitor failed when the device was made unavailable,
> I would expect the restart to fail as well.

I created the virtual IP with the parameter nic=bond0; this is the device I am bringing down
and the one I was referring to in my question. I think the current behavior is a little inconsistent:
I bring down the device, and Pacemaker recognizes this and restarts the resource. However, the
monitor should then fail again, but it doesn't detect any problems.
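One possible explanation, which this thread does not confirm, is that the check the agent runs only looks at whether the address is configured, not at whether the link of the configured nic is up. The Python sketch below is a hypothetical check, not the resource agent's actual logic, that tells the two conditions apart by reading Linux's `/sys/class/net/<nic>/operstate`. The names `link_is_up` and `vip_status` are made up for illustration, and the address check is passed in as a boolean because querying addresses (for example with `ip addr show`) is outside the scope of this sketch.

```python
from pathlib import Path

# Standard location of per-interface state on Linux.
SYSFS_NET = Path("/sys/class/net")


def link_is_up(nic: str, sysfs_net: Path = SYSFS_NET) -> bool:
    """Return True if <sysfs_net>/<nic>/operstate reads "up".

    A missing interface directory is treated as "not up". Any other
    operstate value ("down", "unknown", "dormant", ...) also counts as not up.
    """
    try:
        state = (sysfs_net / nic / "operstate").read_text().strip()
    except FileNotFoundError:
        return False
    return state == "up"


def vip_status(nic: str, address_present: bool, sysfs_net: Path = SYSFS_NET) -> str:
    """Hypothetical monitor: check the address first, then the link state."""
    if not address_present:
        return "address missing"
    if not link_is_up(nic, sysfs_net):
        return "link down"
    return "ok"


if __name__ == "__main__":
    # Example: a virtual IP configured with nic=bond0, as in this thread.
    print(vip_status("bond0", address_present=True))
```

With `address_present=True` and a `bond0` whose operstate reads `down`, `vip_status` returns `"link down"`, which is exactly the case a check based only on the address would report as healthy. Some virtual devices report `unknown` rather than `up`; this sketch treats that as not up, which may be too strict for some setups.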

Cheers,
  Jens

--
Jens Auer | CGI | Software-Engineer
CGI (Germany) GmbH & Co. KG
Rheinstraße 95 | 64295 Darmstadt | Germany
T: +49 6151 36860 154
jens.auer at cgi.com
Our mandatory disclosures pursuant to § 35a GmbHG / §§ 161, 125a HGB can be found at de.cgi.com/pflichtangaben.

________________________________________
From: Ken Gaillot [kgaillot at redhat.com]
Sent: Friday, 16 September 2016 17:27
To: users at clusterlabs.org
Subject: Re: [ClusterLabs] Virtual ip resource restarted on node with down network device

On 09/16/2016 10:08 AM, Auer, Jens wrote:
> Hi,
>
> I have configured an Active/Passive cluster to host a virtual ip
> address. To test failovers, I shut down the device the virtual ip is
> attached to and expected that it would move to the other node. However,
> the virtual ip is detected as FAILED, but is then restarted on the same
> node. I was able to solve this by adding a ping resource, which we want
> to do anyway, but I am wondering why the resource is restarted on the
> same node and why no failure is detected anymore.

If a *node* fails, pacemaker will recover all its resources elsewhere,
if possible.

If a *resource* fails but the node is OK, the response is configurable,
via the "on-fail" operation option and "migration-threshold" resource
option.

By default, on-fail=restart for monitor operations, and
migration-threshold=INFINITY. This means that if a monitor fails,
pacemaker will attempt to restart the resource on the same node.

To get an immediate failover of the resource, set migration-threshold=1
on the resource.
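To make the difference between the two settings concrete, here is a toy Python model of the decision described above. It is a simplification, not Pacemaker's implementation: `ToyResource` and `on_monitor_failure` are hypothetical names, `math.inf` stands in for INFINITY, and each monitor failure simply increments a per-node fail count that is compared against migration-threshold.

```python
import math
from dataclasses import dataclass, field

# Stand-in for Pacemaker's INFINITY in this toy model (assumption).
INFINITY = math.inf


@dataclass
class ToyResource:
    """Hypothetical, simplified model of per-node fail counts; not Pacemaker code."""

    # Default mirrors migration-threshold=INFINITY: never move because of failures.
    migration_threshold: float = INFINITY
    # Number of failures recorded per node name.
    fail_counts: dict = field(default_factory=dict)

    def on_monitor_failure(self, node: str) -> str:
        """Record a failed monitor on `node` and return the recovery action.

        Models on-fail=restart: restart in place until the node's fail count
        reaches migration-threshold, then move the resource elsewhere.
        """
        self.fail_counts[node] = self.fail_counts.get(node, 0) + 1
        if self.fail_counts[node] >= self.migration_threshold:
            return "move"
        return "restart"


if __name__ == "__main__":
    default = ToyResource()
    print(default.on_monitor_failure("node1"))  # restart
    print(default.on_monitor_failure("node1"))  # restart
    failover = ToyResource(migration_threshold=1)
    print(failover.on_monitor_failure("node1"))  # move
```

With the defaults every failure yields "restart", which matches the behavior reported in this thread; with migration-threshold=1 the first failure already yields "move".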

I'm not sure what you mean by "the device the virtual ip is attached
to", but a separate question is why the resource agent reported that
restarting the IP was successful, even though that device was
unavailable. If the monitor failed when the device was made unavailable,
I would expect the restart to fail as well.

>
> On my setup, this is very easy to reproduce:
> 1. Start cluster with virtual ip
> 2. On the node hosting the virtual ip, bring down the network device
> with ifdown
> => The resource is detected as failed
> => The resource is restarted
> => No failures are detected from now on
>
> Best wishes,
>   Jens

_______________________________________________
Users mailing list: Users at clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org