[ClusterLabs] [EXT] Prevent cluster transition when resource unavailable on both nodes

Mon Dec 11 04:21:17 EST 2023

Hi,

Thanks Ken and Ulrich for your replies. With your suggestions I ended up finding out about ocf:heartbeat:ethmonitor and will try to set this up as an additional resource within our cluster.

I can share more information once (if!) I have it working the way I want to.

Cheers,

Alex

> On 07.12.2023, at 08:59, Windl, Ulrich <u.windl at ukr.de> wrote:
> 
> Hi!
> 
> What about this: Run a ping node for a remote resource to set up some score value. If the remote is unreachable, the score will reflect that.
> Then add a rule checking that score, deciding whether to run the virtual IP or not.
> 
> Regards,
> Ulrich
> 
> -----Original Message-----
> From: Users <users-bounces at clusterlabs.org> On Behalf Of Alexander Eastwood
> Sent: Wednesday, December 6, 2023 5:56 PM
> To: users at clusterlabs.org
> Subject: [EXT] [ClusterLabs] Prevent cluster transition when resource unavailable on both nodes
> 
> Hello, 
> 
> I administrate a Pacemaker cluster consisting of 2 nodes, which are connected to each other via ethernet cable to ensure that they are always able to communicate with each other. A network switch is also connected to each node via ethernet cable and provides external access.
> 
> One of the managed resources of the cluster is a virtual IP, which is assigned to a physical network interface card and thus depends on the network switch being available. The virtual IP is always hosted on the active node.
> 
> We had a situation where the network switch lost power or was rebooted, and as a result both servers reported `NIC Link is Down`. The recover operation on the virtual IP resource then failed repeatedly on the active node, and a transition was initiated. Since the other node was also unable to start the resource, the cluster kept bouncing between the 2 nodes until the NIC links were up again.
> 
> Is there a way to change this behaviour? I am thinking of the following sequence of events, but have not been able to find a way to configure this:
> 
> 1. active node detects NIC Link is Down, which affects a resource managed by the cluster (monitor operation on the resource starts to fail)
> 2. active node checks if the other (passive) node in the cluster would be able to start the resource
> 3. if passive node can start the resource, transition all resources to passive node
> 4. if passive node is unable to start the resource, then there is nothing to be gained by a transition, so no action should be taken
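
The four steps above can be sketched as a small decision function. This is only an illustration of the desired logic, not Pacemaker configuration or any Pacemaker API: the names `NodeState`, `nic_link_up` and `decide_transition` are hypothetical, and it assumes "able to start the resource" simply means the node's NIC link is up.

```python
from dataclasses import dataclass


@dataclass
class NodeState:
    """Hypothetical view of one cluster node (not a Pacemaker type)."""
    name: str
    nic_link_up: bool  # assumption: link up == node can host the virtual IP


def decide_transition(active: NodeState, passive: NodeState) -> str:
    """Return 'stay' or 'failover' following steps 1-4 above."""
    # Step 1: nothing to do while the active node's link is healthy.
    if active.nic_link_up:
        return "stay"
    # Steps 2-3: move resources only if the passive node could run them.
    if passive.nic_link_up:
        return "failover"
    # Step 4: both links are down, so a transition gains nothing.
    return "stay"


if __name__ == "__main__":
    # Switch outage: both links are down, so no transition is wanted.
    print(decide_transition(NodeState("node1", False), NodeState("node2", False)))
```

In a real cluster this decision would have to be expressed through resource agents and constraints rather than code like this; the sketch only pins down the behaviour being asked for.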
> 
> Any pointers or advice will be much appreciated!
> 
> Thank you and kind regards,
> 
> Alex Eastwood
> _______________________________________________
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
> 
> ClusterLabs home: https://www.clusterlabs.org/