[ClusterLabs] Move a resource only where another has Started

Ken Gaillot kgaillot at redhat.com
Thu Oct 7 16:40:55 EDT 2021


On Thu, 2021-10-07 at 15:45 +0000, martin doc wrote:
> Hi,
> 
> I've been trying to work out whether it is possible to leave a
> resource on the cluster node it is currently on, and only move it to
> another node once a dependent resource has started there. This is all
> using Red Hat's presentation of it in RHEL...
> 
> Ok, that might sound gibberish... 
> 
> The cluster config I'm trying to build starts out with a basic ping:
> 
> pcs resource create MyGw ocf:pacemaker:ping host_list=192.168.1.254
> failure_score=1 migration-threshold=1
> pcs resource clone MyGw globally-unique=true
> 
> and let's assume that there are three nodes, node1, node2, and node3,
> so the above gets pacemaker running MyGw-clone on all of node1, node2,
> and node3. (3 nodes makes it more interesting ;)
> 
> All good.
> 
> Now let's add a VIP into the mix and set it to run where MyGw-clone is
> running:
> 
> pcs resource create VIP ocf:heartbeat:IPaddr2 ip=192.168.1.1 nic=eno3
> cidr_netmask=24
> pcs constraint colocation add VIP with MyGw-clone
> 
> All good. Now comes the fun part. I want to run my own app only on
> one node and only where the gateway is:
> 
> pcs resource create App ocf:internal:app
> pcs constraint colocation add App with VIP
> 
> If the VIP is on node1 and MyGw-clone is on node1 and running
> successfully, then App will run there too. The problem starts when I
> unplug eno3 (not the same NIC as is used for the cluster mgmt.)
> 
> As soon as the ping fails, MyGw-clone stops on node1 and this forces
> both VIP and App onto another node.
> 
> The problem is that pacemaker will eventually expire the failure and
> then decide to restart MyGw-clone on node1 and at the same time, stop
> VIP & App on node2. It then tries to start both VIP and App on node1.
>
> What I'd really like to happen is for VIP & App to only move back to
> node1 if and when MyGw-clone is in the "Started" state (i.e. after a
> successful ping); in other words, to only do the "Move" of VIP & App
> after the recovery of MyGw:0 has been successful.

You don't want to colocate VIP with MyGw-clone -- or at least by itself
that is insufficient. That colocation says that the VIP can run with
any instance of MyGw-clone, not limiting it to instances that can
successfully ping.

Instead, you want to locate VIP on a node with the attribute set by the
ping agent for successful pings. In pcs syntax it will be:

 pcs constraint location VIP rule <expression>

where expression depends on the name of the attribute and what value
indicates success.
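
For example, with the ping agent's defaults (attribute name "pingd",
multiplier 1), something along these lines should keep VIP off any node
where the ping has not succeeded -- treat it as a sketch and adjust the
attribute name and threshold to match your configuration:

 pcs constraint location VIP rule score=-INFINITY \
     pingd lt 1 or not_defined pingd

With that in place you can drop the VIP-with-MyGw-clone colocation and
keep only the App-with-VIP one; VIP (and therefore App) should only be
placed on a node once the ping attribute there reflects a successful
ping.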

> If I set "failure-timeout" to 0 then both App & VIP will stay put,
> and the cluster never again tests to see if MyGw:0 is healthy until I
> do a "pcs resource cleanup."
> 
> I've tried colocation rules, but I wasn't any more successful with
> those than the basic constraint configuration (assuming I got those
> right.)
> 
> I suppose another way to go about this would be to run another cloned
> resource that mimics the ping and automatically runs a
> "resource cleanup MyGw-clone" if it notices the clone is down on a
> node and the ping would succeed. But is there a cleaner way?
> 
> Thanks,
> D.

-- 
Ken Gaillot <kgaillot at redhat.com>


