[ClusterLabs] Move a resource only where another has Started

martin doc db1280 at hotmail.com
Thu Oct 7 11:45:06 EDT 2021


Hi,

I've been trying to work out whether it is possible to leave a resource on the cluster node it is currently on, and only move it to another node once a resource it depends on has started there. This is all using Pacemaker as packaged by Red Hat in RHEL...

OK, that might sound like gibberish...

The cluster config I'm trying to build starts out with a basic ping:

pcs resource create MyGw ocf:pacemaker:ping host_list=192.168.1.254 failure_score=1 meta migration-threshold=1
pcs resource clone MyGw globally-unique=true

and let's assume there are three nodes, node1, node2, and node3, so the above gets Pacemaker running an instance of MyGw-clone on each of node1, node2, and node3. (Three nodes make it more interesting ;)
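
As an aside, the ping agent reports reachability through a node attribute ("pingd" by default, set to multiplier times the number of reachable hosts), which can be queried per node with something like:

attrd_updater --query --name pingd --node node1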

All good.

Now let's add a VIP into the mix and set it to run where MyGw-clone is running:

pcs resource create VIP ocf:heartbeat:IPaddr2 ip=192.168.1.1 nic=eno3 cidr_netmask=24
pcs constraint colocation add VIP with MyGw-clone

All good. Now comes the fun part. I want to run my own app only on one node and only where the gateway is:

pcs resource create App ocf:internal:app
pcs constraint colocation add App with VIP

If the VIP is on node1 and MyGw-clone is running successfully on node1, then so is App. The problem starts when I unplug eno3 (not the same NIC as is used for cluster management).

As soon as the ping fails, MyGw-clone stops on node1 and this forces both VIP and App onto another node.

The problem is that Pacemaker will eventually expire the failure, decide to restart MyGw-clone on node1, and at the same time stop VIP & App on node2. It then tries to start both VIP and App on node1.

What I'd really like to happen is for VIP & App to move back to node1 only if and when MyGw-clone is in the "Started" state there (i.e. after a successful ping); in other words, to only do the "Move" of VIP & App after the recovery of MyGw:0 has succeeded.

If I set "failure-timeout" to 0 then both App & VIP stay put, but the cluster never again tests whether MyGw:0 is healthy until I do a "pcs resource cleanup".
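
For reference, that's the clone's failure-timeout meta attribute, set with something like:

pcs resource meta MyGw-clone failure-timeout=0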

I've tried colocation rules, but I wasn't any more successful with those than with the basic constraint configuration (assuming I got them right).
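
By rules I mean a constraint keyed on the node attribute the ping agent maintains, e.g. a location rule roughly like this (assuming the agent's default attribute name, "pingd"):

pcs constraint location VIP rule score=-INFINITY pingd lt 1 or not_defined pingd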

I suppose another way to go about this would be to run another cloned resource that mimics the ping and automatically runs a "pcs resource cleanup MyGw-clone" if it notices the clone is down on a node where the ping would succeed. But is there a cleaner way?
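
For illustration, the kind of watchdog I have in mind is roughly this, run from cron on every node (it assumes the cluster node names match "uname -n" and hard-codes the gateway address and resource names from above):

#!/bin/sh
# If the gateway answers from this node but Pacemaker still has a recorded
# failure for MyGw, clear it so the clone instance can be restarted here.
GW=192.168.1.254
NODE=$(uname -n)
if ping -c 3 -W 1 "$GW" >/dev/null 2>&1; then
    if pcs resource failcount show MyGw | grep -q "$NODE"; then
        pcs resource cleanup MyGw-clone
    fi
fi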

Thanks,
D.
