[ClusterLabs] Start resource only if another resource is stopped

Andrei Borzenkov arvidjaar at gmail.com
Thu Aug 11 15:40:51 EDT 2022


On 11.08.2022 17:34, Miro Igov wrote:
> Hello,
> 
> I am trying to create a failover resource that would start when another
> resource is stopped and stop when that resource is started again.
> 
> It is a 4-node cluster (with qdevice) where the nodes are virtual machines;
> two of them are hosted in one datacenter and the other two VMs in another
> datacenter.
> 
> Names of the nodes are:
> 
> nas-sync-test1
> 
> intranet-test1
> 
> nas-sync-test2
> 
> intranet-test2
> 
> The nodes ending in 1 are hosted in the same datacenter and the ones ending
> in 2 are in the other datacenter.
> 
>  
> 
> nas-sync-test* nodes are running NFS servers and exports:
> 
> nfs_server_1, nfs_export_1 (running on nas-sync-test1)
> 
> nfs_server_2, nfs_export_2 (running on nas-sync-test2)
> 
>  
> 
> intranet-test1 is running the NFS mount data_1 (mounting nfs_export_1), and
> intranet-test2 is running data_2 (mounting nfs_export_2).
> 
> I created data_1_failover, which also mounts nfs_export_1 and should run on
> intranet-test2 ONLY if data_2 is down. So the idea is that it mounts
> nfs_export_1 on intranet-test2 only when the local mount data_2 is stopped
> (note that nfs_server_1 runs in one datacenter and intranet-test2 in the
> other DC).
> 
> I also created data_2_failover with the same purpose as data_1_failover.
> 
>  
> 
> I would like to ask how to make the failover mounts start automatically when
> the ordinary mounts stop.
> 
>  
> 
> Current configuration of the constraints:
> 
>  
> 
> tag all_mounts data_1 data_2 data_1_failover data_2_failover
> 
> tag sync_1 nfs_server_1 nfs_export_1
> 
> tag sync_2 nfs_server_2 nfs_export_2
> 
> location deny_data_1 data_1 -inf: intranet-test2
> 
> location deny_data_2 data_2 -inf: intranet-test1
> 
> location deny_failover_1 data_1_failover -inf: intranet-test1
> 
> location deny_failover_2 data_2_failover -inf: intranet-test2
> 
> location deny_sync_1 sync_1 \
> 
>         rule -inf: #uname ne nas-sync-test1
> 
> location deny_sync_2 sync_2 \
> 
>         rule -inf: #uname ne nas-sync-test2
> 
> location mount_on_intranet all_mounts \
> 
>         rule -inf: #uname eq nas-sync-test1 or #uname eq nas-sync-test2
> 
>  
> 
> colocation nfs_1 inf: nfs_export_1 nfs_server_1
> 
> colocation nfs_2 inf: nfs_export_2 nfs_server_2
> 
>  
> 
> order nfs_server_export_1 Mandatory: nfs_server_1 nfs_export_1
> 
> order nfs_server_export_2 Mandatory: nfs_server_2 nfs_export_2
> 
> order mount_1 Mandatory: nfs_export_1 data_1
> 
> order mount_1_failover Mandatory: nfs_export_1 data_1_failover
> 
> order mount_2 Mandatory: nfs_export_2 data_2
> 
> order mount_2_failover Mandatory: nfs_export_2 data_2_failover
> 
>  
> 
>  
> 
> I tried adding the following colocation:
> 
>    colocation failover_1 -inf: data_2_failover data_1
> 

This colocation does not say "start data_2_failover when data_1 is
stopped". It says "do not allocate data_2_failover to the same node
where data_1 is already allocated". There is a difference between
"resource A is allocated to node N" and "resource A is active on node N".

> and it stops data_2_failover when data_1 is started, and it also starts
> data_2_failover when data_1 is stopped - exactly as needed!
> 
> Full List of Resources:
> 
>   * admin-ip    (ocf::heartbeat:IPaddr2):        Started intranet-test2
> 
>   * stonith-sbd (stonith:external/sbd):  Started intranet-test1
> 
>   * nfs_export_1        (ocf::heartbeat:exportfs):       Started nas-sync-test1
> 
>   * nfs_server_1        (systemd:nfs-server):    Started nas-sync-test1
> 
>   * nfs_export_2        (ocf::heartbeat:exportfs):       Started nas-sync-test2
> 
>   * nfs_server_2        (systemd:nfs-server):    Started nas-sync-test2
> 
>   * data_1_failover     (ocf::heartbeat:Filesystem):     Started intranet-test2
> 
>   * data_2_failover     (ocf::heartbeat:Filesystem):     Stopped
> 
>   * data_2      (ocf::heartbeat:Filesystem):     Started intranet-test2
> 
>   * data_1      (ocf::heartbeat:Filesystem):     Started intranet-test1
> 
>  
> 

For the future: it is much better to simply copy and paste the actual
commands you used together with their output. We may guess that you used
"crm resource stop" or an equivalent command, but it is just a guess, and
any conclusion based on it will be wrong if we guessed wrong.
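
For example (assuming crmsh), pasting something like

crm resource stop data_1
crm_mon -1r

together with the full output removes the guesswork; crm_mon -1 prints
the cluster status once and -r also lists inactive resources.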

>  
> 
> Full List of Resources:
> 
>   * admin-ip    (ocf::heartbeat:IPaddr2):        Started intranet-test2
> 
>   * stonith-sbd (stonith:external/sbd):  Started intranet-test1
> 
>   * nfs_export_1        (ocf::heartbeat:exportfs):       Started nas-sync-test1
> 
>   * nfs_server_1        (systemd:nfs-server):    Started nas-sync-test1
> 
>   * nfs_export_2        (ocf::heartbeat:exportfs):       Started nas-sync-test2
> 
>   * nfs_server_2        (systemd:nfs-server):    Started nas-sync-test2
> 
>   * data_1_failover     (ocf::heartbeat:Filesystem):     Started intranet-test2
> 
>   * data_2_failover     (ocf::heartbeat:Filesystem):     Started intranet-test1
> 
>   * data_2      (ocf::heartbeat:Filesystem):     Started intranet-test2
> 
>   * data_1      (ocf::heartbeat:Filesystem):     Stopped (disabled)
> 

Assuming you used "crm resource stop data_1": resource data_1 cannot
run anywhere now, which allows Pacemaker to allocate resource
data_2_failover to node intranet-test1.
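
(For reference, "crm resource stop data_1" only sets the meta attribute
target-role=Stopped on data_1, which you can verify with

crm configure show data_1

With target-role=Stopped the resource is not allocated to any node at
all, so the -inf colocation no longer keeps data_2_failover away from
intranet-test1.)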

>  
> 
>  
> 
> But it does not start data_2_failover when nfs_export_1 is stopped, which
> stops data_1:
> 
> Full List of Resources:
> 
>   * admin-ip    (ocf::heartbeat:IPaddr2):        Started intranet-test2
> 
>   * stonith-sbd (stonith:external/sbd):  Started intranet-test1
> 
>   * nfs_export_1        (ocf::heartbeat:exportfs):       Stopped (disabled)
> 
>   * nfs_server_1        (systemd:nfs-server):    Started nas-sync-test1
> 
>   * nfs_export_2        (ocf::heartbeat:exportfs):       Started nas-sync-test2
> 
>   * nfs_server_2        (systemd:nfs-server):    Started nas-sync-test2
> 
>   * data_1_failover     (ocf::heartbeat:Filesystem):     Stopped
> 
>   * data_2_failover     (ocf::heartbeat:Filesystem):     Stopped
> 
>   * data_2      (ocf::heartbeat:Filesystem):     Started intranet-test2
> 
>   * data_1      (ocf::heartbeat:Filesystem):     Stopped
> 

And here there is no restriction on the *placement* of data_1, which
means Pacemaker allocated data_1 to the node intranet-test1. The resource
is not *active* because of the ordering requirement - it cannot be started
before another resource is started - but it is still assigned to that
cluster node, and the colocation prohibits assignment of data_2_failover
to the same node. Pacemaker will wait (indefinitely) for the possibility
to start data_1 on the allocated node.

One possibility to do what you want is a node attribute. Either the
resource agent can set a unique node attribute when the resource becomes
active, or you can use ocf:pacemaker:attribute. As a proof of concept:

primitive data_1_active ocf:pacemaker:attribute \
        params active_value=1 inactive_value=0 \
        op monitor interval=10s timeout=20s
colocation attribute_1 inf: data_1_active data_1
order data_1_active_after_data_1 Mandatory: data_1 data_1_active
location data_2_failover_if_data_1_inactive data_2_failover \
        rule -inf: defined opa-data_1_active and opa-data_1_active eq 1
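
The same pattern, mirrored for the other direction, would look roughly
like this (untested sketch; the resource and constraint names are just
illustrative, and it relies on the agent's default attribute name,
opa-<primitive name>, exactly like the example above):

primitive data_2_active ocf:pacemaker:attribute \
        params active_value=1 inactive_value=0 \
        op monitor interval=10s timeout=20s
colocation attribute_2 inf: data_2_active data_2
order data_2_active_after_data_2 Mandatory: data_2 data_2_active
location data_1_failover_if_data_2_inactive data_1_failover \
        rule -inf: defined opa-data_2_active and opa-data_2_active eq 1

The agent sets the attribute to active_value on start and back to
inactive_value on stop, so data_1_failover is kept off a node only
while data_2 (and therefore data_2_active) is actually running there.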

