[ClusterLabs] Antw: [EXT] Cluster Removing VIP and Not Following Order Constraint

Strahil Nikolov hunter86_bg at yahoo.com
Fri Feb 11 05:13:41 EST 2022

Ah, it's a HANA.
Last HANA I did had something like this:
colocation constraint -> VIP with Master HANA
order constraint -> First HANA clone (don't specify master role) -> then IP
That way, when the standby HANA joins and the master is demoted (kind of challenged) and afterwards the same old primary is promoted back, the IP never disappeared, and it always started on the correct side (where the master is).
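
As a rough crm shell sketch of those two constraints (the constraint IDs are made up for illustration, the resource names are taken from the configuration quoted below, and the score of 2000 is only an assumption; adapt all of them to your cluster):

```
# Keep the VIP on the node where HANA runs in the Master role.
colocation col_ip_with_hana_master 2000: rsc_ip_ABC_HDB96:Started msl_SAPHana_ABC_HDB96:Master
# Start the HANA clone first (no role given), then the IP.
order ord_hana_then_ip msl_SAPHana_ABC_HDB96 rsc_ip_ABC_HDB96
```

The idea, as far as I understand it, is that the IP's ordering then depends only on the clone being started, not on a promote, so a demote/promote cycle should not by itself force the IP to stop.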

Best Regards,
Strahil Nikolov

On Fri, Feb 11, 2022 at 10:38, Jonno <jstk888 at gmail.com> wrote:
Hello all,
Thank you for your assistance.
Below is the config from my lab environment. By the way, I just tried Strahil's suggestions, but it didn't seem to have any effect on the behaviour.
node 1: senzhana3 \
        attributes hana_abc_op_mode=logreplay hana_abc_vhost=senzhana3 hana_abc_site=SITEA hana_abc_srmode=sync lpa_abc_lpt=10 hana_abc_remoteHost=senzhana4
node 2: senzhana4 \
        attributes lpa_abc_lpt=1644568482 hana_abc_op_mode=logreplay hana_abc_vhost=senzhana4 hana_abc_site=SITEB hana_abc_srmode=sync hana_abc_remoteHost=senzhana3
primitive rsc_SAPHanaTopology_ABC_HDB96 ocf:suse:SAPHanaTopology \
        operations $id=rsc_sap2_ABC_HDB96-operations \
        op monitor interval=10 timeout=600 \
        op start interval=0 timeout=600 \
        op stop interval=0 timeout=300 \
        params SID=ABC InstanceNumber=96
primitive rsc_SAPHana_ABC_HDB96 ocf:suse:SAPHana \
        operations $id=rsc_sap_ABC_HDB96-operations \
        op start interval=0 timeout=3600 \
        op stop interval=0 timeout=3600 \
        op promote interval=0 timeout=3600 \
        op monitor interval=60 role=Master timeout=700 \
        op monitor interval=61 role=Slave timeout=700 \
primitive rsc_hsr_quiesce lsb:hsr_quiesce
primitive rsc_hsr_resume lsb:hsr_resume
primitive rsc_ip_ABC_HDB96 IPaddr2 \
        operations $id=rsc_ip_ABC_HDB96-operations \
        op monitor interval=10s timeout=20s \
        params ip=
primitive stonith-sbd stonith:external/sbd \
        params pcmk_delay_max=30 \
        meta target-role=Started
ms msl_SAPHana_ABC_HDB96 rsc_SAPHana_ABC_HDB96 \
        meta clone-max=2 clone-node-max=1 interleave=true target-role=Master
clone cln_SAPHanaTopology_ABC_HDB96 rsc_SAPHanaTopology_ABC_HDB96 \
        meta clone-node-max=1 interleave=true
location cli-prefer-rsc_hsr_quiesce rsc_hsr_quiesce role=Started inf: senzhana3
location cli-prefer-rsc_ip_ABC_HDB96 rsc_ip_ABC_HDB96 role=Started inf: senzhana4
colocation col_saphana_ip_ABC_HDB96 2000: rsc_ip_ABC_HDB96:Started msl_SAPHana_ABC_HDB96:Master rsc_hsr_quiesce rsc_hsr_resume
order ord_SAPHana_ABC_HDB96 Optional: cln_SAPHanaTopology_ABC_HDB96 msl_SAPHana_ABC_HDB96
order ord_failover_ABC_HDB96 rsc_hsr_quiesce rsc_ip_ABC_HDB96 msl_SAPHana_ABC_HDB96:promote rsc_hsr_resume
property cib-bootstrap-options: \
        have-watchdog=true \
        dc-version="2.0.4+20200616.2deceaa3a-3.12.1-2.0.4+20200616.2deceaa3a" \
        cluster-infrastructure=corosync \
        cluster-name=hacluster \
        stonith-enabled=true \
        stonith-action=reboot \
        stonith-timeout=150s \
rsc_defaults rsc-options: \
        resource-stickiness=1000 \
op_defaults op-options: \
        timeout=600 \
        record-pending=true \

On Fri, 11 Feb 2022 at 21:29, Klaus Wenninger <kwenning at redhat.com> wrote:

On Fri, Feb 11, 2022 at 9:13 AM Strahil Nikolov via Users <users at clusterlabs.org> wrote:

Shouldn't you use kind 'Mandatory' and symmetrical TRUE?
If true, the reverse of the constraint applies for the opposite action (for example, if B starts after A starts, then B stops before A stops). 
If the script should be run before any change then it sounds as if an asymmetric order would be desirable. So you might create at least two order constraints explicitly listing the actions. But I doubt that this explains the unexpected behavior described. As Ulrich said, a little bit more info about the config would be helpful.
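
A minimal crm shell sketch of what such asymmetric constraints could look like, assuming the resource names from Jonno's configuration; the constraint IDs and the exact choice of actions are hypothetical and untested:

```
# Run the quiesce script before the VIP is stopped; no reverse ordering is implied.
order ord_quiesce_before_ip_stop Mandatory: rsc_hsr_quiesce:start rsc_ip_ABC_HDB96:stop symmetrical=false
# Run the quiesce script before HANA is promoted.
order ord_quiesce_before_promote Mandatory: rsc_hsr_quiesce:start msl_SAPHana_ABC_HDB96:promote symmetrical=false
```

With symmetrical=false each constraint applies only to the actions listed, so the cluster does not derive the opposite ordering for the opposite actions.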

Best Regards,
Strahil Nikolov

On Fri, Feb 11, 2022 at 9:11, Ulrich Windl <Ulrich.Windl at rz.uni-regensburg.de> wrote:
>>> Jonno <jstk888 at gmail.com> wrote on 10.02.2022 at 20:43 in message
<CADGLmTe311U71NKSEBoeogkk+WcCHpt44MH1T-4hy-=J-NL8Tw at mail.gmail.com>:
> Hello,
> I am having some trouble getting my 2 node active/passive cluster to do
> what I want. More specifically, my cluster is removing the VIP from the
> cluster whenever I attempt a failover with a command such as "crm resource
> move rsc_cluster_vip node2".
> When running the command above, I am asking the cluster to migrate the VIP
> to the standby node, but I am expecting the cluster to honour the order
> constraint, by first running the script resource named "rsc_lsb_quiesce".
> The order constraint looks like:
> "order order_ABC rsc_lsb_quiesce rsc_cluster_vip msl_ABC:promote
> rsc_lsb_resume"
> But it doesn't seem to do what I expect. It always removes the VIP entirely
> from the cluster first, then it starts to follow the order constraint. This
> means my cluster is in a state where the VIP is completely gone for a
> couple of minutes. I've also tried doing a "crm resource move
> rsc_lsb_quiesce node2" hoping to trigger the script resource first, but the
> cluster always removes the VIP before doing anything.
> My question is: How can I make the cluster follow this order constraint? I

I'm very sure you just made a configuration mistake.
But nobody can help you unless you show your configuration and an example execution of events, plus the expected order of execution.


> need the cluster to run the "rsc_lsb_quiesce" script against a remote
> application server before any other action is taken. I especially need the
> VIP to stay where it is. Should I be doing this another way?
> Regards,
> Jonathan

