[ClusterLabs] Restarting parent of ordered clone resources on specific node causes restart of all resources in the ordering constraint on all nodes of the cluster
ChittaNagaraj, Raghav
Raghav.ChittaNagaraj at dell.com
Mon Apr 11 14:06:00 EDT 2022
I was able to fix this by setting meta interleave=true on the clones (step 2 below).
New steps:
1. Create the resources (same as before):
$ sudo pcs resource create test-1 ocf:heartbeat:Dummy op monitor timeout="20" interval="10" clone
$ sudo pcs resource create test-2 ocf:heartbeat:Dummy op monitor timeout="20" interval="10" clone
$ sudo pcs resource create test-3 ocf:heartbeat:Dummy op monitor timeout="20" interval="10" clone
2. Add meta interleave=true on the clones explicitly via update. Adding meta interleave=true to the create above DOES NOT work:
$ sudo pcs resource update test-2-clone meta interleave=true
$ sudo pcs resource update test-3-clone meta interleave=true
3. Then order them (same as before):
$ sudo pcs constraint order test-1-clone then test-2-clone
Adding test-1-clone test-2-clone (kind: Mandatory) (Options: first-action=start then-action=start)
$ sudo pcs constraint order test-2-clone then test-3-clone
Adding test-2-clone test-3-clone (kind: Mandatory) (Options: first-action=start then-action=start)
4. Then, when I restart test-1-clone (same as before), only the resources on the affected node restart:
$ sudo pcs resource restart test-1 node1_a
Warning: using test-1-clone... (if a resource is a clone, master/slave or bundle you must use the clone, master/slave or bundle name)
test-1-clone successfully restarted
5. Result of step 4 above (a short verification sketch follows these logs):
Apr 11 17:58:39 NODE1-B pacemaker-schedulerd[103050]: notice: * Stop test-1:0 ( node1_a ) due to node availability
Apr 11 17:58:39 NODE1-B pacemaker-schedulerd[103050]: notice: * Stop test-2:0 ( node1_a )
Apr 11 17:58:39 NODE1-B pacemaker-schedulerd[103050]: notice: * Stop test-3:0 ( node1_a ) due to unrunnable test-2:0 start
Apr 11 17:58:39 NODE1-B pacemaker-controld[103051]: notice: Initiating stop operation test-3_stop_0 on node1_a
Apr 11 17:58:39 NODE1-B pacemaker-controld[103051]: notice: Initiating stop operation test-2_stop_0 on node1_a
Apr 11 17:58:39 NODE1-B pacemaker-controld[103051]: notice: Initiating stop operation test-1_stop_0 on node1_a
Apr 11 17:58:41 NODE1-B pacemaker-schedulerd[103050]: notice: * Start test-1:3 ( node1_a )
Apr 11 17:58:41 NODE1-B pacemaker-schedulerd[103050]: notice: * Start test-2:3 ( node1_a )
Apr 11 17:58:41 NODE1-B pacemaker-schedulerd[103050]: notice: * Start test-3:3 ( node1_a )
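To double-check the configuration and outcome, a few read-only commands can help. This is only a sketch; subcommand names vary across pcs versions (older releases use "pcs resource show" where newer ones use "pcs resource config"):
$ sudo pcs resource config test-2-clone    # Meta Attrs should list interleave=true
$ sudo pcs constraint order --full         # lists both order constraints with their ids
$ sudo crm_mon -1                          # one-shot status; instances on unaffected nodes should still show Started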
What I would like to call out is that the documentation here does not explicitly state this behavior: https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/high_availability_add-on_reference/ch-advancedresource-haar
interleave
Changes the behavior of ordering constraints (between clones/masters) so that copies of the first clone can start or stop as soon as the copy on the same node of the second clone has started or stopped (rather than waiting until every instance of the second clone has started or stopped). Allowed values: false, true. The default value is false.
"stopped (rather than waiting until every instance of the second clone has started or stopped)" - this may suggest this implicitly but definitely not clear.
Please let me know if I am missing something and if there is a better recommendation.
Thanks,
Raghav
From: ChittaNagaraj, Raghav
Sent: Monday, April 11, 2022 10:59 AM
To: Strahil Nikolov; Cluster Labs - All topics related to open-source clustering welcomed
Cc: Haase, David; Hicks, Richard; gandhi, rajesh; Burney, Scott; Farnsworth, Devin
Subject: RE: [ClusterLabs] Restarting parent of ordered clone resources on specific node causes restart of all resources in the ordering constraint on all nodes of the cluster
Hello Strahil,
Thank you for your response.
The actual problem I wanted to discuss here is the restart of ordered resources on unaffected nodes.
From the observation in my original email:
1. I have 4 pacemaker nodes -
node2_a
node2_b
node1_a
node1_b
2. I restarted test-1 on node1_a.
3. This restarted the test-2 and test-3 clones on node1_a. This is fine, as node1_a is the affected node.
4. But it also restarted test-2 and test-3 on the unaffected nodes. The logs below show the test-2 restarts on the unaffected nodes node1_b, node2_b and node2_a, which I don't want:
Apr 07 20:25:03 NODE1-B pacemaker-schedulerd[95746]: notice: * Restart test-2:0 ( node1_b ) due to required test-1-clone running
Apr 07 20:25:03 NODE1-B pacemaker-schedulerd[95746]: notice: * Restart test-2:2 ( node2_b ) due to required test-1-clone running
Apr 07 20:25:03 NODE1-B pacemaker-schedulerd[95746]: notice: * Restart test-2:3 ( node2_a ) due to required test-1-clone running
Please let me know if you have any further questions.
Thanks,
Raghav
From: Strahil Nikolov <hunter86_bg at yahoo.com>
Sent: Friday, April 8, 2022 12:00 PM
To: Cluster Labs - All topics related to open-source clustering welcomed; ChittaNagaraj, Raghav
Cc: Haase, David; Hicks, Richard; gandhi, rajesh; Burney, Scott; Farnsworth, Devin; ChittaNagaraj, Raghav
Subject: Re: [ClusterLabs] Restarting parent of ordered clone resources on specific node causes restart of all resources in the ordering constraint on all nodes of the cluster
You can use 'kind' and 'symmetrical' to control order constraints. The default value for symmetrical is 'true', which means that in order to stop dummy1, the cluster has to stop dummy1 & dummy2.
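For illustration, a sketch of how those options are spelled in pcs (semantics as described in the pacemaker documentation; whether they are appropriate depends on the stop behavior you want):
$ sudo pcs constraint order start test-1-clone then start test-2-clone kind=Optional symmetrical=false
Here kind=Optional makes the ordering advisory (it is only enforced when both actions occur in the same transition), and symmetrical=false stops pacemaker from applying the reverse order on stop.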
Best Regards,
Strahil Nikolov
On Fri, Apr 8, 2022 at 15:29, ChittaNagaraj, Raghav
<Raghav.ChittaNagaraj at dell.com> wrote:
Hello Team,
Hope you are doing well.
I have a 4 node pacemaker cluster where I created clone dummy resources test-1, test-2 and test-3 below:
$ sudo pcs resource create test-1 ocf:heartbeat:Dummy op monitor timeout="20" interval="10" clone
$ sudo pcs resource create test-2 ocf:heartbeat:Dummy op monitor timeout="20" interval="10" clone
$ sudo pcs resource create test-3 ocf:heartbeat:Dummy op monitor timeout="20" interval="10" clone
Then I ordered them so test-2-clone starts after test-1-clone and test-3-clone starts after test-2-clone:
$ sudo pcs constraint order test-1-clone then test-2-clone
Adding test-1-clone test-2-clone (kind: Mandatory) (Options: first-action=start then-action=start)
$ sudo pcs constraint order test-2-clone then test-3-clone
Adding test-2-clone test-3-clone (kind: Mandatory) (Options: first-action=start then-action=start)
Here are my clone sets (snippet of "pcs status" output pasted below):
* Clone Set: test-1-clone [test-1]:
* Started: [ node2_a node2_b node1_a node1_b ]
* Clone Set: test-2-clone [test-2]:
* Started: [ node2_a node2_b node1_a node1_b ]
* Clone Set: test-3-clone [test-3]:
* Started: [ node2_a node2_b node1_a node1_b ]
Then I restart test-1 on just node1_a:
$ sudo pcs resource restart test-1 node1_a
Warning: using test-1-clone... (if a resource is a clone, master/slave or bundle you must use the clone, master/slave or bundle name)
test-1-clone successfully restarted
This causes test-2 and test-3 clones to restart on all pacemaker nodes when my intention is for them to restart on just node1_a.
Below is the log tracing seen on the Designated Controller NODE1-B:
Apr 07 20:25:01 NODE1-B pacemaker-schedulerd[95746]: notice: * Stop test-1:1 ( node1_a ) due to node availability
Apr 07 20:25:03 NODE1-B pacemaker-schedulerd[95746]: notice: * Restart test-2:0 ( node1_b ) due to required test-1-clone running
Apr 07 20:25:03 NODE1-B pacemaker-schedulerd[95746]: notice: * Restart test-2:1 ( node1_a ) due to required test-1-clone running
Apr 07 20:25:03 NODE1-B pacemaker-schedulerd[95746]: notice: * Restart test-2:2 ( node2_b ) due to required test-1-clone running
Apr 07 20:25:03 NODE1-B pacemaker-schedulerd[95746]: notice: * Restart test-2:3 ( node2_a ) due to required test-1-clone running
Above is a representation of the observed behavior using dummy resources.
Is this the expected behavior of cloned resources?
My goal is to be able to restart test-2-clone and test-3-clone on just the node that experienced test-1 restart rather than all other nodes in the cluster.
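As an aside, the actions the scheduler would take for the current configuration can be previewed without touching any resources, e.g. with crm_simulate, which ships with pacemaker (a sketch; modelling the restart itself would additionally need crm_simulate's injection options):
$ sudo crm_simulate --simulate --live-check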
Please let us know if any additional information would help you provide feedback.
Thanks for your help!
- Raghav
_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users
ClusterLabs home: https://www.clusterlabs.org/