[ClusterLabs] Understanding advisory resource ordering
Achim Leitner
aleitner at linux-ag.com
Wed Jan 8 09:30:55 EST 2020
Hi,
some progress on this issue:
On 20.12.19 at 13:37, Achim Leitner wrote:
> After pacemaker restart, we have Transition 0 with the DRBD actions,
> followed 4 s later by Transition 1 including all VM actions with
> correct ordering. 32 s later, there is Transition 2 with fewer actions,
> but no new ones. Looking at the dot file generated by crm_simulate, the
> order should still be defined correctly. Any idea what could cause this
> effect? Which part of the logfiles should I look at? Unfortunately, I
> was not able to create a simple setup to reproduce this.
With some more experimentation, I finally succeeded in creating a dummy
setup that reproduces this very same effect:
- Based on the examples in the previous mail
- Only one cluster node, only using default dummy resources (a sketch of
the primitives follows below)
- A report is attached
- The tests were conducted on Ubuntu 16.04 in a KVM instance with 2 CPU
cores and 4 GByte RAM
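For reference, each step is just a stock Dummy primitive; a minimal
sketch of one of them (the agent name ocf:pacemaker:Dummy and the
monitor settings here are illustrative, the exact configuration is in
the attached report):

# One Dummy primitive per step, repeated for VM-Step1 ... VM-Step9c
primitive VM-Step0 ocf:pacemaker:Dummy \
    op monitor interval=10s timeout=20s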
The advisory start order is defined as follows:
order vm-startorder Optional: \
( VM-Step0 ) \
( VM-Step1 ) \
( VM-Step2a VM-Step2b ) \
( VM-Step3 ) \
( VM-Step4 ) \
( VM-Step5 ) \
( VM-Step6a VM-Step6b ) \
( VM-Step7 ) \
( VM-Step8a VM-Step8b VM-Step8c ) \
( VM-Step9a VM-Step9b VM-Step9c ) \
symmetrical=false
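As in the previous mail, I checked the scheduler's view by dumping the
transition graph with crm_simulate; for completeness, the commands I use
(a sketch; option names as in pacemaker 1.1, file names arbitrary):

# Dump the pending transition from the live CIB as a Graphviz dot file
crm_simulate --live-check --simulate --save-dotfile=transition.dot
# Render it; members of one parallel set should have no ordering edges
# between each other
dot -Tsvg transition.dot -o transition.svg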
According to this constraint, Steps 8a, 8b and 8c should start
concurrently, but they don't: Step 8c starts first, then 8a, then 8b.
All the other parallel sets do start concurrently, even 9a, 9b and 9c:
# grep "notice: Operation VM-Step.*_start_0" ha-log.txt
Jan 8 13:52:19 hanode1 crmd[18240]: notice: Operation VM-Step0_start_0: ok (node=hanode1, call=260, rc=0, cib-update=105, confirmed=true)
Jan 8 13:52:21 hanode1 crmd[18240]: notice: Operation VM-Step1_start_0: ok (node=hanode1, call=263, rc=0, cib-update=108, confirmed=true)
Jan 8 13:52:23 hanode1 crmd[18240]: notice: Operation VM-Step2a_start_0: ok (node=hanode1, call=265, rc=0, cib-update=109, confirmed=true)
Jan 8 13:52:23 hanode1 crmd[18240]: notice: Operation VM-Step2b_start_0: ok (node=hanode1, call=266, rc=0, cib-update=110, confirmed=true)
Jan 8 13:52:25 hanode1 crmd[18240]: notice: Operation VM-Step3_start_0: ok (node=hanode1, call=267, rc=0, cib-update=111, confirmed=true)
Jan 8 13:52:27 hanode1 crmd[18240]: notice: Operation VM-Step4_start_0: ok (node=hanode1, call=268, rc=0, cib-update=112, confirmed=true)
Jan 8 13:52:29 hanode1 crmd[18240]: notice: Operation VM-Step5_start_0: ok (node=hanode1, call=269, rc=0, cib-update=113, confirmed=true)
Jan 8 13:52:31 hanode1 crmd[18240]: notice: Operation VM-Step6b_start_0: ok (node=hanode1, call=271, rc=0, cib-update=114, confirmed=true)
Jan 8 13:52:31 hanode1 crmd[18240]: notice: Operation VM-Step6a_start_0: ok (node=hanode1, call=270, rc=0, cib-update=115, confirmed=true)
Jan 8 13:52:33 hanode1 crmd[18240]: notice: Operation VM-Step7_start_0: ok (node=hanode1, call=272, rc=0, cib-update=116, confirmed=true)
Jan 8 13:52:38 hanode1 crmd[18240]: notice: Operation VM-Step8c_start_0: ok (node=hanode1, call=275, rc=0, cib-update=117, confirmed=true)
Jan 8 13:52:43 hanode1 crmd[18240]: notice: Operation VM-Step8a_start_0: ok (node=hanode1, call=273, rc=0, cib-update=118, confirmed=true)
Jan 8 13:52:48 hanode1 crmd[18240]: notice: Operation VM-Step8b_start_0: ok (node=hanode1, call=274, rc=0, cib-update=119, confirmed=true)
Jan 8 13:52:53 hanode1 crmd[18240]: notice: Operation VM-Step9a_start_0: ok (node=hanode1, call=276, rc=0, cib-update=120, confirmed=true)
Jan 8 13:52:53 hanode1 crmd[18240]: notice: Operation VM-Step9b_start_0: ok (node=hanode1, call=277, rc=0, cib-update=121, confirmed=true)
Jan 8 13:52:53 hanode1 crmd[18240]: notice: Operation VM-Step9c_start_0: ok (node=hanode1, call=278, rc=0, cib-update=122, confirmed=true)
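For readability, the Step8/Step9 lines can be condensed to time and
operation (the awk field numbers are illustrative and depend on the
exact syslog layout):

grep "Operation VM-Step[89]" ha-log.txt | awk '{print $3, $8}'

Note that the three Step8 results arrive exactly 5 s apart (13:52:38,
13:52:43, 13:52:48), while the three Step9 results all arrive at
13:52:53.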
Still, it seems that lrmd tried to start 8a, 8b and 8c at once:
# grep "executing - rsc:VM-Step" pacemaker.log
Jan 08 13:52:17 [18237] hanode1 lrmd: info: log_execute: executing - rsc:VM-Step0 action:start call_id:260
Jan 08 13:52:19 [18237] hanode1 lrmd: info: log_execute: executing - rsc:VM-Step1 action:start call_id:263
Jan 08 13:52:21 [18237] hanode1 lrmd: info: log_execute: executing - rsc:VM-Step2a action:start call_id:265
Jan 08 13:52:21 [18237] hanode1 lrmd: info: log_execute: executing - rsc:VM-Step2b action:start call_id:266
Jan 08 13:52:23 [18237] hanode1 lrmd: info: log_execute: executing - rsc:VM-Step3 action:start call_id:267
Jan 08 13:52:25 [18237] hanode1 lrmd: info: log_execute: executing - rsc:VM-Step4 action:start call_id:268
Jan 08 13:52:27 [18237] hanode1 lrmd: info: log_execute: executing - rsc:VM-Step5 action:start call_id:269
Jan 08 13:52:29 [18237] hanode1 lrmd: info: log_execute: executing - rsc:VM-Step6a action:start call_id:270
Jan 08 13:52:29 [18237] hanode1 lrmd: info: log_execute: executing - rsc:VM-Step6b action:start call_id:271
Jan 08 13:52:31 [18237] hanode1 lrmd: info: log_execute: executing - rsc:VM-Step7 action:start call_id:272
Jan 08 13:52:33 [18237] hanode1 lrmd: info: log_execute: executing - rsc:VM-Step8a action:start call_id:273
Jan 08 13:52:33 [18237] hanode1 lrmd: info: log_execute: executing - rsc:VM-Step8b action:start call_id:274
Jan 08 13:52:33 [18237] hanode1 lrmd: info: log_execute: executing - rsc:VM-Step8c action:start call_id:275
Jan 08 13:52:48 [18237] hanode1 lrmd: info: log_execute: executing - rsc:VM-Step9a action:start call_id:276
Jan 08 13:52:48 [18237] hanode1 lrmd: info: log_execute: executing - rsc:VM-Step9b action:start call_id:277
Jan 08 13:52:48 [18237] hanode1 lrmd: info: log_execute: executing - rsc:VM-Step9c action:start call_id:278
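So lrmd received all three Step8 start requests at 13:52:33, yet the
results came back serially at 5 s intervals, while the Step9 starts,
dispatched together at 13:52:48, all completed at 13:52:53. To narrow
down where the time is spent, I intend to check two things; both are
guesses on my part, not a confirmed diagnosis:

# lrmd's completion lines split the duration into exec-time and
# queue-time (message wording as in pacemaker 1.1; may differ elsewhere)
grep "finished - rsc:VM-Step8" pacemaker.log

# Was the controller's load-based job throttling active? On small
# machines it can serialize actions; node-action-limit defaults to
# twice the number of CPU cores
grep -i throttle pacemaker.log
crm_attribute --type crm_config --name node-action-limit --query
crm_attribute --type crm_config --name load-threshold --query

If the queue-time of 8a and 8b turns out to be non-zero, the delay
happens inside lrmd rather than in the scheduler.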
A second test run on Ubuntu 19.04 (pacemaker 1.1.18-2ubuntu1.19.04.1,
KVM guest with 4 CPU cores, 4 GByte RAM) shows the same behaviour.
I can't see what is causing this behaviour. Is it a bug? A configuration
issue? Is there any more detail I could provide? Any help would be
highly appreciated.
Achim Leitner
-------------- next part --------------
A non-text attachment was scrubbed...
Name: report.tar.bz2
Type: application/x-bzip
Size: 52225 bytes
Desc: not available
URL: <https://lists.clusterlabs.org/pipermail/users/attachments/20200108/9f1b92ed/attachment-0001.bin>