[ClusterLabs] Understanding advisory resource ordering
Achim Leitner
aleitner at linux-ag.com
Wed Jan 8 09:30:55 EST 2020
Hi,
some progress on this issue:
On 20.12.19 at 13:37, Achim Leitner wrote:
> After pacemaker restart, we have Transition 0 with the DRBD actions,
> followed 4 s later by Transition 1 including all VM actions with
> correct ordering. 32 s later, there is Transition 2 with fewer actions,
> but no new ones. Looking at the dot file generated by crm_simulate, the
> order should still be defined correctly. Any idea what could cause this
> effect? Which part of the logfiles should I look at? Unfortunately, I
> was not able to create a simple setup to reproduce this.
With some more experimentation, I finally succeeded in creating a dummy
setup that reproduces this very same effect:
- Based on the examples in the previous mail
- Only one cluster node, only using default dummy resources (a sketch of
the primitives follows below)
- A report is attached
- The tests were conducted on Ubuntu 16.04 in a KVM instance with 2 CPU
cores and 4 GByte RAM
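For reference, each step is just a stock Dummy primitive; a minimal
sketch of one of them (the agent name ocf:pacemaker:Dummy and the
monitor settings here are illustrative, the exact configuration is in
the attached report):

# One Dummy primitive per step, repeated for VM-Step1 ... VM-Step9c
primitive VM-Step0 ocf:pacemaker:Dummy \
    op monitor interval=10s timeout=20s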
The advisory start order is defined as follows:
order vm-startorder Optional: \
( VM-Step0 ) \
( VM-Step1 ) \
( VM-Step2a VM-Step2b ) \
( VM-Step3 ) \
( VM-Step4 ) \
( VM-Step5 ) \
( VM-Step6a VM-Step6b ) \
( VM-Step7 ) \
( VM-Step8a VM-Step8b VM-Step8c ) \
( VM-Step9a VM-Step9b VM-Step9c ) \
symmetrical=false
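As in the previous mail, I checked the scheduler's view by dumping the
transition graph with crm_simulate; for completeness, the commands I use
(a sketch; option names as in pacemaker 1.1, file names arbitrary):

# Dump the pending transition from the live CIB as a Graphviz dot file
crm_simulate --live-check --simulate --save-dotfile=transition.dot
# Render it; members of one parallel set should have no ordering edges
# between each other
dot -Tsvg transition.dot -o transition.svg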
According to this constraint, Steps 8a, 8b and 8c should start
concurrently, but they don't: Step 8c starts first, then 8a, then 8b.
All the other parallel sets do start concurrently, even 9a, 9b and 9c:
# grep "notice: Operation VM-Step.*_start_0" ha-log.txt
Jan 8 13:52:19 hanode1 crmd[18240]: notice: Operation VM-Step0_start_0: ok (node=hanode1, call=260, rc=0, cib-update=105, confirmed=true)
Jan 8 13:52:21 hanode1 crmd[18240]: notice: Operation VM-Step1_start_0: ok (node=hanode1, call=263, rc=0, cib-update=108, confirmed=true)
Jan 8 13:52:23 hanode1 crmd[18240]: notice: Operation VM-Step2a_start_0: ok (node=hanode1, call=265, rc=0, cib-update=109, confirmed=true)
Jan 8 13:52:23 hanode1 crmd[18240]: notice: Operation VM-Step2b_start_0: ok (node=hanode1, call=266, rc=0, cib-update=110, confirmed=true)
Jan 8 13:52:25 hanode1 crmd[18240]: notice: Operation VM-Step3_start_0: ok (node=hanode1, call=267, rc=0, cib-update=111, confirmed=true)
Jan 8 13:52:27 hanode1 crmd[18240]: notice: Operation VM-Step4_start_0: ok (node=hanode1, call=268, rc=0, cib-update=112, confirmed=true)
Jan 8 13:52:29 hanode1 crmd[18240]: notice: Operation VM-Step5_start_0: ok (node=hanode1, call=269, rc=0, cib-update=113, confirmed=true)
Jan 8 13:52:31 hanode1 crmd[18240]: notice: Operation VM-Step6b_start_0: ok (node=hanode1, call=271, rc=0, cib-update=114, confirmed=true)
Jan 8 13:52:31 hanode1 crmd[18240]: notice: Operation VM-Step6a_start_0: ok (node=hanode1, call=270, rc=0, cib-update=115, confirmed=true)
Jan 8 13:52:33 hanode1 crmd[18240]: notice: Operation VM-Step7_start_0: ok (node=hanode1, call=272, rc=0, cib-update=116, confirmed=true)
Jan 8 13:52:38 hanode1 crmd[18240]: notice: Operation VM-Step8c_start_0: ok (node=hanode1, call=275, rc=0, cib-update=117, confirmed=true)
Jan 8 13:52:43 hanode1 crmd[18240]: notice: Operation VM-Step8a_start_0: ok (node=hanode1, call=273, rc=0, cib-update=118, confirmed=true)
Jan 8 13:52:48 hanode1 crmd[18240]: notice: Operation VM-Step8b_start_0: ok (node=hanode1, call=274, rc=0, cib-update=119, confirmed=true)
Jan 8 13:52:53 hanode1 crmd[18240]: notice: Operation VM-Step9a_start_0: ok (node=hanode1, call=276, rc=0, cib-update=120, confirmed=true)
Jan 8 13:52:53 hanode1 crmd[18240]: notice: Operation VM-Step9b_start_0: ok (node=hanode1, call=277, rc=0, cib-update=121, confirmed=true)
Jan 8 13:52:53 hanode1 crmd[18240]: notice: Operation VM-Step9c_start_0: ok (node=hanode1, call=278, rc=0, cib-update=122, confirmed=true)
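For readability, the Step8/Step9 lines can be condensed to time and
operation (the awk field numbers are illustrative and depend on the
exact syslog layout):

grep "Operation VM-Step[89]" ha-log.txt | awk '{print $3, $8}'

Note that the three Step8 results arrive exactly 5 s apart (13:52:38,
13:52:43, 13:52:48), while the three Step9 results all arrive at
13:52:53.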
Still, it seems that lrmd tried to start 8a, 8b and 8c at once:
# grep "executing - rsc:VM-Step" pacemaker.log
Jan 08 13:52:17 [18237] hanode1 lrmd: info: log_execute: executing - rsc:VM-Step0 action:start call_id:260
Jan 08 13:52:19 [18237] hanode1 lrmd: info: log_execute: executing - rsc:VM-Step1 action:start call_id:263
Jan 08 13:52:21 [18237] hanode1 lrmd: info: log_execute: executing - rsc:VM-Step2a action:start call_id:265
Jan 08 13:52:21 [18237] hanode1 lrmd: info: log_execute: executing - rsc:VM-Step2b action:start call_id:266
Jan 08 13:52:23 [18237] hanode1 lrmd: info: log_execute: executing - rsc:VM-Step3 action:start call_id:267
Jan 08 13:52:25 [18237] hanode1 lrmd: info: log_execute: executing - rsc:VM-Step4 action:start call_id:268
Jan 08 13:52:27 [18237] hanode1 lrmd: info: log_execute: executing - rsc:VM-Step5 action:start call_id:269
Jan 08 13:52:29 [18237] hanode1 lrmd: info: log_execute: executing - rsc:VM-Step6a action:start call_id:270
Jan 08 13:52:29 [18237] hanode1 lrmd: info: log_execute: executing - rsc:VM-Step6b action:start call_id:271
Jan 08 13:52:31 [18237] hanode1 lrmd: info: log_execute: executing - rsc:VM-Step7 action:start call_id:272
Jan 08 13:52:33 [18237] hanode1 lrmd: info: log_execute: executing - rsc:VM-Step8a action:start call_id:273
Jan 08 13:52:33 [18237] hanode1 lrmd: info: log_execute: executing - rsc:VM-Step8b action:start call_id:274
Jan 08 13:52:33 [18237] hanode1 lrmd: info: log_execute: executing - rsc:VM-Step8c action:start call_id:275
Jan 08 13:52:48 [18237] hanode1 lrmd: info: log_execute: executing - rsc:VM-Step9a action:start call_id:276
Jan 08 13:52:48 [18237] hanode1 lrmd: info: log_execute: executing - rsc:VM-Step9b action:start call_id:277
Jan 08 13:52:48 [18237] hanode1 lrmd: info: log_execute: executing - rsc:VM-Step9c action:start call_id:278
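So lrmd received all three Step8 start requests at 13:52:33, yet the
results came back serially at 5 s intervals, while the Step9 starts,
dispatched together at 13:52:48, all completed at 13:52:53. To narrow
down where the time is spent, I intend to check two things; both are
guesses on my part, not a confirmed diagnosis:

# lrmd's completion lines split the duration into exec-time and
# queue-time (message wording as in pacemaker 1.1; may differ elsewhere)
grep "finished - rsc:VM-Step8" pacemaker.log

# Was the controller's load-based job throttling active? On small
# machines it can serialize actions; node-action-limit defaults to
# twice the number of CPU cores
grep -i throttle pacemaker.log
crm_attribute --type crm_config --name node-action-limit --query
crm_attribute --type crm_config --name load-threshold --query

If the queue-time of 8a and 8b turns out to be non-zero, the delay
happens inside lrmd rather than in the scheduler.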
A second test run on Ubuntu 19.04 (pacemaker 1.1.18-2ubuntu1.19.04.1,
KVM guest with 4 CPU cores, 4 GByte RAM) shows the same behaviour.
I can't see what is causing this behaviour. Is it a bug? A configuration
issue? Is there any more detail I could provide? Any help would be
highly appreciated.
Achim Leitner
-------------- next part --------------
A non-text attachment was scrubbed...
Name: report.tar.bz2
Type: application/x-bzip
Size: 52225 bytes
Desc: not available
URL: <https://lists.clusterlabs.org/pipermail/users/attachments/20200108/9f1b92ed/attachment-0001.bin>