[ClusterLabs] Understanding advisory resource ordering

Sat Jan 11 11:38:34 EST 2020

08.01.2020 17:30, Achim Leitner пишет:
> Hi,
> 
> some progress on this issue:
> 
> Am 20.12.19 um 13:37 schrieb Achim Leitner:
>> After pacemaker restart, we have Transition 0 with the DRBD actions,
>> followed 4s later with Transition 1 including all VM actions with
>> correct ordering. 32s later, there is Transition 2 with fewer actions,
>> but no new ones. Looking at the dot file generated by crm_simulate, the
>> order should still be defined correctly. Any idea what could cause this
>> effect? Which part of the logfiles should I look at? Unfortunately, I
>> was not able to create a simple setup to reproduce this.
> 
> With some more experimenting, I finally succeeded in creating a dummy
> setup that shows this very same effect.
> 
> - Based on the examples in the previous mail
> - Only one cluster node, only using default dummy resources
> - Attached a report
> - The tests were conducted on Ubuntu 16.04 in a KVM instance with 2 CPU
>   cores, 4 GByte RAM
> 
> According to the advisory start order
> 
> order vm-startorder Optional: \
>     ( VM-Step0 ) \
>     ( VM-Step1 ) \
>     ( VM-Step2a VM-Step2b ) \
>     ( VM-Step3 ) \
>     ( VM-Step4 ) \
>     ( VM-Step5 ) \
>     ( VM-Step6a VM-Step6b ) \
>     ( VM-Step7 ) \
>     ( VM-Step8a VM-Step8b VM-Step8c ) \
>     ( VM-Step9a VM-Step9b VM-Step9c ) \
>     symmetrical=false
> 
> the Steps 8a, 8b and 8c should start concurrently, but they don't:
> Step 8c is first, then 8a, then 8b. All other parallel tasks do start
> concurrently, even 9a, 9b and 9c:
> 
> # grep "notice: Operation VM-Step.*_start_0" ha-log.txt
> Jan  8 13:52:19 hanode1 crmd[18240]:   notice: Operation
> VM-Step0_start_0: ok (node=hanode1, call=260, rc=0, cib-update=105,
> confirmed=true)
> Jan  8 13:52:21 hanode1 crmd[18240]:   notice: Operation
> VM-Step1_start_0: ok (node=hanode1, call=263, rc=0, cib-update=108,
> confirmed=true)
> Jan  8 13:52:23 hanode1 crmd[18240]:   notice: Operation
> VM-Step2a_start_0: ok (node=hanode1, call=265, rc=0, cib-update=109,
> confirmed=true)
> Jan  8 13:52:23 hanode1 crmd[18240]:   notice: Operation
> VM-Step2b_start_0: ok (node=hanode1, call=266, rc=0, cib-update=110,
> confirmed=true)
> Jan  8 13:52:25 hanode1 crmd[18240]:   notice: Operation
> VM-Step3_start_0: ok (node=hanode1, call=267, rc=0, cib-update=111,
> confirmed=true)
> Jan  8 13:52:27 hanode1 crmd[18240]:   notice: Operation
> VM-Step4_start_0: ok (node=hanode1, call=268, rc=0, cib-update=112,
> confirmed=true)
> Jan  8 13:52:29 hanode1 crmd[18240]:   notice: Operation
> VM-Step5_start_0: ok (node=hanode1, call=269, rc=0, cib-update=113,
> confirmed=true)
> Jan  8 13:52:31 hanode1 crmd[18240]:   notice: Operation
> VM-Step6b_start_0: ok (node=hanode1, call=271, rc=0, cib-update=114,
> confirmed=true)
> Jan  8 13:52:31 hanode1 crmd[18240]:   notice: Operation
> VM-Step6a_start_0: ok (node=hanode1, call=270, rc=0, cib-update=115,
> confirmed=true)
> Jan  8 13:52:33 hanode1 crmd[18240]:   notice: Operation
> VM-Step7_start_0: ok (node=hanode1, call=272, rc=0, cib-update=116,
> confirmed=true)
> Jan  8 13:52:38 hanode1 crmd[18240]:   notice: Operation
> VM-Step8c_start_0: ok (node=hanode1, call=275, rc=0, cib-update=117,
> confirmed=true)
> Jan  8 13:52:43 hanode1 crmd[18240]:   notice: Operation
> VM-Step8a_start_0: ok (node=hanode1, call=273, rc=0, cib-update=118,
> confirmed=true)
> Jan  8 13:52:48 hanode1 crmd[18240]:   notice: Operation
> VM-Step8b_start_0: ok (node=hanode1, call=274, rc=0, cib-update=119,
> confirmed=true)
> Jan  8 13:52:53 hanode1 crmd[18240]:   notice: Operation
> VM-Step9a_start_0: ok (node=hanode1, call=276, rc=0, cib-update=120,
> confirmed=true)
> Jan  8 13:52:53 hanode1 crmd[18240]:   notice: Operation
> VM-Step9b_start_0: ok (node=hanode1, call=277, rc=0, cib-update=121,
> confirmed=true)
> Jan  8 13:52:53 hanode1 crmd[18240]:   notice: Operation
> VM-Step9c_start_0: ok (node=hanode1, call=278, rc=0, cib-update=122,
> confirmed=true)
> 

This log does not show when start was initiated, it shows when start was
(reported as) completed.

> Still, it seems that lrmd tried to start 8a, 8b and 8c at once:
> 
> # grep "executing - rsc:VM-Step" pacemaker.log
> Jan 08 13:52:17 [18237] hanode1       lrmd:     info: log_execute:
> executing - rsc:VM-Step0 action:start call_id:260
> Jan 08 13:52:19 [18237] hanode1       lrmd:     info: log_execute:
> executing - rsc:VM-Step1 action:start call_id:263
> Jan 08 13:52:21 [18237] hanode1       lrmd:     info: log_execute:
> executing - rsc:VM-Step2a action:start call_id:265
> Jan 08 13:52:21 [18237] hanode1       lrmd:     info: log_execute:
> executing - rsc:VM-Step2b action:start call_id:266
> Jan 08 13:52:23 [18237] hanode1       lrmd:     info: log_execute:
> executing - rsc:VM-Step3 action:start call_id:267
> Jan 08 13:52:25 [18237] hanode1       lrmd:     info: log_execute:
> executing - rsc:VM-Step4 action:start call_id:268
> Jan 08 13:52:27 [18237] hanode1       lrmd:     info: log_execute:
> executing - rsc:VM-Step5 action:start call_id:269
> Jan 08 13:52:29 [18237] hanode1       lrmd:     info: log_execute:
> executing - rsc:VM-Step6a action:start call_id:270
> Jan 08 13:52:29 [18237] hanode1       lrmd:     info: log_execute:
> executing - rsc:VM-Step6b action:start call_id:271
> Jan 08 13:52:31 [18237] hanode1       lrmd:     info: log_execute:
> executing - rsc:VM-Step7 action:start call_id:272
> Jan 08 13:52:33 [18237] hanode1       lrmd:     info: log_execute:
> executing - rsc:VM-Step8a action:start call_id:273
> Jan 08 13:52:33 [18237] hanode1       lrmd:     info: log_execute:
> executing - rsc:VM-Step8b action:start call_id:274
> Jan 08 13:52:33 [18237] hanode1       lrmd:     info: log_execute:
> executing - rsc:VM-Step8c action:start call_id:275
> Jan 08 13:52:48 [18237] hanode1       lrmd:     info: log_execute:
> executing - rsc:VM-Step9a action:start call_id:276
> Jan 08 13:52:48 [18237] hanode1       lrmd:     info: log_execute:
> executing - rsc:VM-Step9b action:start call_id:277
> Jan 08 13:52:48 [18237] hanode1       lrmd:     info: log_execute:
> executing - rsc:VM-Step9c action:start call_id:278
> 
> A second test run on Ubuntu 19.04 (pacemaker 1.1.18-2ubuntu1.19.04.1,
> KVM guest with 4 CPU cores, 4 GByte RAM) shows the same behaviour.
> 
> I can't see what is causing this behaviour.

primitive VM-Step8a Delay \
	params startdelay=10 stopdelay=2 mondelay=0 \
	op start interval=0 timeout=40s \
	op stop interval=0 timeout=30s \
	meta target-role=Started
primitive VM-Step8b Delay \
	params startdelay=15 stopdelay=2 mondelay=0 \
	op start interval=0 timeout=40s \
	op stop interval=0 timeout=30s \
	meta target-role=Started
primitive VM-Step8c Delay \
	params startdelay=5 stopdelay=2 mondelay=0 \
	op start interval=0 timeout=40s \
	op stop interval=0 timeout=30s \
	meta target-role=Started

You know what "startdelay" parameter does, do not you?

> Is it a bug? Configuration
> issue? Is there any more detail I could provide? Any help here would
> be highly appreciated.
> 
> Achim Leitner
> 
> _______________________________________________
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
> 
> ClusterLabs home: https://www.clusterlabs.org/
>