[Pacemaker] [Question] About the stop order at the time of the Probe error.

Andrew Beekhof andrew at beekhof.net
Tue Sep 4 18:55:02 EDT 2012


On Wed, Aug 22, 2012 at 4:44 PM,  <renayama19661014 at ybb.ne.jp> wrote:
> Hi All,
>
> We found a problem that occurs when a probe returns an error.
>
> The resource configuration is simple, as follows.
>
> ============
> Last updated: Wed Aug 22 15:19:50 2012
> Stack: Heartbeat
> Current DC: drbd1 (6081ac99-d941-40b9-a4a3-9f996ff291c0) - partition with quorum
> Version: 1.0.12-c6770b8
> 1 Nodes configured, unknown expected votes
> 1 Resources configured.
> ============
>
> Online: [ drbd1 ]
>
>  Resource Group: grpTest
>      resource1  (ocf::pacemaker:Dummy): Started drbd1
>      resource2  (ocf::pacemaker:Dummy): Started drbd1
>      resource3  (ocf::pacemaker:Dummy): Started drbd1
>      resource4  (ocf::pacemaker:Dummy): Started drbd1
>
> Node Attributes:
> * Node drbd1:
>
> Migration summary:
> * Node drbd1:
>
>
> Depending on which resource the probe error occurs on, the resources are not stopped in reverse order.
>
> I confirmed it with the following procedure.
>
> Step 1) Make resource2 and resource4 appear to be already started.
>
> [root@drbd1 ~]# touch /var/run/Dummy-resource2.state
> [root@drbd1 ~]# touch /var/run/Dummy-resource4.state
>
> Step 2) Start the node and load the CIB.
>
> Step 3) resource2 and resource4 are stopped, but not in reverse order.
>
> (snip)
> Aug 22 15:19:47 drbd1 pengine: [32722]: notice: group_print:  Resource Group: grpTest
> Aug 22 15:19:47 drbd1 pengine: [32722]: notice: native_print:      resource1#011(ocf::pacemaker:Dummy):#011Stopped
> Aug 22 15:19:47 drbd1 pengine: [32722]: notice: native_print:      resource2#011(ocf::pacemaker:Dummy):#011Started drbd1
> Aug 22 15:19:47 drbd1 pengine: [32722]: notice: native_print:      resource3#011(ocf::pacemaker:Dummy):#011Stopped
> Aug 22 15:19:47 drbd1 pengine: [32722]: notice: native_print:      resource4#011(ocf::pacemaker:Dummy):#011Started drbd1
> (snip)
> Aug 22 15:19:47 drbd1 crmd: [32719]: info: te_rsc_command: Initiating action 6: stop resource2_stop_0 on drbd1 (local)
> Aug 22 15:19:47 drbd1 crmd: [32719]: info: do_lrm_rsc_op: Performing key=6:2:0:5c924067-0d20-48fd-9772-88e530661270 op=resource2_stop_0 )
> Aug 22 15:19:47 drbd1 lrmd: [32716]: info: rsc:resource2 stop[6] (pid 32745)
> Aug 22 15:19:47 drbd1 crmd: [32719]: info: te_rsc_command: Initiating action 11: stop resource4_stop_0 on drbd1 (local)
> Aug 22 15:19:47 drbd1 crmd: [32719]: info: do_lrm_rsc_op: Performing key=11:2:0:5c924067-0d20-48fd-9772-88e530661270 op=resource4_stop_0 )
> Aug 22 15:19:47 drbd1 lrmd: [32716]: info: rsc:resource4 stop[7] (pid 32746)
> Aug 22 15:19:47 drbd1 lrmd: [32716]: info: operation stop[6] on resource2 for client 32719: pid 32745 exited with return code 0
> (snip)

Hmmm. That's not good.
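
For anyone who wants to check their own logs for the same symptom, here is a rough Python sketch. It only looks at the order in which crmd initiated the stop actions, which is a proxy: a strictly ordered stop would also wait for one stop to finish before starting the next. The function names are made up for this example, and the regular expression assumes the 1.0.x log format shown in the excerpt above.

```python
import re

# Matches crmd lines such as:
#   "... te_rsc_command: Initiating action 6: stop resource2_stop_0 on drbd1 (local)"
# (format taken from the quoted log; other versions may log differently)
STOP_RE = re.compile(r"te_rsc_command: Initiating action \d+: stop (\S+)_stop_0 on")


def initiated_stops(log_lines):
    """Return resource names in the order their stop actions were initiated."""
    stops = []
    for line in log_lines:
        match = STOP_RE.search(line)
        if match:
            stops.append(match.group(1))
    return stops


def is_reverse_group_order(stops, group_members):
    """True if the stops were initiated in the reverse of the group's order."""
    position = {name: i for i, name in enumerate(group_members)}
    indices = [position[name] for name in stops if name in position]
    # Every stop must belong to a member that comes earlier in the group
    # than the member stopped just before it.
    return all(a > b for a, b in zip(indices, indices[1:]))


GROUP = ["resource1", "resource2", "resource3", "resource4"]
LOG = [
    "Aug 22 15:19:47 drbd1 crmd: [32719]: info: te_rsc_command: "
    "Initiating action 6: stop resource2_stop_0 on drbd1 (local)",
    "Aug 22 15:19:47 drbd1 crmd: [32719]: info: te_rsc_command: "
    "Initiating action 11: stop resource4_stop_0 on drbd1 (local)",
]

stops = initiated_stops(LOG)
print(stops)                                 # ['resource2', 'resource4']
print(is_reverse_group_order(stops, GROUP))  # False
```

With the two crmd lines from the report, resource2 is stopped ahead of resource4, so the check fails, which matches what was observed.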

>
> I understand that this stop order results from how ordering within a group works.
>
> In this case, our user needs the resources to be stopped strictly in reverse order.
>
>  * resource4_stop -> resource2_stop
>
> The stop order is important for our user's resources.
>
>
> I have the following questions.
>
> Question 1) Is there a setting in cib.xml that avoids this problem?

No.

>
> Question 2) Does this problem also occur in Pacemaker 1.1?

Yes.  I'll see what I can do.

>
> Question 3) I added the following order constraints.
>
>
>         <rsc_order id="order-2" first="resource1" then="resource3" />
>         <rsc_order id="order-3" first="resource1" then="resource4" />
>         <rsc_order id="order-5" first="resource2" then="resource4" />
>
>             Adding these constraints seems to solve the problem.
>             Is adding order constraints like this a valid workaround?

Really the PE (policy engine) should handle this implicitly, without the
need for additional constraints.
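
For reference, the three constraints quoted above are exactly the non-adjacent pairs of the group's members. Below is a small Python sketch that generates such lines for a group of any size. It is only an illustration of the workaround, not something Pacemaker provides: the function name is made up, the id numbering (counting all pairs in order and skipping the adjacent ones) is inferred from the ids order-2, order-3 and order-5 in the quoted snippet, and it assumes the group itself already orders consecutive members.

```python
from itertools import combinations


def extra_order_constraints(members):
    """Build rsc_order lines for every non-adjacent pair of group members."""
    lines = []
    pairs = combinations(range(len(members)), 2)
    for n, (i, j) in enumerate(pairs, start=1):
        if j - i == 1:
            # Assumption: consecutive members are already ordered by the group.
            continue
        lines.append(
            f'<rsc_order id="order-{n}" '
            f'first="{members[i]}" then="{members[j]}" />'
        )
    return lines


for line in extra_order_constraints(
    ["resource1", "resource2", "resource3", "resource4"]
):
    print(line)
# <rsc_order id="order-2" first="resource1" then="resource3" />
# <rsc_order id="order-3" first="resource1" then="resource4" />
# <rsc_order id="order-5" first="resource2" then="resource4" />
```

The number of extra constraints grows quadratically with the group size, which is one more reason this is better handled inside the PE.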



