[Pacemaker] [Question] About the stop order at the time of the Probe error.

Wed Aug 22 02:44:43 EDT 2012

Hi All,

We found a problem that occurs at the time of a Probe error.

The resource configuration is simple, as follows.

============
Last updated: Wed Aug 22 15:19:50 2012
Stack: Heartbeat
Current DC: drbd1 (6081ac99-d941-40b9-a4a3-9f996ff291c0) - partition with quorum
Version: 1.0.12-c6770b8
1 Nodes configured, unknown expected votes
1 Resources configured.
============

Online: [ drbd1 ]

 Resource Group: grpTest
     resource1  (ocf::pacemaker:Dummy): Started drbd1
     resource2  (ocf::pacemaker:Dummy): Started drbd1
     resource3  (ocf::pacemaker:Dummy): Started drbd1
     resource4  (ocf::pacemaker:Dummy): Started drbd1

Node Attributes:
* Node drbd1:

Migration summary:
* Node drbd1: 

Depending on which resources the Probe error occurs for, the resources are not stopped in the inverse order.

I confirmed this with the following procedure.

Step 1) Make resource2 and resource4 appear started by creating their state files.

[root@drbd1 ~]# touch /var/run/Dummy-resource2.state
[root@drbd1 ~]# touch /var/run/Dummy-resource4.state

Step 2) Start the node and send the cib.

Step 3) resource2 and resource4 are stopped, but not in the inverse order.

(snip)
Aug 22 15:19:47 drbd1 pengine: [32722]: notice: group_print:  Resource Group: grpTest
Aug 22 15:19:47 drbd1 pengine: [32722]: notice: native_print:      resource1#011(ocf::pacemaker:Dummy):#011Stopped 
Aug 22 15:19:47 drbd1 pengine: [32722]: notice: native_print:      resource2#011(ocf::pacemaker:Dummy):#011Started drbd1
Aug 22 15:19:47 drbd1 pengine: [32722]: notice: native_print:      resource3#011(ocf::pacemaker:Dummy):#011Stopped 
Aug 22 15:19:47 drbd1 pengine: [32722]: notice: native_print:      resource4#011(ocf::pacemaker:Dummy):#011Started drbd1
(snip)
Aug 22 15:19:47 drbd1 crmd: [32719]: info: te_rsc_command: Initiating action 6: stop resource2_stop_0 on drbd1 (local)
Aug 22 15:19:47 drbd1 crmd: [32719]: info: do_lrm_rsc_op: Performing key=6:2:0:5c924067-0d20-48fd-9772-88e530661270 op=resource2_stop_0 )
Aug 22 15:19:47 drbd1 lrmd: [32716]: info: rsc:resource2 stop[6] (pid 32745)
Aug 22 15:19:47 drbd1 crmd: [32719]: info: te_rsc_command: Initiating action 11: stop resource4_stop_0 on drbd1 (local)
Aug 22 15:19:47 drbd1 crmd: [32719]: info: do_lrm_rsc_op: Performing key=11:2:0:5c924067-0d20-48fd-9772-88e530661270 op=resource4_stop_0 )
Aug 22 15:19:47 drbd1 lrmd: [32716]: info: rsc:resource4 stop[7] (pid 32746)
Aug 22 15:19:47 drbd1 lrmd: [32716]: info: operation stop[6] on resource2 for client 32719: pid 32745 exited with return code 0
(snip)
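
As a small aid for reading the log above, the following Python sketch extracts the order in which crmd initiated the stop operations. The excerpt is copied from the two te_rsc_command lines above; the names LOG, STOP_RE and initiated_stops are only illustrative and are not part of Pacemaker.

```python
import re

# Log excerpt copied from the crmd te_rsc_command lines quoted above.
LOG = """\
Aug 22 15:19:47 drbd1 crmd: [32719]: info: te_rsc_command: Initiating action 6: stop resource2_stop_0 on drbd1 (local)
Aug 22 15:19:47 drbd1 crmd: [32719]: info: te_rsc_command: Initiating action 11: stop resource4_stop_0 on drbd1 (local)
"""

# Matches the resource name and the node of each initiated stop action.
STOP_RE = re.compile(
    r"te_rsc_command: Initiating action \d+: stop (\S+)_stop_0 on (\S+)"
)

def initiated_stops(log_text):
    """Return (resource, node) pairs in the order crmd initiated the stops."""
    matches = (STOP_RE.search(line) for line in log_text.splitlines())
    return [m.groups() for m in matches if m]

stops = initiated_stops(LOG)
print(stops)
```

For this excerpt the list starts with resource2 and is followed by resource4, so the stop of resource2 is initiated first.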

I understand that this stop order is caused by the way ordering works within a group.

In this case, our user needs the resources to be stopped strictly in the inverse order.

 * resource4_stop -> resource2_stop

The stop order is important for our user's resources.

I have the following questions.

Question 1) Is there a correct setting in cib.xml to avoid this problem?

Question 2) Does this problem also occur in Pacemaker 1.1?

Question 3) I added the following order constraints.

        <rsc_order id="order-2" first="resource1" then="resource3" />
        <rsc_order id="order-3" first="resource1" then="resource4" />
        <rsc_order id="order-5" first="resource2" then="resource4" />

            Adding these order constraints seems to solve the problem.
            Is adding them a correct way to solve it?
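
To illustrate my understanding, here is a toy model in Python. It is only a sketch and is not Pacemaker's policy engine code. It assumes that a group orders only adjacent members, that each "first -> then" order also implies the inverse order for stop, and that a constraint has an effect only when both resources have a stop action in the transition. The function names are hypothetical.

```python
# Toy model only: this is NOT how Pacemaker's policy engine is implemented.
# Assumptions: a group orders only adjacent members, and every
# "first -> then" order implies "stop then before stop first".

def stop_constraints(group, extra_orders=()):
    """Return (earlier, later) pairs meaning: stop `earlier` before `later`."""
    pairs = list(zip(group, group[1:])) + list(extra_orders)
    # The start order first -> then becomes the stop order then -> first.
    return {(then, first) for first, then in pairs}

def must_stop_before(a, b, constraints, stopping):
    """True if stop(a) is forced before stop(b), following only constraints
    whose two resources both have a stop action in this transition."""
    active = {(x, y) for x, y in constraints if x in stopping and y in stopping}
    seen, todo = set(), [a]
    while todo:
        node = todo.pop()
        for x, y in active:
            if x == node and y not in seen:
                seen.add(y)
                todo.append(y)
    return b in seen

group = ["resource1", "resource2", "resource3", "resource4"]
# resource1 and resource3 are already stopped, so they have no stop action.
stopping = {"resource2", "resource4"}

group_only = stop_constraints(group)
with_extra = stop_constraints(group, [
    ("resource1", "resource3"),  # order-2
    ("resource1", "resource4"),  # order-3
    ("resource2", "resource4"),  # order-5
])

print(must_stop_before("resource4", "resource2", group_only, stopping))
print(must_stop_before("resource4", "resource2", with_extra, stopping))
```

In this model, the group alone does not force resource4 to stop before resource2, because the chain between them passes through resource3, which has no stop action. With order-5 added, the two stops are ordered directly.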

Best Regards,
Hideo Yamauchi.