[Pacemaker] Can't failover Master/Slave with group(primitive x3) setting

Tue Sep 27 00:31:24 EDT 2011

Hi,

> Which version did you check?

Pacemaker 1.0.11.

> The latest from git seems to work fine:
>
> Current cluster status:
> Online: [ bl460g1n13 bl460g1n14 ]
>
>  Resource Group: grpDRBD
>     dummy01    (ocf::pacemaker:Dummy): Started bl460g1n13 FAILED
>     dummy02    (ocf::pacemaker:Dummy): Started bl460g1n13
>     dummy03    (ocf::pacemaker:Dummy): Started bl460g1n13
>  Master/Slave Set: msDRBD [prmDRBD]
>     Masters: [ bl460g1n13 ]
>     Slaves: [ bl460g1n14 ]
>
> Transition Summary:
> crm_simulate[13781]: 2011/09/26_15:00:05 notice: LogActions: Recover dummy01 (Started bl460g1n13)
> crm_simulate[13781]: 2011/09/26_15:00:05 notice: LogActions: Restart dummy02 (Started bl460g1n13)
> crm_simulate[13781]: 2011/09/26_15:00:05 notice: LogActions: Restart dummy03 (Started bl460g1n13)
> crm_simulate[13781]: 2011/09/26_15:00:05 notice: LogActions: Leave prmDRBD:0       (Master bl460g1n13)
> crm_simulate[13781]: 2011/09/26_15:00:05 notice: LogActions: Leave prmDRBD:1       (Slave bl460g1n14)
>
> Executing cluster transition:
>  * Executing action 14: dummy03_stop_0 on bl460g1n13
>  * Executing action 12: dummy02_stop_0 on bl460g1n13
>  * Executing action 2: dummy01_stop_0 on bl460g1n13
>  * Executing action 11: dummy01_start_0 on bl460g1n13
>  * Executing action 1: dummy01_monitor_10000 on bl460g1n13
>  * Executing action 13: dummy02_start_0 on bl460g1n13
>  * Executing action 3: dummy02_monitor_10000 on bl460g1n13
>  * Executing action 15: dummy03_start_0 on bl460g1n13
>  * Executing action 4: dummy03_monitor_10000 on bl460g1n13

dummy01 has a fail-count on bl460g1n13,
so it should move from bl460g1n13 to bl460g1n14.
Why is it restarted on the node where it failed?
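
To spell out what I expect, here is a minimal sketch of the rule as I
understand it. It is plain Python, not Pacemaker code, and the function
name and data layout are made up for illustration. Once the fail-count of
a resource on a node reaches migration-threshold, that node should no
longer be a candidate for the resource. I assume migration-threshold=1
here, which matches the "max=1" that ptest reports for dummy01.

```python
def allowed_nodes(nodes, fail_counts, migration_threshold):
    """Return the nodes on which the resource may still run.

    fail_counts maps a node name to the resource's fail-count there.
    A migration_threshold of 0 is treated as "no threshold" (assumption
    based on my reading of the documentation).
    """
    allowed = []
    for node in nodes:
        failures = fail_counts.get(node, 0)
        if migration_threshold > 0 and failures >= migration_threshold:
            # The resource is forced away from this node.
            continue
        allowed.append(node)
    return allowed


nodes = ["bl460g1n13", "bl460g1n14"]
# dummy01 failed once on bl460g1n13
print(allowed_nodes(nodes, {"bl460g1n13": 1}, 1))
# prints ['bl460g1n14']
```

With one failure and a threshold of 1, only bl460g1n14 remains, so I
expect dummy01 (and with it the rest of grpDRBD) to start there.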

I checked out the latest changeset from hg:

# hg log | head -n 7
changeset:   15777:a15ead49e20f
branch:      stable-1.0
tag:         tip
user:        Andrew Beekhof <andrew at beekhof.net>
date:        Thu Aug 25 16:49:59 2011 +1000
summary:     changeset: 15775:fe18a1ad46f8

# crm
crm(live)# cib import pe-input-7.bz2
crm(pe-input-7)# configure ptest vvv
ptest[19194]: 2011/09/27_11:53:45 notice: unpack_config: On loss of CCM Quorum: Ignore
ptest[19194]: 2011/09/27_11:53:45 WARN: unpack_nodes: Blind faith: not fencing unseen nodes
ptest[19194]: 2011/09/27_11:53:45 notice: group_print:  Resource Group: grpDRBD
ptest[19194]: 2011/09/27_11:53:45 notice: native_print:      dummy01 (ocf::pacemaker:Dummy): Started bl460g1n13
ptest[19194]: 2011/09/27_11:53:45 notice: native_print:      dummy02 (ocf::pacemaker:Dummy): Started bl460g1n13
ptest[19194]: 2011/09/27_11:53:45 notice: native_print:      dummy03 (ocf::pacemaker:Dummy): Started bl460g1n13
ptest[19194]: 2011/09/27_11:53:45 notice: clone_print:  Master/Slave Set: msDRBD
ptest[19194]: 2011/09/27_11:53:45 notice: short_print:      Masters: [ bl460g1n13 ]
ptest[19194]: 2011/09/27_11:53:45 notice: short_print:      Slaves: [ bl460g1n14 ]
ptest[19194]: 2011/09/27_11:53:45 WARN: common_apply_stickiness: Forcing dummy01 away from bl460g1n13 after 1 failures (max=1)
ptest[19194]: 2011/09/27_11:53:45 notice: LogActions: Stop    resource dummy01  (bl460g1n13)
ptest[19194]: 2011/09/27_11:53:45 notice: LogActions: Stop    resource dummy02  (bl460g1n13)
ptest[19194]: 2011/09/27_11:53:45 notice: LogActions: Stop    resource dummy03  (bl460g1n13)
ptest[19194]: 2011/09/27_11:53:45 notice: LogActions: Leave   resource prmDRBD:0        (Master bl460g1n13)
ptest[19194]: 2011/09/27_11:53:45 notice: LogActions: Leave   resource prmDRBD:1        (Slave bl460g1n14)
INFO: install graphviz to see a transition graph
crm(pe-input-7)# quit

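With the tip of stable-1.0, the group is only stopped on bl460g1n13;
it is not started on bl460g1n14.

Then I reverted to Pacemaker 1.0.11: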

# hg revert -a -r b2e39d318fda
# make install

# crm
crm(live)# cib import pe-input-7.bz2
crm(pe-input-7)# configure ptest vvv
ptest[751]: 2011/09/27_11:57:50 notice: unpack_config: On loss of CCM Quorum: Ignore
ptest[751]: 2011/09/27_11:57:50 WARN: unpack_nodes: Blind faith: not fencing unseen nodes
ptest[751]: 2011/09/27_11:57:50 notice: group_print:  Resource Group: grpDRBD
ptest[751]: 2011/09/27_11:57:50 notice: native_print:      dummy01 (ocf::pacemaker:Dummy): Started bl460g1n13
ptest[751]: 2011/09/27_11:57:50 notice: native_print:      dummy02 (ocf::pacemaker:Dummy): Started bl460g1n13
ptest[751]: 2011/09/27_11:57:50 notice: native_print:      dummy03 (ocf::pacemaker:Dummy): Started bl460g1n13
ptest[751]: 2011/09/27_11:57:50 notice: clone_print:  Master/Slave Set: msDRBD
ptest[751]: 2011/09/27_11:57:50 notice: short_print:      Masters: [ bl460g1n13 ]
ptest[751]: 2011/09/27_11:57:50 notice: short_print:      Slaves: [ bl460g1n14 ]
ptest[751]: 2011/09/27_11:57:50 WARN: common_apply_stickiness: Forcing dummy01 away from bl460g1n13 after 1 failures (max=1)
ptest[751]: 2011/09/27_11:57:50 notice: RecurringOp:  Start recurring monitor (10s) for dummy01 on bl460g1n14
ptest[751]: 2011/09/27_11:57:50 notice: RecurringOp:  Start recurring monitor (10s) for dummy02 on bl460g1n14
ptest[751]: 2011/09/27_11:57:50 notice: RecurringOp:  Start recurring monitor (10s) for dummy03 on bl460g1n14
ptest[751]: 2011/09/27_11:57:50 notice: RecurringOp:  Start recurring monitor (20s) for prmDRBD:0 on bl460g1n13
ptest[751]: 2011/09/27_11:57:50 notice: RecurringOp:  Start recurring monitor (10s) for prmDRBD:1 on bl460g1n14
ptest[751]: 2011/09/27_11:57:50 notice: RecurringOp:  Start recurring monitor (20s) for prmDRBD:0 on bl460g1n13
ptest[751]: 2011/09/27_11:57:50 notice: RecurringOp:  Start recurring monitor (10s) for prmDRBD:1 on bl460g1n14
ptest[751]: 2011/09/27_11:57:50 notice: LogActions: Move resource dummy01       (Started bl460g1n13 -> bl460g1n14)
ptest[751]: 2011/09/27_11:57:50 notice: LogActions: Move resource dummy02       (Started bl460g1n13 -> bl460g1n14)
ptest[751]: 2011/09/27_11:57:50 notice: LogActions: Move resource dummy03       (Started bl460g1n13 -> bl460g1n14)
ptest[751]: 2011/09/27_11:57:50 notice: LogActions: Demote prmDRBD:0 (Master -> Slave bl460g1n13)
ptest[751]: 2011/09/27_11:57:50 notice: LogActions: Promote prmDRBD:1 (Slave -> Master bl460g1n14)
INFO: install graphviz to see a transition graph

Pacemaker 1.0.11 moved the failed resource to the other node.
That is the expected behavior.
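
In case it helps to compare the two results side by side, below is a
small sketch that pulls only the LogActions entries out of pasted ptest
output. It is just a helper I would use for reading the logs, not part
of Pacemaker or the crm shell. The function name is hypothetical, and it
assumes every log line starts with "ptest[pid]: ". It also re-joins
continuation lines in case the output was wrapped when pasted.

```python
import re

# Every ptest log line is assumed to start like "ptest[19194]: "
PREFIX = re.compile(r"^ptest\[\d+\]: ")


def log_actions(ptest_output):
    """Return the LogActions entries found in ptest output."""
    entries = []
    for raw in ptest_output.splitlines():
        line = raw.strip()
        if PREFIX.match(line):
            entries.append(line)
        elif line.startswith(("INFO:", "crm(")) or not line:
            continue  # shell prompts and hints are not log lines
        elif entries:
            entries[-1] += " " + line  # continuation of a wrapped line
    actions = []
    for entry in entries:
        _, found, action = entry.partition("LogActions: ")
        if found:
            actions.append(" ".join(action.split()))
    return actions


tip = """ptest[19194]: 2011/09/27_11:53:45 notice: LogActions: Stop    resource
dummy01  (bl460g1n13)
INFO: install graphviz to see a transition graph
"""
released = """ptest[751]: 2011/09/27_11:57:50 notice: LogActions: Move resource
dummy01       (Started bl460g1n13 -> bl460g1n14)
"""
print(log_actions(tip))
# prints ['Stop resource dummy01 (bl460g1n13)']
print(log_actions(released))
# prints ['Move resource dummy01 (Started bl460g1n13 -> bl460g1n14)']
```

For dummy01 this gives "Stop" with the tip and "Move" with 1.0.11,
which is the difference I am asking about.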

I attached the hb_report which includes the above pe-input-7.bz2.

Thanks,
Junko
-------------- next part --------------
A non-text attachment was scrubbed...
Name: hb_report.tar.bz2
Type: application/x-bzip2
Size: 42704 bytes
Desc: not available
URL: <https://lists.clusterlabs.org/pipermail/pacemaker/attachments/20110927/058a9718/attachment-0003.bz2>