[Pacemaker] [Problem] About the replacement of the master/slave resource.

renayama19661014 at ybb.ne.jp renayama19661014 at ybb.ne.jp
Mon Sep 10 06:42:50 UTC 2012


Hi All,

We tested the failure behavior of a clone resource combined with a Master/Slave resource.

When the clone resource fails, the two instances of the Master/Slave resource are needlessly swapped between the nodes.

We reproduced the problem with the following procedure.


Step1) We start the cluster and load the cib. (A sketch of the relevant configuration appears after the status output below.)

============
Last updated: Mon Sep 10 15:26:25 2012
Stack: Heartbeat
Current DC: drbd2 (08607c71-da7b-4abf-b6d5-39ee39552e89) - partition with quorum
Version: 1.0.12-c6770b8
2 Nodes configured, unknown expected votes
6 Resources configured.
============

Online: [ drbd1 drbd2 ]

 Resource Group: grpPostgreSQLDB
     prmApPostgreSQLDB  (ocf::pacemaker:Dummy): Started drbd1
 Resource Group: grpStonith1
     prmStonith1-2      (stonith:external/ssh): Started drbd2
     prmStonith1-3      (stonith:meatware):     Started drbd2
 Resource Group: grpStonith2
     prmStonith2-2      (stonith:external/ssh): Started drbd1
     prmStonith2-3      (stonith:meatware):     Started drbd1
 Master/Slave Set: msDrPostgreSQLDB
     Masters: [ drbd1 ]
     Slaves: [ drbd2 ]
 Clone Set: clnDiskd1
     Started: [ drbd1 drbd2 ]
 Clone Set: clnPingd
     Started: [ drbd1 drbd2 ]
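
For reference, the relevant part of the configuration has roughly the following shape. This is a simplified sketch in crm shell syntax, not the actual cib we loaded: ocf:pacemaker:Stateful stands in for the real master/slave agent, the parameter values are illustrative, and the stonith groups and clnDiskd1 are omitted.

primitive prmApPostgreSQLDB ocf:pacemaker:Dummy \
        op monitor interval="10s" timeout="60s"
group grpPostgreSQLDB prmApPostgreSQLDB
primitive prmDrPostgreSQLDB ocf:pacemaker:Stateful \
        op monitor interval="10s" role="Slave" timeout="60s" \
        op monitor interval="9s" role="Master" timeout="60s"
ms msDrPostgreSQLDB prmDrPostgreSQLDB \
        meta master-max="1" master-node-max="1" clone-max="2" notify="true"
primitive prmPingd ocf:pacemaker:pingd \
        params name="default_ping_set" host_list="192.168.0.254" multiplier="100" \
        op monitor interval="10s" timeout="60s" on-fail="restart"
clone clnPingd prmPingd
# the Master role may only run where the ping attribute is healthy
location loc-msDrPostgreSQLDB msDrPostgreSQLDB \
        rule $role="Master" -inf: not_defined default_ping_set or default_ping_set lt 100
colocation col-grpPostgreSQLDB inf: grpPostgreSQLDB msDrPostgreSQLDB:Master
order ord-grpPostgreSQLDB inf: msDrPostgreSQLDB:promote grpPostgreSQLDB:start

A file with this content can be loaded with "crm configure load update <file>".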

Step2) We cause a monitor failure of pingd on drbd1.

[root@drbd1 ~]# rm -rf /var/run/pingd-default_ping_set
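
As far as we understand, the file removed above is the pidfile of the pingd daemon, and the agent's monitor action checks that the daemon behind it is still alive, so the next monitor returns rc=7. In rough outline the check looks like this (a simplified sketch, not the actual agent code):

# monitor: succeed only while the daemon named in the pidfile is alive
PIDFILE=/var/run/pingd-default_ping_set
if [ -f "$PIDFILE" ] && kill -0 "$(cat "$PIDFILE")" 2>/dev/null; then
        exit 0    # OCF_SUCCESS
else
        exit 7    # OCF_NOT_RUNNING -> the rc=7 monitor failure in Step3
fi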

Step3) Failover completes.

============
Last updated: Mon Sep 10 15:27:08 2012
Stack: Heartbeat
Current DC: drbd2 (08607c71-da7b-4abf-b6d5-39ee39552e89) - partition with quorum
Version: 1.0.12-c6770b8
2 Nodes configured, unknown expected votes
6 Resources configured.
============

Online: [ drbd1 drbd2 ]

 Resource Group: grpPostgreSQLDB
     prmApPostgreSQLDB  (ocf::pacemaker:Dummy): Started drbd2
 Resource Group: grpStonith1
     prmStonith1-2      (stonith:external/ssh): Started drbd2
     prmStonith1-3      (stonith:meatware):     Started drbd2
 Resource Group: grpStonith2
     prmStonith2-2      (stonith:external/ssh): Started drbd1
     prmStonith2-3      (stonith:meatware):     Started drbd1
 Master/Slave Set: msDrPostgreSQLDB
     Masters: [ drbd2 ]
     Stopped: [ prmDrPostgreSQLDB:1 ]
 Clone Set: clnDiskd1
     Started: [ drbd1 drbd2 ]
 Clone Set: clnPingd
     Started: [ drbd2 ]
     Stopped: [ prmPingd:0 ]

Failed actions:
    prmPingd:0_monitor_10000 (node=drbd1, call=14, rc=7, status=complete): not running
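
For reference, the recorded failure can be inspected and cleared afterwards with the usual commands (illustrative; in 1.0 the fail count is kept per clone instance):

[root@drbd2 ~]# crm_mon -1 -f
[root@drbd2 ~]# crm resource failcount prmPingd:0 show drbd1
[root@drbd1 ~]# crm resource cleanup clnPingd

The cleanup clears the failed action so that prmPingd:0 is started on drbd1 again.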



However, when we looked at the log, the instances of the Master/Slave resource were swapped:

Sep 10 15:26:53 drbd2 pengine: [2668]: notice: LogActions: Move    resource prmApPostgreSQLDB (Started drbd1 -> drbd2)
Sep 10 15:26:53 drbd2 pengine: [2668]: notice: LogActions: Leave   resource prmStonith1-2 (Started drbd2)
Sep 10 15:26:53 drbd2 pengine: [2668]: notice: LogActions: Leave   resource prmStonith1-3 (Started drbd2)
Sep 10 15:26:53 drbd2 pengine: [2668]: notice: LogActions: Leave   resource prmStonith2-2 (Started drbd1)
Sep 10 15:26:53 drbd2 pengine: [2668]: notice: LogActions: Leave   resource prmStonith2-3 (Started drbd1)
Sep 10 15:26:53 drbd2 pengine: [2668]: notice: LogActions: Move    resource prmDrPostgreSQLDB:0 (Master drbd1 -> drbd2)
Sep 10 15:26:53 drbd2 pengine: [2668]: notice: LogActions: Stop    resource prmDrPostgreSQLDB:1 (drbd2)
Sep 10 15:26:53 drbd2 pengine: [2668]: notice: LogActions: Leave   resource prmDiskd1:0 (Started drbd1)
Sep 10 15:26:53 drbd2 pengine: [2668]: notice: LogActions: Leave   resource prmDiskd1:1 (Started drbd2)
Sep 10 15:26:53 drbd2 pengine: [2668]: notice: LogActions: Stop    resource prmPingd:0 (drbd1)
Sep 10 15:26:53 drbd2 pengine: [2668]: notice: LogActions: Leave   resource prmPingd:1 (Started drbd2)

This swap is unnecessary: the existing Slave (prmDrPostgreSQLDB:1 on drbd2) should simply be promoted to Master, and the old Master (prmDrPostgreSQLDB:0 on drbd1) should simply stop.
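
If it helps the analysis, Pacemaker 1.0 ships ptest (crm_simulate replaced it in 1.1), which can dump the allocation scores behind this placement decision. The pe-input number below is a placeholder; the real number appears in the pengine log:

[root@drbd2 ~]# ptest -L -s
[root@drbd2 ~]# ptest -s -x /var/lib/pengine/pe-input-NN.bz2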

This problem seems to be already fixed in Pacemaker 1.1.

Is a fix possible for Pacemaker 1.0?
Because the placement processing differs considerably between Pacemaker 1.0 and 1.1, I suspect that backporting the fix will be difficult.

 * This problem may already have been reported as a known issue.
 * I filed this problem in Bugzilla:
  * http://bugs.clusterlabs.org/show_bug.cgi?id=5103

Best Regards,
Hideo Yamauchi.




