[Pacemaker] Stopping heartbeat service on one node leads to restart of resources on other node in cluster

Tue Feb 7 05:35:42 EST 2012

Hello,
I have a 2-node cluster with the following configuration:
node $id="9e53a111-0dca-496c-9461-a38f3eec4d0e" mcg2 \
       attributes standby="off"
node $id="a90981f8-d993-4411-89f4-aff7156136d2" mcg1 \
       attributes standby="off"
primitive ClusterIP ocf:mcg:MCG_VIPaddr_RA \
       params ip="192.168.115.50" cidr_netmask="255.255.255.0" nic="bond1.115:1" \
       op monitor interval="40" timeout="20" \
       meta target-role="Started"
primitive EMS ocf:heartbeat:jboss \
       params jboss_home="/opt/jboss-5.1.0.GA" java_home="/opt/jdk1.6.0_29/" \
       op start interval="0" timeout="240" \
       op stop interval="0" timeout="240" \
       op monitor interval="30s" timeout="40s"
primitive NDB_MGMT ocf:mcg:NDB_MGM_RA \
       op monitor interval="120" timeout="120"
primitive NDB_VIP ocf:heartbeat:IPaddr2 \
       params ip="192.168.117.50" cidr_netmask="255.255.255.255" nic="bond0.117:1" \
       op monitor interval="30" timeout="10"
primitive Rmgr ocf:mcg:RM_RA \
       op monitor interval="60" role="Master" timeout="30" on-fail="restart" \
       op monitor interval="40" role="Slave" timeout="40" on-fail="restart"
primitive Tmgr ocf:mcg:TM_RA \
       op monitor interval="60" role="Master" timeout="30" on-fail="restart" \
       op monitor interval="40" role="Slave" timeout="40" on-fail="restart"
primitive mysql ocf:mcg:MYSQLD_RA \
       op monitor interval="180" timeout="200"
primitive ndbd ocf:mcg:NDBD_RA \
       op monitor interval="120" timeout="120"
primitive pimd ocf:mcg:PIMD_RA \
       op monitor interval="60" role="Master" timeout="30" on-fail="restart" \
       op monitor interval="40" role="Slave" timeout="40" on-fail="restart"
ms ms_Rmgr Rmgr \
       meta master-max="1" master-max-node="1" clone-max="2" clone-node-max="1" interleave="true" notify="true"
ms ms_Tmgr Tmgr \
       meta master-max="1" master-max-node="1" clone-max="2" clone-node-max="1" interleave="true" notify="true"
ms ms_pimd pimd \
       meta master-max="1" master-max-node="1" clone-max="2" clone-node-max="1" interleave="true" notify="true"
clone EMS_CLONE EMS \
       meta globally-unique="false" clone-max="2" clone-node-max="1" target-role="Started"
clone mysqld_clone mysql \
       meta globally-unique="false" clone-max="2" clone-node-max="1"
clone ndbdclone ndbd \
       meta globally-unique="false" clone-max="2" clone-node-max="1" target-role="Started"
colocation ip_with_Pimd inf: ClusterIP ms_pimd:Master
colocation ip_with_RM inf: ClusterIP ms_Rmgr:Master
colocation ip_with_TM inf: ClusterIP ms_Tmgr:Master
colocation ndb_vip-with-ndb_mgm inf: NDB_MGMT NDB_VIP
order RM-after-mysqld inf: mysqld_clone ms_Rmgr
order TM-after-RM inf: ms_Rmgr ms_Tmgr
order ip-after-pimd inf: ms_pimd ClusterIP
order mysqld-after-ndbd inf: ndbdclone mysqld_clone
order pimd-after-TM inf: ms_Tmgr ms_pimd
property $id="cib-bootstrap-options" \
       dc-version="1.0.11-55a5f5be61c367cbd676c2f0ec4f1c62b38223d7" \
       cluster-infrastructure="Heartbeat" \
       no-quorum-policy="ignore" \
       stonith-enabled="false"
rsc_defaults $id="rsc-options" \
       migration_threshold="3" \
       resource-stickiness="100"

With both nodes up and running, if the heartbeat service is stopped on either
node, the following resources are restarted on the other node:
mysqld_clone, ms_Rmgr, ms_Tmgr, ms_pimd, ClusterIP
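
For reference, the restarted resources are exactly the ones that come after
ndbdclone in the "order" constraints of the configuration. Below is a minimal
Python sketch that traces that chain. The helper name downstream() and the
regular expression are illustrative assumptions, not part of Pacemaker or the
crm shell, and the sketch only follows the constraints as written; it does not
explain why the policy engine schedules the restarts.

```python
import re

# Matches crm shell lines of the form: order <id> <score>: <first> <then>
ORDER = re.compile(r"^order\s+(\S+)\s+\S+:\s+(\S+)\s+(\S+)\s*$")

def downstream(config_text, start):
    """Return the resources ordered after `start`, following order constraints."""
    edges = {}
    for line in config_text.splitlines():
        m = ORDER.match(line.strip())
        if m:
            _name, first, then = m.groups()
            edges.setdefault(first, []).append(then)
    seen, stack = [], [start]
    while stack:
        node = stack.pop()
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen.append(nxt)
                stack.append(nxt)
    return seen

# The five order constraints copied from the configuration in this message.
constraints = """\
order RM-after-mysqld inf: mysqld_clone ms_Rmgr
order TM-after-RM inf: ms_Rmgr ms_Tmgr
order ip-after-pimd inf: ms_pimd ClusterIP
order mysqld-after-ndbd inf: ndbdclone mysqld_clone
order pimd-after-TM inf: ms_Tmgr ms_pimd
"""

print(downstream(constraints, "ndbdclone"))
```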

From the Heartbeat debug logs, it seems the policy engine is initiating a
restart of the above resources, but the reason for this is not clear.
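
The scheduled actions appear in the pengine "LogActions" entries of the debug
log. Below is a minimal Python sketch for summarizing them; the function name
summarize_log_actions() and the regular expression are illustrative assumptions
based only on the line format seen in these logs, and the sample is a few
LogActions lines copied from the excerpt in this message.

```python
import re
from collections import defaultdict

# Matches entries such as:
#   ... notice: LogActions: Restart resource Rmgr:0 (Master mcg1)
LOG_ACTION = re.compile(
    r"LogActions:\s+(?P<action>\w+)\s+resource\s+(?P<resource>\S+)\s+\((?P<detail>[^)]*)\)"
)

def summarize_log_actions(log_text):
    """Group resource names by the action the policy engine scheduled."""
    # Mail clients wrap long log lines, so collapse whitespace runs first.
    flat = " ".join(log_text.split())
    actions = defaultdict(list)
    for m in LOG_ACTION.finditer(flat):
        actions[m.group("action")].append(m.group("resource"))
    return dict(actions)

sample = """\
Feb 07 11:06:31 MCG1 pengine: [20534]: notice: LogActions: Restart resource
Rmgr:0      (Master mcg1)
Feb 07 11:06:31 MCG1 pengine: [20534]: notice: LogActions: Stop    resource
Rmgr:1      (mcg2)
Feb 07 11:06:31 MCG1 pengine: [20534]: notice: LogActions: Leave   resource
ndbd:0      (Started mcg1)
Feb 07 11:06:31 MCG1 pengine: [20534]: notice: LogActions: Restart resource
mysql:0     (Started mcg1)
"""

print(summarize_log_actions(sample))
```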

The following are some excerpts from the logs:

"Feb 07 11:06:31 MCG1 pengine: [20534]: info: determine_online_status:
Node mcg2 is shutting down
Feb 07 11:06:31 MCG1 pengine: [20534]: info: determine_online_status: Node
mcg1 is online
Feb 07 11:06:31 MCG1 pengine: [20534]: notice: clone_print:  Master/Slave
Set: ms_Rmgr
Feb 07 11:06:31 MCG1 pengine: [20534]: debug: native_active: Resource
Rmgr:0 active on mcg1
Feb 07 11:06:31 MCG1 pengine: [20534]: debug: native_active: Resource
Rmgr:0 active on mcg1
Feb 07 11:06:31 MCG1 pengine: [20534]: debug: native_active: Resource
Rmgr:1 active on mcg2
Feb 07 11:06:31 MCG1 pengine: [20534]: debug: native_active: Resource
Rmgr:1 active on mcg2
Feb 07 11:06:31 MCG1 pengine: [20534]: notice: short_print:      Masters: [
mcg1 ]
Feb 07 11:06:31 MCG1 pengine: [20534]: notice: short_print:      Slaves: [
mcg2 ]
Feb 07 11:06:31 MCG1 pengine: [20534]: notice: clone_print:  Master/Slave
Set: ms_Tmgr
Feb 07 11:06:31 MCG1 pengine: [20534]: debug: native_active: Resource
Tmgr:0 active on mcg1
Feb 07 11:06:31 MCG1 pengine: [20534]: debug: native_active: Resource
Tmgr:0 active on mcg1
Feb 07 11:06:31 MCG1 pengine: [20534]: debug: native_active: Resource
Tmgr:1 active on mcg2
Feb 07 11:06:31 MCG1 pengine: [20534]: debug: native_active: Resource
Tmgr:1 active on mcg2
Feb 07 11:06:31 MCG1 pengine: [20534]: notice: short_print:      Masters: [
mcg1 ]
Feb 07 11:06:31 MCG1 pengine: [20534]: notice: short_print:      Slaves: [
mcg2 ]
Feb 07 11:06:31 MCG1 pengine: [20534]: notice: clone_print:  Master/Slave
Set: ms_pimd
Feb 07 11:06:31 MCG1 pengine: [20534]: debug: native_active: Resource
pimd:0 active on mcg1
Feb 07 11:06:31 MCG1 pengine: [20534]: debug: native_active: Resource
pimd:0 active on mcg1
Feb 07 11:06:31 MCG1 pengine: [20534]: debug: native_active: Resource
pimd:1 active on mcg2
Feb 07 11:06:31 MCG1 pengine: [20534]: debug: native_active: Resource
pimd:1 active on mcg2
Feb 07 11:06:31 MCG1 pengine: [20534]: notice: short_print:      Masters: [
mcg1 ]
Feb 07 11:06:31 MCG1 pengine: [20534]: notice: short_print:      Slaves: [
mcg2 ]
Feb 07 11:06:31 MCG1 pengine: [20534]: notice: native_print: ClusterIP
 (ocf::mcg:MCG_VIPaddr_RA):      Started mcg1
Feb 07 11:06:31 MCG1 pengine: [20534]: notice: clone_print:  Clone Set:
EMS_CLONE
Feb 07 11:06:31 MCG1 pengine: [20534]: debug: native_active: Resource
EMS:0 active on mcg1
Feb 07 11:06:31 MCG1 pengine: [20534]: debug: native_active: Resource EMS:0
active on mcg1
Feb 07 11:06:31 MCG1 pengine: [20534]: debug: native_active: Resource EMS:1
active on mcg2
Feb 07 11:06:31 MCG1 pengine: [20534]: debug: native_active: Resource EMS:1
active on mcg2
Feb 07 11:06:31 MCG1 pengine: [20534]: notice: short_print:      Started: [
mcg1 mcg2 ]
Feb 07 11:06:31 MCG1 pengine: [20534]: notice: native_print: NDB_VIP
 (ocf::heartbeat:IPaddr2):       Started mcg1
Feb 07 11:06:31 MCG1 pengine: [20534]: notice: native_print: NDB_MGMT
(ocf::mcg:NDB_MGM_RA):  Started mcg1
Feb 07 11:06:31 MCG1 pengine: [20534]: notice: clone_print:  Clone Set:
mysqld_clone
Feb 07 11:06:31 MCG1 pengine: [20534]: debug: native_active: Resource
mysql:0 active on mcg1
Feb 07 11:06:31 MCG1 pengine: [20534]: debug: native_active: Resource
mysql:0 active on mcg1
Feb 07 11:06:31 MCG1 pengine: [20534]: debug: native_active: Resource
mysql:1 active on mcg2
Feb 07 11:06:31 MCG1 pengine: [20534]: debug: native_active: Resource
mysql:1 active on mcg2
Feb 07 11:06:31 MCG1 pengine: [20534]: notice: short_print:      Started: [
mcg1 mcg2 ]
Feb 07 11:06:31 MCG1 pengine: [20534]: notice: clone_print:  Clone Set:
ndbdclone
Feb 07 11:06:31 MCG1 pengine: [20534]: debug: native_active: Resource
ndbd:0 active on mcg1
Feb 07 11:06:31 MCG1 pengine: [20534]: debug: native_active: Resource
ndbd:0 active on mcg1
Feb 07 11:06:31 MCG1 pengine: [20534]: debug: native_active: Resource
ndbd:1 active on mcg2
Feb 07 11:06:31 MCG1 pengine: [20534]: debug: native_active: Resource
ndbd:1 active on mcg2
Feb 07 11:06:31 MCG1 pengine: [20534]: notice: short_print:      Started: [
mcg1 mcg2 ]
Feb 07 11:06:31 MCG1 pengine: [20534]: debug: common_apply_stickiness:
Resource Rmgr:1: preferring current location (node=mcg2, weight=100)
Feb 07 11:06:31 MCG1 pengine: [20534]: debug: common_apply_stickiness:
Resource Tmgr:1: preferring current location (node=mcg2, weight=100)
Feb 07 11:06:31 MCG1 pengine: [20534]: debug: common_apply_stickiness:
Resource pimd:1: preferring current location (node=mcg2, weight=100)
Feb 07 11:06:31 MCG1 pengine: [20534]: debug: common_apply_stickiness:
Resource EMS:1: preferring current location (node=mcg2, weight=100)
Feb 07 11:06:31 MCG1 pengine: [20534]: debug: common_apply_stickiness:
Resource mysql:1: preferring current location (node=mcg2, weight=100)
Feb 07 11:06:31 MCG1 pengine: [20534]: debug: common_apply_stickiness:
Resource ndbd:1: preferring current location (node=mcg2, weight=100)
Feb 07 11:06:31 MCG1 pengine: [20534]: debug: common_apply_stickiness:
Resource Rmgr:0: preferring current location (node=mcg1, weight=100)
Feb 07 11:06:31 MCG1 pengine: [20534]: debug: common_apply_stickiness:
Resource Tmgr:0: preferring current location (node=mcg1, weight=100)
Feb 07 11:06:31 MCG1 pengine: [20534]: debug: common_apply_stickiness:
Resource pimd:0: preferring current location (node=mcg1, weight=100)
Feb 07 11:06:31 MCG1 pengine: [20534]: debug: common_apply_stickiness:
Resource ClusterIP: preferring current location (node=mcg1, weight=100)
Feb 07 11:06:31 MCG1 pengine: [20534]: debug: common_apply_stickiness:
Resource EMS:0: preferring current location (node=mcg1, weight=100)
Feb 07 11:06:31 MCG1 pengine: [20534]: debug: common_apply_stickiness:
Resource NDB_VIP: preferring current location (node=mcg1, weight=100)
Feb 07 11:06:31 MCG1 pengine: [20534]: debug: common_apply_stickiness:
Resource NDB_MGMT: preferring current location (node=mcg1, weight=100)
Feb 07 11:06:31 MCG1 pengine: [20534]: debug: common_apply_stickiness:
Resource mysql:0: preferring current location (node=mcg1, weight=100)
Feb 07 11:06:31 MCG1 pengine: [20534]: debug: common_apply_stickiness:
Resource ndbd:0: preferring current location (node=mcg1, weight=100)
Feb 07 11:06:31 MCG1 pengine: [20534]: debug: native_assign_node: Assigning
mcg1 to Rmgr:0
Feb 07 11:06:31 MCG1 pengine: [20534]: debug: native_assign_node: All nodes
for resource Rmgr:1 are unavailable, unclean or shutting down (mcg2: 0,
-1000000)
Feb 07 11:06:31 MCG1 pengine: [20534]: debug: native_assign_node: Could not
allocate a node for Rmgr:1
Feb 07 11:06:31 MCG1 pengine: [20534]: info: native_color: Resource Rmgr:1
cannot run anywhere
Feb 07 11:06:31 MCG1 pengine: [20534]: debug: clone_color: Allocated 1
ms_Rmgr instances of a possible 2
Feb 07 11:06:31 MCG1 pengine: [20534]: debug: master_color: Rmgr:0 master
score: 10
Feb 07 11:06:31 MCG1 pengine: [20534]: info: master_color: Promoting Rmgr:0
(Master mcg1)
Feb 07 11:06:31 MCG1 pengine: [20534]: debug: master_color: Rmgr:1 master
score: 0
Feb 07 11:06:31 MCG1 pengine: [20534]: info: master_color: ms_Rmgr:
Promoted 1 instances of a possible 1 to master
Feb 07 11:06:31 MCG1 pengine: [20534]: debug: native_assign_node: Assigning
mcg1 to Tmgr:0
Feb 07 11:06:31 MCG1 pengine: [20534]: debug: native_assign_node: All nodes
for resource Tmgr:1 are unavailable, unclean or shutting down (mcg2: 0,
-1000000)
Feb 07 11:06:31 MCG1 pengine: [20534]: debug: native_assign_node: Could not
allocate a node for Tmgr:1
Feb 07 11:06:31 MCG1 pengine: [20534]: info: native_color: Resource Tmgr:1
cannot run anywhere
Feb 07 11:06:31 MCG1 pengine: [20534]: debug: clone_color: Allocated 1
ms_Tmgr instances of a possible 2
Feb 07 11:06:31 MCG1 pengine: [20534]: debug: master_color: Tmgr:0 master
score: 10
Feb 07 11:06:31 MCG1 pengine: [20534]: info: master_color: Promoting Tmgr:0
(Master mcg1)
Feb 07 11:06:31 MCG1 pengine: [20534]: debug: master_color: Tmgr:1 master
score: 0
Feb 07 11:06:31 MCG1 pengine: [20534]: info: master_color: ms_Tmgr:
Promoted 1 instances of a possible 1 to master
Feb 07 11:06:31 MCG1 pengine: [20534]: debug: native_assign_node: Assigning
mcg1 to pimd:0
Feb 07 11:06:31 MCG1 pengine: [20534]: debug: native_assign_node: All nodes
for resource pimd:1 are unavailable, unclean or shutting down (mcg2: 0,
-1000000)
Feb 07 11:06:31 MCG1 pengine: [20534]: debug: native_assign_node: Could not
allocate a node for pimd:1
Feb 07 11:06:31 MCG1 pengine: [20534]: info: native_color: Resource pimd:1
cannot run anywhere
Feb 07 11:06:31 MCG1 pengine: [20534]: debug: clone_color: Allocated 1
ms_pimd instances of a possible 2
Feb 07 11:06:31 MCG1 pengine: [20534]: debug: master_color: pimd:0 master
score: 10
Feb 07 11:06:31 MCG1 pengine: [20534]: info: master_color: Promoting pimd:0
(Master mcg1)
Feb 07 11:06:31 MCG1 pengine: [20534]: debug: master_color: pimd:1 master
score: 0
Feb 07 11:06:31 MCG1 pengine: [20534]: info: master_color: ms_pimd:
Promoted 1 instances of a possible 1 to master
Feb 07 11:06:31 MCG1 pengine: [20534]: debug: native_assign_node: Assigning
mcg1 to ClusterIP
Feb 07 11:06:31 MCG1 pengine: [20534]: debug: native_assign_node: Assigning
mcg1 to EMS:0
Feb 07 11:06:31 MCG1 pengine: [20534]: debug: native_assign_node: All nodes
for resource EMS:1 are unavailable, unclean or shutting down (mcg2: 0,
-1000000)
Feb 07 11:06:31 MCG1 pengine: [20534]: debug: native_assign_node: Could not
allocate a node for EMS:1
Feb 07 11:06:31 MCG1 pengine: [20534]: info: native_color: Resource EMS:1
cannot run anywhere
Feb 07 11:06:31 MCG1 pengine: [20534]: debug: clone_color: Allocated 1
EMS_CLONE instances of a possible 2
Feb 07 11:06:31 MCG1 pengine: [20534]: debug: native_assign_node: Assigning
mcg1 to NDB_VIP
Feb 07 11:06:31 MCG1 pengine: [20534]: debug: native_assign_node: Assigning
mcg1 to NDB_MGMT
Feb 07 11:06:31 MCG1 pengine: [20534]: debug: native_assign_node: Assigning
mcg1 to mysql:0
Feb 07 11:06:31 MCG1 pengine: [20534]: debug: native_assign_node: All nodes
for resource mysql:1 are unavailable, unclean or shutting down (mcg2: 0,
-1000000)
Feb 07 11:06:31 MCG1 pengine: [20534]: debug: native_assign_node: Could not
allocate a node for mysql:1
Feb 07 11:06:31 MCG1 pengine: [20534]: info: native_color: Resource mysql:1
cannot run anywhere
Feb 07 11:06:31 MCG1 pengine: [20534]: debug: clone_color: Allocated 1
mysqld_clone instances of a possible 2
Feb 07 11:06:31 MCG1 pengine: [20534]: debug: native_assign_node: Assigning
mcg1 to ndbd:0
Feb 07 11:06:31 MCG1 pengine: [20534]: debug: native_assign_node: All nodes
for resource ndbd:1 are unavailable, unclean or shutting down (mcg2: 0,
-1000000)
Feb 07 11:06:31 MCG1 pengine: [20534]: debug: native_assign_node: Could not
allocate a node for ndbd:1
Feb 07 11:06:31 MCG1 pengine: [20534]: info: native_color: Resource ndbd:1
cannot run anywhere
Feb 07 11:06:31 MCG1 pengine: [20534]: debug: clone_color: Allocated 1
ndbdclone instances of a possible 2
Feb 07 11:06:31 MCG1 pengine: [20534]: debug: master_create_actions:
Creating actions for ms_Rmgr
Feb 07 11:06:31 MCG1 pengine: [20534]: debug: master_create_actions:
Creating actions for ms_Tmgr
Feb 07 11:06:31 MCG1 pengine: [20534]: debug: master_create_actions:
Creating actions for ms_pimd
Feb 07 11:06:31 MCG1 pengine: [20534]: info: stage6: Scheduling Node mcg2
for shutdown
Feb 07 11:06:31 MCG1 pengine: [20534]: debug: clone_rsc_order_lh: Pairing
Rmgr:0 with Tmgr:0
Feb 07 11:06:31 MCG1 pengine: [20534]: debug: find_compatible_child: Can't
pair Tmgr:1 with ms_Rmgr
Feb 07 11:06:31 MCG1 pengine: [20534]: debug: clone_rsc_order_lh: No match
found for Tmgr:1 (0)
Feb 07 11:06:31 MCG1 pengine: [20534]: info: clone_rsc_order_lh: Inhibiting
Tmgr:1 from being active
Feb 07 11:06:31 MCG1 pengine: [20534]: debug: native_assign_node: Could not
allocate a node for Tmgr:1
Feb 07 11:06:31 MCG1 pengine: [20534]: debug: clone_rsc_order_lh: Pairing
Tmgr:0 with Rmgr:0
Feb 07 11:06:31 MCG1 pengine: [20534]: debug: clone_rsc_order_lh: Pairing
Tmgr:1 with Rmgr:1
Feb 07 11:06:31 MCG1 pengine: [20534]: debug: clone_rsc_order_lh: Pairing
Tmgr:0 with pimd:0
Feb 07 11:06:31 MCG1 pengine: [20534]: debug: find_compatible_child: Can't
pair pimd:1 with ms_Tmgr
Feb 07 11:06:31 MCG1 pengine: [20534]: debug: clone_rsc_order_lh: No match
found for pimd:1 (0)
Feb 07 11:06:31 MCG1 pengine: [20534]: info: clone_rsc_order_lh: Inhibiting
pimd:1 from being active
Feb 07 11:06:31 MCG1 pengine: [20534]: debug: native_assign_node: Could not
allocate a node for pimd:1
Feb 07 11:06:31 MCG1 pengine: [20534]: debug: clone_rsc_order_lh: Pairing
pimd:0 with Tmgr:0
Feb 07 11:06:31 MCG1 pengine: [20534]: debug: clone_rsc_order_lh: Pairing
pimd:1 with Tmgr:1
Feb 07 11:06:31 MCG1 pengine: [20534]: debug: clone_rsc_order_lh: Pairing
Rmgr:0 with mysql:0
Feb 07 11:06:31 MCG1 pengine: [20534]: debug: clone_rsc_order_lh: Pairing
Rmgr:1 with mysql:1
Feb 07 11:06:31 MCG1 pengine: [20534]: notice: LogActions: Restart resource
Rmgr:0      (Master mcg1)
Feb 07 11:06:31 MCG1 pengine: [20534]: notice: LogActions: Stop    resource
Rmgr:1      (mcg2)
Feb 07 11:06:31 MCG1 pengine: [20534]: notice: LogActions: Restart resource
Tmgr:0      (Master mcg1)
Feb 07 11:06:31 MCG1 pengine: [20534]: notice: LogActions: Stop    resource
Tmgr:1      (mcg2)
Feb 07 11:06:31 MCG1 pengine: [20534]: notice: LogActions: Restart resource
pimd:0      (Master mcg1)
Feb 07 11:06:31 MCG1 pengine: [20534]: notice: LogActions: Stop    resource
pimd:1      (mcg2)
Feb 07 11:06:31 MCG1 pengine: [20534]: notice: LogActions: Restart resource
ClusterIP   (Started mcg1)
Feb 07 11:06:31 MCG1 pengine: [20534]: notice: LogActions: Leave   resource
EMS:0       (Started mcg1)
Feb 07 11:06:31 MCG1 pengine: [20534]: notice: LogActions: Stop    resource
EMS:1       (mcg2)
Feb 07 11:06:31 MCG1 pengine: [20534]: notice: LogActions: Leave   resource
NDB_VIP     (Started mcg1)
Feb 07 11:06:31 MCG1 pengine: [20534]: notice: LogActions: Leave   resource
NDB_MGMT    (Started mcg1)
Feb 07 11:06:31 MCG1 pengine: [20534]: notice: LogActions: Restart resource
mysql:0     (Started mcg1)
Feb 07 11:06:31 MCG1 pengine: [20534]: notice: LogActions: Stop    resource
mysql:1     (mcg2)
Feb 07 11:06:31 MCG1 pengine: [20534]: notice: LogActions: Leave   resource
ndbd:0      (Started mcg1)
Feb 07 11:06:31 MCG1 pengine: [20534]: notice: LogActions: Stop    resource
ndbd:1      (mcg2)
"
Thanks in advance.

Regards
Neha Chatrath