[ClusterLabs] Cluster node getting stopped from other node(resending mail)

Arjun Pandey apandepublic at gmail.com
Wed Jul 1 00:30:40 EDT 2015


Hi

I am running a 2-node cluster with the following configuration on CentOS 6.5/6.6:

Master/Slave Set: foo-master [foo]
Masters: [ messi ]
Stopped: [ronaldo ]
 eth1-CP        (ocf::pw:IPaddr):       Started messi
 eth2-UP        (ocf::pw:IPaddr):       Started messi
 eth3-UPCP      (ocf::pw:IPaddr):       Started messi

Here foo is a multi-state resource running in master/slave mode, and the
IPaddr RA is just a modified IPaddr2 RA. Additionally, I have a
colocation constraint that keeps the IP addresses on the same node as the master.

Sometimes when I set up the cluster, one of the nodes (the second node
to join) gets stopped, and I find the following in its log:

2015-06-01T13:55:46.153941+05:30 ronaldo pacemaker: Starting Pacemaker Cluster Manager
2015-06-01T13:55:46.233639+05:30 ronaldo attrd[25988]:   notice: attrd_trigger_update: Sending flush op to all hosts for: shutdown (0)
2015-06-01T13:55:46.234162+05:30 ronaldo crmd[25990]:   notice: do_state_transition: State transition S_PENDING -> S_NOT_DC [ input=I_NOT_DC cause=C_HA_MESSAGE origin=do_cl_join_finalize_respond ]
2015-06-01T13:55:46.234701+05:30 ronaldo attrd[25988]:   notice: attrd_local_callback: Sending full refresh (origin=crmd)
2015-06-01T13:55:46.234708+05:30 ronaldo attrd[25988]:   notice: attrd_trigger_update: Sending flush op to all hosts for: shutdown (0)
************************ This looks to be the likely reason ************************
2015-06-01T13:55:46.254310+05:30 ronaldo crmd[25990]:    error: handle_request: We didn't ask to be shut down, yet our DC is telling us too.
************************************************************************************

2015-06-01T13:55:46.254577+05:30 ronaldo crmd[25990]:   notice: do_state_transition: State transition S_NOT_DC -> S_STOPPING [ input=I_STOP cause=C_HA_MESSAGE origin=route_message ]
2015-06-01T13:55:46.255134+05:30 ronaldo crmd[25990]:   notice: lrm_state_verify_stopped: Stopped 0 recurring operations at shutdown... waiting (2 ops remaining)

Based on the logs, Pacemaker on the active node (the DC) is stopping the
secondary node every time it joins the cluster. This issue seems similar to
http://pacemaker.oss.clusterlabs.narkive.com/rVvN8May/node-sends-shutdown-request-to-other-node-error
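
To check how often this happens across a longer syslog file, the pattern can be searched for mechanically. The following is only a minimal sketch in Python. It assumes each log entry sits on a single line in the original file (not wrapped as in this mail) and follows the layout "timestamp host daemon[pid]: level: message" seen above. The function name find_forced_shutdowns and the regular expression are made up for illustration and are not part of Pacemaker.

```python
import re

# Message text copied from the crmd error line quoted in this mail.
SHUTDOWN_MSG = "We didn't ask to be shut down"

# Assumed layout: "<timestamp> <host> <daemon>[<pid>]: <level>: <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\S+)\s+(?P<host>\S+)\s+(?P<daemon>\w+)\[(?P<pid>\d+)\]:\s+"
    r"(?P<level>\w+):\s+(?P<msg>.*)$"
)


def find_forced_shutdowns(lines):
    """Return (timestamp, host) for each crmd entry reporting a DC-requested shutdown."""
    hits = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        # Lines that do not fit the assumed layout are skipped silently.
        if m and m.group("daemon") == "crmd" and SHUTDOWN_MSG in m.group("msg"):
            hits.append((m.group("ts"), m.group("host")))
    return hits


# Two entries taken from the log above, unwrapped to one line each.
sample = [
    "2015-06-01T13:55:46.234708+05:30 ronaldo attrd[25988]:   notice: "
    "attrd_trigger_update: Sending flush op to all hosts for: shutdown (0)",
    "2015-06-01T13:55:46.254310+05:30 ronaldo crmd[25990]:    error: "
    "handle_request: We didn't ask to be shut down, yet our DC is telling us too.",
]

if __name__ == "__main__":
    for ts, host in find_forced_shutdowns(sample):
        print(ts, host)
```

On a real system the lines would come from the syslog file instead of the sample list, e.g. find_forced_shutdowns(open(path)), where path is wherever syslog writes on that host.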

Packages used:
pacemaker-1.1.12-4.el6.x86_64
pacemaker-libs-1.1.12-4.el6.x86_64
pacemaker-cli-1.1.12-4.el6.x86_64
pacemaker-cluster-libs-1.1.12-4.el6.x86_64
pacemaker-debuginfo-1.1.10-14.el6.x86_64
pcsc-lite-libs-1.5.2-13.el6_4.x86_64
pcs-0.9.90-2.el6.centos.2.noarch
pcsc-lite-1.5.2-13.el6_4.x86_64
pcsc-lite-openct-0.6.19-4.el6.x86_64
corosync-1.4.1-17.el6.x86_64
corosynclib-1.4.1-17.el6.x86_64



Thanks in advance for your help

Regards
Arjun