<div dir="ltr">Hi Ken,<div><br></div><div>When I look at the logs on the other node around the same time, I see the following. I can't figure out the reason from these. I am also attaching the corosync.log for the other node.</div><div><br></div><div><div>Jun 01 13:55:44 [1965] messi crmd: info: do_dc_join_offer_one: An unknown node joined - (re-)offer to any unconfirmed nodes</div><div>Jun 01 13:55:44 [1965] messi crmd: info: join_make_offer: Making join offers based on membership 224</div><div>Jun 01 13:55:44 [1965] messi crmd: info: join_make_offer: Skipping messi: already known 4</div><div>Jun 01 13:55:44 [1965] messi crmd: info: join_make_offer: join-2: Sending offer to ronaldo</div><div>Jun 01 13:55:44 [1960] messi cib: info: cib_process_request: Forwarding cib_modify operation for section status to master (origin=local/crmd/138)</div><div>Jun 01 13:55:44 [1960] messi cib: info: cib_perform_op: Diff: --- 0.80.2 2</div><div>Jun 01 13:55:44 [1960] messi cib: info: cib_perform_op: Diff: +++ 0.80.3 (null)</div><div>Jun 01 13:55:44 [1965] messi crmd: info: crm_update_peer_join: join_make_offer: Node ronaldo[2] - join-2 phase 0 -> 1</div><div>Jun 01 13:55:44 [1965] messi crmd: info: abort_transition_graph: Transition aborted: Peer Halt (source=do_te_invoke:158, 1)</div><div>Jun 01 13:55:44 [1960] messi cib: info: cib_perform_op: + /cib: @num_updates=3</div><div>Jun 01 13:55:44 [1960] messi cib: info: cib_perform_op: + /cib/status/node_state[@id='ronaldo']: @crmd=online, @crm-debug-origin=peer_update_callback</div><div>Jun 01 13:55:44 [1960] messi cib: info: cib_process_request: Completed cib_modify operation for section status: OK (rc=0, origin=messi/crmd/138, version=0.80.3)</div><div>Jun 01 13:55:44 [1960] messi cib: info: cib_process_request: Completed cib_modify operation for section nodes: OK (rc=0, origin=ronaldo/crmd/3, version=0.80.3)</div><div>Jun 01 13:55:45 [1965] messi crmd: info: do_dc_join_offer_one: join-2: Processing join_announce request from ronaldo in state 
S_INTEGRATION</div><div>Jun 01 13:55:45 [1965] messi crmd: info: crm_update_peer_join: do_dc_join_offer_one: Node ronaldo[2] - join-2 phase 1 -> 0</div><div>Jun 01 13:55:45 [1965] messi crmd: info: join_make_offer: join-2: Sending offer to ronaldo</div><div>Jun 01 13:55:45 [1965] messi crmd: info: crm_update_peer_join: join_make_offer: Node ronaldo[2] - join-2 phase 0 -> 1</div><div>Jun 01 13:55:45 [1965] messi crmd: info: crm_update_peer_join: join_make_offer: Node messi[1] - join-2 phase 4 -> 0</div><div>Jun 01 13:55:45 [1965] messi crmd: info: join_make_offer: join-2: Sending offer to messi</div><div>Jun 01 13:55:45 [1965] messi crmd: info: crm_update_peer_join: join_make_offer: Node messi[1] - join-2 phase 0 -> 1</div><div>Jun 01 13:55:45 [1965] messi crmd: info: abort_transition_graph: Transition aborted: Node join (source=do_dc_join_offer_one:236, 1)</div><div>Jun 01 13:55:45 [1965] messi crmd: info: crm_update_peer_join: do_dc_join_filter_offer: Node messi[1] - join-2 phase 1 -> 2</div><div>Jun 01 13:55:46 [1965] messi crmd: info: crm_update_peer_join: do_dc_join_filter_offer: Node ronaldo[2] - join-2 phase 1 -> 2</div><div>Jun 01 13:55:46 [1965] messi crmd: info: crm_update_peer_expected: do_dc_join_filter_offer: Node ronaldo[2] - expected state is now member (was down)</div><div>Jun 01 13:55:46 [1965] messi crmd: info: do_state_transition: State transition S_INTEGRATION -> S_FINALIZE_JOIN [ input=I_INTEGRATED cause=C_FSA_INTERNAL origin=check_join_state ]</div><div>Jun 01 13:55:46 [1965] messi crmd: info: crmd_join_phase_log: join-2: messi=integrated</div><div>Jun 01 13:55:46 [1965] messi crmd: info: crmd_join_phase_log: join-2: ronaldo=integrated</div><div>Jun 01 13:55:46 [1965] messi crmd: info: do_dc_join_finalize: join-2: Syncing our CIB to the rest of the cluster</div><div>Jun 01 13:55:46 [1965] messi crmd: info: crm_update_peer_join: finalize_join_for: Node messi[1] - join-2 phase 2 -> 3</div><div>Jun 01 13:55:46 [1965] messi crmd: info: 
crm_update_peer_join: finalize_join_for: Node ronaldo[2] - join-2 phase 2 -> 3</div><div>Jun 01 13:55:46 [1965] messi crmd: info: crm_update_peer_join: do_dc_join_ack: Node messi[1] - join-2 phase 3 -> 4</div><div>Jun 01 13:55:46 [1965] messi crmd: info: do_dc_join_ack: join-2: Updating node state to member for messi</div><div>Jun 01 13:55:46 [1965] messi crmd: info: erase_status_tag: Deleting xpath: //node_state[@uname='messi']/lrm</div><div>Jun 01 13:55:46 [1960] messi cib: info: cib_process_replace: Digest matched on replace from messi: 5138b696984c7b834dd2b528dadabe0d</div><div>Jun 01 13:55:46 [1960] messi cib: info: cib_process_replace: Replaced 0.80.3 with 0.80.3 from messi</div><div>Jun 01 13:55:46 [1965] messi crmd: info: crm_update_peer_join: do_dc_join_ack: Node ronaldo[2] - join-2 phase 3 -> 4</div></div><div><br></div><div><br></div><div>Regards</div><div>Arjun</div></div><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Jul 1, 2015 at 7:17 PM, Ken Gaillot <span dir="ltr"><<a href="mailto:kgaillot@redhat.com" target="_blank">kgaillot@redhat.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="HOEnZb"><div class="h5">On 06/30/2015 11:30 PM, Arjun Pandey wrote:<br>
> Hi<br>
><br>
> I am running a 2-node cluster with this config on CentOS 6.5/6.6<br>
><br>
> Master/Slave Set: foo-master [foo]<br>
> Masters: [ messi ]<br>
> Stopped: [ronaldo ]<br>
> eth1-CP (ocf::pw:IPaddr): Started messi<br>
> eth2-UP (ocf::pw:IPaddr): Started messi<br>
> eth3-UPCP (ocf::pw:IPaddr): Started messi<br>
><br>
> where I have a multi-state resource foo running in master/slave mode, and<br>
> the IPaddr RA is just a modified IPaddr2 RA. Additionally, I have a<br>
> colocation constraint so that the IP addresses are colocated with the master.<br>
><br>
> Sometimes when I set up the cluster, I find that one of the nodes (the<br>
> second node to join) gets stopped, and I find this log.<br>
><br>
> 2015-06-01T13:55:46.153941+05:30 ronaldo pacemaker: Starting Pacemaker<br>
> Cluster Manager<br>
> 2015-06-01T13:55:46.233639+05:30 ronaldo attrd[25988]: notice:<br>
> attrd_trigger_update: Sending flush op to all hosts for: shutdown (0)<br>
> 2015-06-01T13:55:46.234162+05:30 ronaldo crmd[25990]: notice:<br>
> do_state_transition: State transition S_PENDING -> S_NOT_DC [<br>
> input=I_NOT_DC cause=C_HA_MESSAGE<br>
> origin=do_cl_join_finalize_respond ]<br>
> 2015-06-01T13:55:46.234701+05:30 ronaldo attrd[25988]: notice:<br>
> attrd_local_callback: Sending full refresh (origin=crmd)<br>
> 2015-06-01T13:55:46.234708+05:30 ronaldo attrd[25988]: notice:<br>
> attrd_trigger_update: Sending flush op to all hosts for: shutdown (0)<br>
> ************************ This looks to be the likely<br>
> reason*******************************************<br>
> 2015-06-01T13:55:46.254310+05:30 ronaldo crmd[25990]: error:<br>
> handle_request: We didn't ask to be shut down, yet our DC is telling us too.<br>
> *********************************************************************************************************<br>
<br>
</div></div>Hi Arjun,<br>
<br>
I'd check the other node's logs at this time, to see why it requested<br>
the shutdown.<br>
<span class=""><br>
> 2015-06-01T13:55:46.254577+05:30 ronaldo crmd[25990]: notice:<br>
> do_state_transition: State transition S_NOT_DC -> S_STOPPING [ input=I_STOP<br>
> cause=C_HA_MESSAGE<br>
> origin=route_message ]<br>
> 2015-06-01T13:55:46.255134+05:30 ronaldo crmd[25990]: notice:<br>
> lrm_state_verify_stopped: Stopped 0 recurring operations at shutdown...<br>
> waiting (2 ops remaining)<br>
><br>
> Based on the logs, Pacemaker on the active node was stopping the secondary cloud<br>
> every time it joined the cluster. This issue seems similar to<br>
> <a href="http://pacemaker.oss.clusterlabs.narkive.com/rVvN8May/node-sends-shutdown-request-to-other-node-error" rel="noreferrer" target="_blank">http://pacemaker.oss.clusterlabs.narkive.com/rVvN8May/node-sends-shutdown-request-to-other-node-error</a><br>
><br>
> Packages used:<br>
> pacemaker-1.1.12-4.el6.x86_64<br>
> pacemaker-libs-1.1.12-4.el6.x86_64<br>
> pacemaker-cli-1.1.12-4.el6.x86_64<br>
> pacemaker-cluster-libs-1.1.12-4.el6.x86_64<br>
> pacemaker-debuginfo-1.1.10-14.el6.x86_64<br>
> pcsc-lite-libs-1.5.2-13.el6_4.x86_64<br>
> pcs-0.9.90-2.el6.centos.2.noarch<br>
> pcsc-lite-1.5.2-13.el6_4.x86_64<br>
> pcsc-lite-openct-0.6.19-4.el6.x86_64<br>
> corosync-1.4.1-17.el6.x86_64<br>
> corosynclib-1.4.1-17.el6.x86_64<br>
<br>
<br>
</span>
</blockquote></div><br></div>