[ClusterLabs] Cluster node getting stopped from other node(resending mail)

Arjun Pandey apandepublic at gmail.com
Tue Jul 7 08:58:59 UTC 2015


Hi Ken

If i look at the logs on the other node around the same time i see this. I
can't figure out the reason based on these.Attaching the corosync.log for
the other node as well.

Jun 01 13:55:44 [1965] messi       crmd:     info: do_dc_join_offer_one:
     An unknown node joined - (re-)offer to any unconfirmed nodes
Jun 01 13:55:44 [1965] messi       crmd:     info: join_make_offer:
Making join offers based on membership 224
Jun 01 13:55:44 [1965] messi       crmd:     info: join_make_offer:
Skipping messi: already known 4
Jun 01 13:55:44 [1965] messi       crmd:     info: join_make_offer:
join-2: Sending offer to ronaldo
Jun 01 13:55:44 [1960] messi        cib:     info: cib_process_request:
    Forwarding cib_modify operation for section status to master
(origin=local/crmd/138)
Jun 01 13:55:44 [1960] messi        cib:     info: cib_perform_op:
 Diff: --- 0.80.2 2
Jun 01 13:55:44 [1960] messi        cib:     info: cib_perform_op:
 Diff: +++ 0.80.3 (null)
Jun 01 13:55:44 [1965] messi       crmd:     info: crm_update_peer_join:
     join_make_offer: Node ronaldo[2] - join-2 phase 0 -> 1
Jun 01 13:55:44 [1965] messi       crmd:     info: abort_transition_graph:
     Transition aborted: Peer Halt (source=do_te_invoke:158, 1)
Jun 01 13:55:44 [1960] messi        cib:     info: cib_perform_op:      +
 /cib:  @num_updates=3
Jun 01 13:55:44 [1960] messi        cib:     info: cib_perform_op:      +
 /cib/status/node_state[@id='ronaldo']:  @crmd=online,
@crm-debug-origin=peer_update_callback
Jun 01 13:55:44 [1960] messi        cib:     info: cib_process_request:
    Completed cib_modify operation for section status: OK (rc=0,
origin=messi/crmd/138, version=0.80.3)
Jun 01 13:55:44 [1960] messi        cib:     info: cib_process_request:
    Completed cib_modify operation for section nodes: OK (rc=0,
origin=ronaldo/crmd/3, version=0.80.3)
Jun 01 13:55:45 [1965] messi       crmd:     info: do_dc_join_offer_one:
     join-2: Processing join_announce request from ronaldo in state
S_INTEGRATION
Jun 01 13:55:45 [1965] messi       crmd:     info: crm_update_peer_join:
     do_dc_join_offer_one: Node ronaldo[2] - join-2 phase 1 -> 0
Jun 01 13:55:45 [1965] messi       crmd:     info: join_make_offer:
join-2: Sending offer to ronaldo
Jun 01 13:55:45 [1965] messi       crmd:     info: crm_update_peer_join:
     join_make_offer: Node ronaldo[2] - join-2 phase 0 -> 1
Jun 01 13:55:45 [1965] messi       crmd:     info: crm_update_peer_join:
     join_make_offer: Node messi[1] - join-2 phase 4 -> 0
Jun 01 13:55:45 [1965] messi       crmd:     info: join_make_offer:
join-2: Sending offer to messi
Jun 01 13:55:45 [1965] messi       crmd:     info: crm_update_peer_join:
     join_make_offer: Node messi[1] - join-2 phase 0 -> 1
Jun 01 13:55:45 [1965] messi       crmd:     info: abort_transition_graph:
     Transition aborted: Node join (source=do_dc_join_offer_one:236, 1)
Jun 01 13:55:45 [1965] messi       crmd:     info: crm_update_peer_join:
     do_dc_join_filter_offer: Node messi[1] - join-2 phase 1 -> 2
Jun 01 13:55:46 [1965] messi       crmd:     info: crm_update_peer_join:
     do_dc_join_filter_offer: Node ronaldo[2] - join-2 phase 1 -> 2
Jun 01 13:55:46 [1965] messi       crmd:     info:
crm_update_peer_expected:    do_dc_join_filter_offer: Node ronaldo[2] -
expected state is now member (was down)
Jun 01 13:55:46 [1965] messi       crmd:     info: do_state_transition:
    State transition S_INTEGRATION -> S_FINALIZE_JOIN [ input=I_INTEGRATED
cause=C_FSA_INTERNAL origin=check_join_state ]
Jun 01 13:55:46 [1965] messi       crmd:     info: crmd_join_phase_log:
    join-2: messi=integrated
Jun 01 13:55:46 [1965] messi       crmd:     info: crmd_join_phase_log:
    join-2: ronaldo=integrated
Jun 01 13:55:46 [1965] messi       crmd:     info: do_dc_join_finalize:
    join-2: Syncing our CIB to the rest of the cluster
Jun 01 13:55:46 [1965] messi       crmd:     info: crm_update_peer_join:
     finalize_join_for: Node messi[1] - join-2 phase 2 -> 3
Jun 01 13:55:46 [1965] messi       crmd:     info: crm_update_peer_join:
     finalize_join_for: Node ronaldo[2] - join-2 phase 2 -> 3
Jun 01 13:55:46 [1965] messi       crmd:     info: crm_update_peer_join:
     do_dc_join_ack: Node messi[1] - join-2 phase 3 -> 4
Jun 01 13:55:46 [1965] messi       crmd:     info: do_dc_join_ack:
 join-2: Updating node state to member for messi
Jun 01 13:55:46 [1965] messi       crmd:     info: erase_status_tag:
 Deleting xpath: //node_state[@uname='messi']/lrm
Jun 01 13:55:46 [1960] messi        cib:     info: cib_process_replace:
    Digest matched on replace from messi: 5138b696984c7b834dd2b528dadabe0d
Jun 01 13:55:46 [1960] messi        cib:     info: cib_process_replace:
    Replaced 0.80.3 with 0.80.3 from messi
Jun 01 13:55:46 [1965] messi       crmd:     info: crm_update_peer_join:
     do_dc_join_ack: Node ronaldo[2] - join-2 phase 3 -> 4


Regards
Arjun

On Wed, Jul 1, 2015 at 7:17 PM, Ken Gaillot <kgaillot at redhat.com> wrote:

> On 06/30/2015 11:30 PM, Arjun Pandey wrote:
> > Hi
> >
> > I am running a 2 node cluster with this config on centos 6.5/6.6
> >
> > Master/Slave Set: foo-master [foo]
> > Masters: [ messi ]
> > Stopped: [ronaldo ]
> >  eth1-CP        (ocf::pw:IPaddr):       Started messi
> >  eth2-UP        (ocf::pw:IPaddr):       Started messi
> >  eth3-UPCP      (ocf::pw:IPaddr):       Started messi
> >
> > where i have a multi-state resource foo being run in master/slave mode
> and
> >  IPaddr RA is just modified IPAddr2 RA. Additionally i have a
> > collocation constraint for the IP addr to be collocated with the master.
> >
> > Sometimes when i setup the cluster , i find that one of the nodes (the
> > second node that joins ) gets stopped and i find this log.
> >
> > 2015-06-01T13:55:46.153941+05:30 ronaldo pacemaker: Starting Pacemaker
> > Cluster Manager
> > 2015-06-01T13:55:46.233639+05:30 ronaldo attrd[25988]:   notice:
> > attrd_trigger_update: Sending flush op to all hosts for: shutdown (0)
> > 2015-06-01T13:55:46.234162+05:30 ronaldo crmd[25990]:   notice:
> > do_state_transition: State transition S_PENDING -> S_NOT_DC [
> > input=I_NOT_DC cause=C_HA_MESSAG
> > E origin=do_cl_join_finalize_respond ]
> > 2015-06-01T13:55:46.234701+05:30 ronaldo attrd[25988]:   notice:
> > attrd_local_callback: Sending full refresh (origin=crmd)
> > 2015-06-01T13:55:46.234708+05:30 ronaldo attrd[25988]:   notice:
> > attrd_trigger_update: Sending flush op to all hosts for: shutdown (0)
> > ************************ This looks to be the likely
> > reason*******************************************
> > 2015-06-01T13:55:46.254310+05:30 ronaldo crmd[25990]:    error:
> > handle_request: We didn't ask to be shut down, yet our DC is telling us
> too
> > .
> >
> *********************************************************************************************************
>
> Hi Arjun,
>
> I'd check the other node's logs at this time, to see why it requested
> the shutdown.
>
> > 2015-06-01T13:55:46.254577+05:30 ronaldo crmd[25990]:   notice:
> > do_state_transition: State transition S_NOT_DC -> S_STOPPING [
> input=I_STOP
> > cause=C_HA_MESSAGE
> >  origin=route_message ]
> > 2015-06-01T13:55:46.255134+05:30 ronaldo crmd[25990]:   notice:
> > lrm_state_verify_stopped: Stopped 0 recurring operations at shutdown...
> > waiting (2 ops remaining)
> >
> > Based on the logs , pacemaker on active was stopping the secondary cloud
> > everytime it joins cluster. This issue seems similar to
> >
> http://pacemaker.oss.clusterlabs.narkive.com/rVvN8May/node-sends-shutdown-request-to-other-node-error
> >
> > Packages used :-
> > pacemaker-1.1.12-4.el6.x86_64
> > pacemaker-libs-1.1.12-4.el6.x86_64
> > pacemaker-cli-1.1.12-4.el6.x86_64
> > pacemaker-cluster-libs-1.1.12-4.el6.x86_64
> > pacemaker-debuginfo-1.1.10-14.el6.x86_64
> > pcsc-lite-libs-1.5.2-13.el6_4.x86_64
> > pcs-0.9.90-2.el6.centos.2.noarch
> > pcsc-lite-1.5.2-13.el6_4.x86_64
> > pcsc-lite-openct-0.6.19-4.el6.x86_64
> > corosync-1.4.1-17.el6.x86_64
> > corosynclib-1.4.1-17.el6.x86_64
>
>
> _______________________________________________
> Users mailing list: Users at clusterlabs.org
> http://clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.clusterlabs.org/pipermail/users/attachments/20150707/4ecef5c8/attachment-0003.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: messi-corosync.log
Type: text/x-log
Size: 5329517 bytes
Desc: not available
URL: <https://lists.clusterlabs.org/pipermail/users/attachments/20150707/4ecef5c8/attachment-0003.bin>


More information about the Users mailing list