[ClusterLabs] DC leaving the cluster: some odd messages

Ulrich Windl Ulrich.Windl at rz.uni-regensburg.de
Fri Feb 24 02:48:54 EST 2017


Hi!

I cleanly stopped OpenAIS on a SLES11 SP4 node. On another node I saw some strange messages:

Feb 24 08:38:58 h01 corosync[2822]:  [pcmk  ] info: update_member: Node h10 now has process list: 00000000000000000000000000000002 (2)
Feb 24 08:38:58 h01 corosync[2822]:  [pcmk  ] info: send_member_notification: Sending membership update 3360 to 5 children
Feb 24 08:38:58 h01 crmd[2832]:   notice: peer_update_callback: Our peer on the DC (h10) is dead
Feb 24 08:38:58 h01 stonith-ng[2828]:   notice: crm_update_peer_state: st_peer_update_callback: Node h10[739512330] - state is now lost (was member)
Feb 24 08:38:58 h01 cib[2827]:   notice: crm_update_peer_state: cib_peer_update_callback: Node h10[739512330] - state is now lost (was member)
Feb 24 08:38:58 h01 cib[2827]:   notice: crm_update_peer_state: plugin_handle_membership: Node h10[739512330] - state is now member (was lost)
Feb 24 08:38:58 h01 crmd[2832]:  warning: reap_dead_nodes: Our DC node (h10) left the cluster
Feb 24 08:38:58 h01 stonith-ng[2828]:   notice: crm_update_peer_state: plugin_handle_membership: Node h10[739512330] - state is now member (was lost)

So it looks like a down-up-down-up transition of node h10. Maybe this message contributes to the confusion:
Feb 24 08:38:58 h01 cib[2827]:  warning: cib_server_process_diff: Something went wrong in compatibility mode, requesting full refresh

Feb 24 08:38:58 h01 corosync[2822]:  [pcmk  ] info: ais_mark_unseen_peer_dead: Node h10 was not seen in the previous transition
Feb 24 08:38:58 h01 corosync[2822]:  [pcmk  ] info: update_member: Node 739512330/h10 is now: lost

Feb 24 08:38:58 h01 crmd[2832]:   notice: crm_update_peer_state: plugin_handle_membership: Node h10[739512330] - state is now lost (was member)
Feb 24 08:38:58 h01 crmd[2832]:  warning: match_down_event: No match for shutdown action on h10
Feb 24 08:38:58 h01 crmd[2832]:   notice: peer_update_callback: Stonith/shutdown of h10 not matched
Feb 24 08:38:58 h01 crmd[2832]:   notice: crm_update_quorum: Updating quorum status to true (call=162)

What worries me a bit is "No match for shutdown action on h10": Shouldn't it be obvious from the CIB that h10 was intended to leave?
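
To see which daemon reported which state change, the crm_update_peer_state lines can be grouped per daemon. The following is only a minimal sketch in Python: the regular expression is an assumption derived from the log lines quoted in this message (other Pacemaker versions may log differently), and the function name transitions_by_daemon is made up for illustration.

```python
import re
from collections import defaultdict

# Pattern assumed from the log lines quoted in this message; it is not an
# official Pacemaker log format specification.
STATE_RE = re.compile(
    r"(?P<daemon>[\w-]+)\[\d+\]:\s+notice: crm_update_peer_state: "
    r"(?P<source>\w+): Node (?P<node>[^\s\[]+)\[\d+\] - "
    r"state is now (?P<new>\w+) \(was (?P<old>\w+)\)"
)


def transitions_by_daemon(lines):
    """Collect (source, old_state, new_state) tuples per daemon, in log order."""
    result = defaultdict(list)
    for line in lines:
        m = STATE_RE.search(line)
        if m:
            result[m.group("daemon")].append(
                (m.group("source"), m.group("old"), m.group("new"))
            )
    return dict(result)


# The five crm_update_peer_state lines quoted above, in their original order.
SAMPLE = [
    "Feb 24 08:38:58 h01 stonith-ng[2828]:   notice: crm_update_peer_state: st_peer_update_callback: Node h10[739512330] - state is now lost (was member)",
    "Feb 24 08:38:58 h01 cib[2827]:   notice: crm_update_peer_state: cib_peer_update_callback: Node h10[739512330] - state is now lost (was member)",
    "Feb 24 08:38:58 h01 cib[2827]:   notice: crm_update_peer_state: plugin_handle_membership: Node h10[739512330] - state is now member (was lost)",
    "Feb 24 08:38:58 h01 stonith-ng[2828]:   notice: crm_update_peer_state: plugin_handle_membership: Node h10[739512330] - state is now member (was lost)",
    "Feb 24 08:38:58 h01 crmd[2832]:   notice: crm_update_peer_state: plugin_handle_membership: Node h10[739512330] - state is now lost (was member)",
]

if __name__ == "__main__":
    for daemon, changes in transitions_by_daemon(SAMPLE).items():
        for source, old, new in changes:
            print(f"{daemon:10s} {source:26s} {old} -> {new}")
```

Grouped this way, the quoted excerpt shows cib and stonith-ng each reporting "lost" from their own peer callback and then "member" from plugin_handle_membership, while crmd reports a single change to "lost". All lines carry the same one-second timestamp, so the order within that second is simply the order in which syslog wrote them.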

Regards,
Ulrich







