[ClusterLabs] both nodes OFFLINE

石井 俊直 i_j_e_x_a at yahoo.co.jp
Sat May 13 02:36:25 EDT 2017


Hi.

We occasionally have a problem in our two-node cluster on CentOS 7. Call the nodes
node-2 and node-3. When the problem occurs, node-3 reports both nodes as OFFLINE,
while node-2 reports only node-3 as OFFLINE.

When that happens, the log messages quoted at the end of this mail are written repeatedly
on node-2, and the log file (/var/log/cluster/corosync.log) grows to hundreds of megabytes
in a short time. The log messages on node-3 are different.
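
To see which messages account for the log growth, the lines can be tallied per daemon, level and function. The following is only a minimal sketch: the line shape "daemon: level: function:" is assumed from the excerpt in this mail, and the names count_messages and summarize are made up for illustration.

```python
import re
from collections import Counter

# Assumed line shape, taken from the excerpt in this mail:
#   "<daemon>: <level>: <function>: <message>"
# re.search is used so that a leading timestamp or host name does not matter.
LINE_RE = re.compile(
    r"(?P<daemon>\w+):\s+(?P<level>error|warning|notice|info|debug):\s+(?P<func>\w+):"
)

def count_messages(lines):
    """Count log lines per (daemon, level, function) triple."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.search(line)
        if m:
            counts[(m["daemon"], m["level"], m["func"])] += 1
    return counts

def summarize(path, top=10):
    """Print the most frequent message sources found in the log at `path`."""
    with open(path, errors="replace") as f:
        for key, n in count_messages(f).most_common(top):
            print(n, *key)
```

Calling summarize("/var/log/cluster/corosync.log") on each node would show whether the same few functions dominate the log on both sides.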

The erroneous state is temporarily resolved if the OS on node-2 is restarted. Restarting
the OS on node-3, on the other hand, leaves the cluster in the same state.

I searched the mailing list archives and found a post (Mon Oct 1 01:27:39 CEST 2012) about
the "Discarding update with feature set" problem. According to that post, our problem
may be solved by removing /var/lib/pacemaker/crm/cib.* on node-2.

What I want to know is whether it is safe to remove the above files on just one of the nodes.
If there is another way to solve the problem, I would like to hear about it.

Thanks.

—— from corosync.log ———————————————————————————————— 
cib:    error: cib_perform_op:	Discarding update with feature set '3.0.11' greater than our own '3.0.10'
cib:    error: cib_process_request:	Completed cib_replace operation for section 'all': Protocol not supported (rc=-93, origin=node-3/crmd/12708, version=0.83.30)
crmd:   error: finalize_sync_callback:	Sync from node-3 failed: Protocol not supported
crmd:    info: register_fsa_error_adv:	Resetting the current action list
crmd: warning: do_log:	Input I_ELECTION_DC received in state S_FINALIZE_JOIN from finalize_sync_callback
crmd:    info: do_state_transition:	State transition S_FINALIZE_JOIN -> S_INTEGRATION | input=I_ELECTION_DC cause=C_FSA_INTERNAL origin=finalize_sync_callback
crmd:    info: crm_update_peer_join:	initialize_join: Node node-2[1] - join-6329 phase 2 -> 0
crmd:    info: crm_update_peer_join:	initialize_join: Node node-3[2] - join-6329 phase 2 -> 0
crmd:    info: update_dc:	Unset DC. Was node-2
crmd:    info: join_make_offer:	join-6329: Sending offer to node-2
crmd:    info: crm_update_peer_join:	join_make_offer: Node node-2[1] - join-6329 phase 0 -> 1
crmd:    info: join_make_offer:	join-6329: Sending offer to node-3
crmd:    info: crm_update_peer_join:	join_make_offer: Node node-3[2] - join-6329 phase 0 -> 1
crmd:    info: do_dc_join_offer_all:	join-6329: Waiting on 2 outstanding join acks
crmd:    info: update_dc:	Set DC to node-2 (3.0.10)
crmd:    info: crm_update_peer_join:	do_dc_join_filter_offer: Node node-2[1] - join-6329 phase 1 -> 2
crmd:    info: crm_update_peer_join:	do_dc_join_filter_offer: Node node-3[2] - join-6329 phase 1 -> 2
crmd:    info: do_state_transition:	State transition S_INTEGRATION -> S_FINALIZE_JOIN | input=I_INTEGRATED cause=C_FSA_INTERNAL origin=check_join_state
crmd:    info: crmd_join_phase_log:	join-6329: node-2=integrated
crmd:    info: crmd_join_phase_log:	join-6329: node-3=integrated
crmd:  notice: do_dc_join_finalize:	Syncing the Cluster Information Base from node-3 to rest of cluster | join-6329
crmd:  notice: do_dc_join_finalize:	Requested version   <generation_tuple crm_feature_set="3.0.11" validate-with="pacemaker-2.5" epoch="84" num_updates="1" admin_epoch="0" cib-last-written="Thu May 11 08:05:45 2017" update-origin="node-2" update-client="crm_resource" update-user="root" have-quorum="1"/>
cib:     info: cib_process_request:	Forwarding cib_sync operation for section 'all' to node-3 (origin=local/crmd/12710)
cib:     info: cib_process_replace:	Digest matched on replace from node-3: 85a19c7927c54ccb15794f2720e07ce1
cib:     info: cib_process_replace:	Replaced 0.83.30 with 0.84.1 from node-3
cib:     info: __xml_diff_object:	Moved node_state at crmd (3 -> 2)
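
The first error line in the excerpt shows the CIB offered by node-3 carrying feature set 3.0.11 while the local cib on node-2 supports only 3.0.10, which suggests the two nodes are not running the same Pacemaker version. The sketch below pulls both values out of such a line and compares them. It assumes the feature set is compared component by component as dotted integers, which is what the wording of the message suggests; the function names are hypothetical.

```python
import re

# Matches the cib_perform_op error quoted in the excerpt above.
FEATURE_RE = re.compile(
    r"Discarding update with feature set '([\d.]+)' greater than our own '([\d.]+)'"
)

def parse_feature_set(text):
    """Turn a dotted feature set such as '3.0.11' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))

def feature_set_mismatch(line):
    """Return (offered, local) tuples if the line reports a mismatch, else None."""
    m = FEATURE_RE.search(line)
    if m is None:
        return None
    return parse_feature_set(m.group(1)), parse_feature_set(m.group(2))

line = ("cib:    error: cib_perform_op:\tDiscarding update with feature set "
        "'3.0.11' greater than our own '3.0.10'")
offered, local = feature_set_mismatch(line)
# Assumption: integer tuples compare the way the feature sets are meant to,
# so 3.0.9 sorts before 3.0.10 (plain string comparison would get that wrong).
print(offered, local, offered > local)
```

Tuples of integers are used instead of the raw strings because a string comparison would order '3.0.9' after '3.0.10'.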


