[ClusterLabs] both nodes OFFLINE
Ken Gaillot
kgaillot at redhat.com
Mon May 22 18:12:21 EDT 2017
On 05/13/2017 01:36 AM, 石井 俊直 wrote:
> Hi.
>
> We sometimes have a problem in our two-node cluster on CentOS 7. Let node-2 and node-3
> be the names of the nodes. When the problem happens, node-3 sees both nodes as OFFLINE,
> while node-2 sees only node-3 as OFFLINE.
>
> When that happens, the following log messages are written repeatedly on node-2, and the log
> file (/var/log/cluster/corosync.log) grows to hundreds of megabytes in a short time. The log
> messages on node-3 are different.
>
> The error state is temporarily resolved if the OS of node-2 is restarted. Restarting the OS
> of node-3, on the other hand, leaves the cluster in the same state.
>
> I've searched the mailing list archives and found a post (Mon Oct 1 01:27:39 CEST 2012) about
> the "Discarding update with feature set" problem. According to that message, our problem
> may be solved by removing /var/lib/pacemaker/crm/cib.* on node-2.
>
> What I want to know is whether removing the above files on just one of the nodes is safe.
> If there is another way to solve the problem, I'd like to hear about it.
>
> Thanks.
>
> —— from corosync.log ————————————————————————————————
> cib: error: cib_perform_op: Discarding update with feature set '3.0.11' greater than our own '3.0.10'
This implies that the pacemaker versions are different on the two nodes.
Usually, when the pacemaker version changes, the feature set version
also changes, which means that the newer version introduces features
that won't work with older pacemaker versions.

Running a cluster with mixed pacemaker versions in such a case is
allowed, but only during a rolling upgrade. Once an older node leaves
the cluster for any reason, it will not be allowed to rejoin until it is
upgraded.

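As a rough illustration of the version check described above, here is a
minimal Python sketch of the rule the cib log line reports: an incoming
update is discarded when its feature set is numerically greater than the
local one. This is not pacemaker's code; the helper names
feature_set_tuple and accepts_update are hypothetical, and the sketch
assumes feature sets are plain dotted integers such as 3.0.10.

```python
from typing import Tuple


def feature_set_tuple(version: str) -> Tuple[int, ...]:
    """Split a dotted feature set such as '3.0.11' into integers (3, 0, 11)."""
    return tuple(int(part) for part in version.split("."))


def accepts_update(local: str, incoming: str) -> bool:
    """Hypothetical model of the check: a node whose own feature set is
    `local` applies an update only if `incoming` is not newer than it."""
    return feature_set_tuple(incoming) <= feature_set_tuple(local)


if __name__ == "__main__":
    # node-2 reports feature set 3.0.10; node-3's CIB carries 3.0.11
    print(accepts_update("3.0.10", "3.0.11"))  # False: node-2 discards it
    print(accepts_update("3.0.11", "3.0.10"))  # True: the newer node accepts it
    # Compare numerically, not as strings: "3.0.11" sorts before "3.0.9"
    # lexically, even though 11 > 9.
    print(accepts_update("3.0.9", "3.0.11"))   # False
    print("3.0.11" < "3.0.9")                  # True (string comparison)
```

The asymmetry is the point: the newer node can read what the older node
sends, but not the other way around, which is why only a rolling upgrade
(older nodes catching up) makes the mixed state go away.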
Removing the cib files won't help, since node-2 (feature set 3.0.10)
apparently does not support node-3's pacemaker version (feature set 3.0.11).

If that's not the situation you are in, please give more details, as
this should not be possible otherwise.
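Since the details that matter here are mostly which feature sets the
nodes report, one way to summarize them from a corosync.log that has
grown to hundreds of megabytes is a streaming scan for the discard
message. This is a minimal Python sketch, assuming the message text
matches the cib_perform_op line quoted above; the function name
count_feature_set_mismatches is hypothetical and not part of pacemaker.

```python
import re
import sys
from collections import Counter
from typing import Iterable

# Matches the cib error seen on node-2, e.g.
# "cib: error: cib_perform_op: Discarding update with feature set '3.0.11'
#  greater than our own '3.0.10'"
DISCARD_RE = re.compile(
    r"Discarding update with feature set '([\d.]+)' "
    r"greater than our own '([\d.]+)'"
)


def count_feature_set_mismatches(lines: Iterable[str]) -> Counter:
    """Count (incoming, local) feature-set pairs found in discard messages."""
    counts: Counter = Counter()
    for line in lines:
        match = DISCARD_RE.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


if __name__ == "__main__":
    if len(sys.argv) > 1:
        # Iterating over the file object reads one line at a time, so the
        # whole (possibly huge) log is never loaded into memory at once.
        with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
            summary = count_feature_set_mismatches(log)
        for (incoming, local), n in summary.most_common():
            print(f"{n} discarded updates: incoming {incoming}, local {local}")
    else:
        print("usage: python3 scan.py /var/log/cluster/corosync.log")
```

Run on each node, a single (3.0.11, 3.0.10) pair on node-2 would be
consistent with the version mismatch explained above; if the scan finds
no such lines, the problem is likely something else.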
> cib: error: cib_process_request: Completed cib_replace operation for section 'all': Protocol not supported (rc=-93, origin=node-3/crmd/12708, version=0.83.30)
> crmd: error: finalize_sync_callback: Sync from node-3 failed: Protocol not supported
> crmd: info: register_fsa_error_adv: Resetting the current action list
> crmd: warning: do_log: Input I_ELECTION_DC received in state S_FINALIZE_JOIN from finalize_sync_callback
> crmd: info: do_state_transition: State transition S_FINALIZE_JOIN -> S_INTEGRATION | input=I_ELECTION_DC cause=C_FSA_INTERNAL origin=finalize_sync_callback
> crmd: info: crm_update_peer_join: initialize_join: Node node-2[1] - join-6329 phase 2 -> 0
> crmd: info: crm_update_peer_join: initialize_join: Node node-3[2] - join-6329 phase 2 -> 0
> crmd: info: update_dc: Unset DC. Was node-2
> crmd: info: join_make_offer: join-6329: Sending offer to node-2
> crmd: info: crm_update_peer_join: join_make_offer: Node node-2[1] - join-6329 phase 0 -> 1
> crmd: info: join_make_offer: join-6329: Sending offer to node-3
> crmd: info: crm_update_peer_join: join_make_offer: Node node-3[2] - join-6329 phase 0 -> 1
> crmd: info: do_dc_join_offer_all: join-6329: Waiting on 2 outstanding join acks
> crmd: info: update_dc: Set DC to node-2 (3.0.10)
> crmd: info: crm_update_peer_join: do_dc_join_filter_offer: Node node-2[1] - join-6329 phase 1 -> 2
> crmd: info: crm_update_peer_join: do_dc_join_filter_offer: Node node-3[2] - join-6329 phase 1 -> 2
> crmd: info: do_state_transition: State transition S_INTEGRATION -> S_FINALIZE_JOIN | input=I_INTEGRATED cause=C_FSA_INTERNAL origin=check_join_state
> crmd: info: crmd_join_phase_log: join-6329: node-2=integrated
> crmd: info: crmd_join_phase_log: join-6329: node-3=integrated
> crmd: notice: do_dc_join_finalize: Syncing the Cluster Information Base from node-3 to rest of cluster | join-6329
> crmd: notice: do_dc_join_finalize: Requested version <generation_tuple crm_feature_set="3.0.11" validate-with="pacemaker-2.5" epoch="84" num_updates="1" admin_epoch="0" cib-last-written="Thu May 11 08:05:45 2017" update-origin="node-2" update-client="crm_resource" update-user="root" have-quorum="1"/>
> cib: info: cib_process_request: Forwarding cib_sync operation for section 'all' to node-3 (origin=local/crmd/12710)
> cib: info: cib_process_replace: Digest matched on replace from node-3: 85a19c7927c54ccb15794f2720e07ce1
> cib: info: cib_process_replace: Replaced 0.83.30 with 0.84.1 from node-3
> cib: info: __xml_diff_object: Moved node_state at crmd (3 -> 2)