[ClusterLabs] both nodes OFFLINE

Tue May 23 14:36:18 UTC 2017

Hi.

Thanks for your reply, and sorry for the trouble: the problem has been solved.
As you mentioned, the corosync versions were not the same, and syncing the versions solved the problem.
This was just an installation problem: although we used Ansible to update the RPM packages,
one update failed and we did not notice it.
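A post-upgrade check that compares the installed package version on every node could help catch this kind of silent mismatch. Below is a minimal Python sketch. It assumes the version strings have already been collected from each node (for example with `rpm -q pacemaker corosync`). The function name `find_version_mismatches` and the sample versions are hypothetical and are not part of Ansible or Pacemaker.

```python
from collections import defaultdict


def find_version_mismatches(installed):
    """Group cluster nodes by the package version they report.

    `installed` maps node name -> version string. Collecting those strings
    from each node (e.g. via `rpm -q`) is assumed and not shown here.
    Returns {} when every node agrees; otherwise a dict mapping each
    version to the sorted list of nodes running it.
    """
    by_version = defaultdict(list)
    for node, version in installed.items():
        by_version[version].append(node)
    # A single distinct version (or no nodes at all) means nothing to report.
    if len(by_version) <= 1:
        return {}
    return {version: sorted(nodes) for version, nodes in by_version.items()}
```

For example, with the hypothetical versions `{"node-2": "1.0", "node-3": "1.1"}`, the function returns `{"1.0": ["node-2"], "1.1": ["node-3"]}`, which shows which node is on which version. Failing the deployment whenever the result is non-empty would have surfaced the failed update immediately.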

> On 2017/05/23 7:12, Ken Gaillot <kgaillot at redhat.com> wrote:
> 
> On 05/13/2017 01:36 AM, 石井 俊直 wrote:
>> Hi.
>> 
>> We sometimes have a problem in our two-node cluster on CentOS 7. The nodes are named
>> node-2 and node-3. When the problem happens, node-3 sees both nodes as OFFLINE, while
>> node-2 sees only node-3 as OFFLINE.
>> 
>> When that happens, the log messages below are written repeatedly on node-2, and the log
>> file (/var/log/cluster/corosync.log) grows to hundreds of megabytes in a short time. The
>> log messages on node-3 are different.
>> 
>> The erroneous state is temporarily resolved if the OS of node-2 is restarted. Restarting
>> the OS of node-3, on the other hand, leaves the cluster in the same state.
>> 
>> I searched the mailing list archives and found a post (Mon Oct 1 01:27:39 CEST 2012) about
>> the "Discarding update with feature set" problem. According to that post, our problem
>> may be solved by removing /var/lib/pacemaker/crm/cib.* on node-2.
>> 
>> What I want to know is whether it is safe to remove those files on just one of the nodes.
>> If there is another way to solve the problem, I'd like to hear about it.
>> 
>> Thanks.
>> 
>> —— from corosync.log ———————————————————————————————— 
>> cib:    error: cib_perform_op:	Discarding update with feature set '3.0.11' greater than our own '3.0.10'
> 
> This implies that the pacemaker versions are different on the two nodes.
> Usually, when the pacemaker version changes, the feature set version
> also changes, which means that it introduces new features that won't
> work with older pacemaker versions.
> 
> Running a cluster with mixed pacemaker versions in such a case is
> allowed, but only during a rolling upgrade. Once an older node leaves
> the cluster for any reason, it will not be allowed to rejoin until it is
> upgraded.
> 
> Removing the cib files won't help, since node-2 apparently does not
> support node-3's pacemaker version.
> 
> If that's not the situation you are in, please give more details, as
> this should not be possible otherwise.
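The rule in the log message ("Discarding update with feature set '3.0.11' greater than our own '3.0.10'") can be modelled as a numeric, component-wise comparison of dotted feature-set strings. The following Python sketch is a simplified illustration under that assumption. It is not Pacemaker's actual implementation, and `parse_feature_set`, `compare_feature_sets` and `accepts_update` are hypothetical names.

```python
from itertools import zip_longest


def parse_feature_set(fs):
    """Split a dotted feature-set string such as '3.0.11' into integers."""
    return tuple(int(part) for part in fs.split("."))


def compare_feature_sets(a, b):
    """Return -1, 0 or 1 as feature set `a` is older than, equal to, or newer than `b`.

    Components are compared as integers, not strings. Missing trailing
    components are treated as 0, which is an assumption of this sketch.
    """
    for x, y in zip_longest(parse_feature_set(a), parse_feature_set(b), fillvalue=0):
        if x != y:
            return -1 if x < y else 1
    return 0


def accepts_update(local_fs, remote_fs):
    """Model of the check in the log: a node discards any update whose
    feature set is greater than its own, and accepts everything else."""
    return compare_feature_sets(remote_fs, local_fs) <= 0
```

With these definitions, `accepts_update("3.0.10", "3.0.11")` is `False`, which matches node-2 rejecting node-3's update. Deleting the CIB files does not change node-2's own feature set, so the rejection continues until node-2 is upgraded. Comparing numerically matters because, as plain strings, "3.0.9" would sort after "3.0.10".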
> 
>> cib:    error: cib_process_request:	Completed cib_replace operation for section 'all': Protocol not supported (rc=-93, origin=node-3/crmd/12708, version=0.83.30)
>> crmd:   error: finalize_sync_callback:	Sync from node-3 failed: Protocol not supported
>> crmd:    info: register_fsa_error_adv:	Resetting the current action list
>> crmd: warning: do_log:	Input I_ELECTION_DC received in state S_FINALIZE_JOIN from finalize_sync_callback
>> crmd:    info: do_state_transition:	State transition S_FINALIZE_JOIN -> S_INTEGRATION | input=I_ELECTION_DC cause=C_FSA_INTERNAL origin=finalize_sync_callback
>> crmd:    info: crm_update_peer_join:	initialize_join: Node node-2[1] - join-6329 phase 2 -> 0
>> crmd:    info: crm_update_peer_join:	initialize_join: Node node-3[2] - join-6329 phase 2 -> 0
>> crmd:    info: update_dc:	Unset DC. Was node-2
>> crmd:    info: join_make_offer:	join-6329: Sending offer to node-2
>> crmd:    info: crm_update_peer_join:	join_make_offer: Node node-2[1] - join-6329 phase 0 -> 1
>> crmd:    info: join_make_offer:	join-6329: Sending offer to node-3
>> crmd:    info: crm_update_peer_join:	join_make_offer: Node node-3[2] - join-6329 phase 0 -> 1
>> crmd:    info: do_dc_join_offer_all:	join-6329: Waiting on 2 outstanding join acks
>> crmd:    info: update_dc:	Set DC to node-2 (3.0.10)
>> crmd:    info: crm_update_peer_join:	do_dc_join_filter_offer: Node node-2[1] - join-6329 phase 1 -> 2
>> crmd:    info: crm_update_peer_join:	do_dc_join_filter_offer: Node node-3[2] - join-6329 phase 1 -> 2
>> crmd:    info: do_state_transition:	State transition S_INTEGRATION -> S_FINALIZE_JOIN | input=I_INTEGRATED cause=C_FSA_INTERNAL origin=check_join_state
>> crmd:    info: crmd_join_phase_log:	join-6329: node-2=integrated
>> crmd:    info: crmd_join_phase_log:	join-6329: node-3=integrated
>> crmd:  notice: do_dc_join_finalize:	Syncing the Cluster Information Base from node-3 to rest of cluster | join-6329
>> crmd:  notice: do_dc_join_finalize:	Requested version   <generation_tuple crm_feature_set="3.0.11" validate-with="pacemaker-2.5" epoch="84" num_updates="1" admin_epoch="0" cib-last-written="Thu May 11 08:05:45 2017" update-origin="node-2" update-client="crm_resource" update-user="root" have-quorum="1"/>
>> cib:     info: cib_process_request:	Forwarding cib_sync operation for section 'all' to node-3 (origin=local/crmd/12710)
>> cib:     info: cib_process_replace:	Digest matched on replace from node-3: 85a19c7927c54ccb15794f2720e07ce1
>> cib:     info: cib_process_replace:	Replaced 0.83.30 with 0.84.1 from node-3
>> cib:     info: __xml_diff_object:	Moved node_state at crmd (3 -> 2)
> 
> _______________________________________________
> Users mailing list: Users at clusterlabs.org
> http://lists.clusterlabs.org/mailman/listinfo/users
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org