[ClusterLabs] Corosync.log Growing at 1Gb in 15min
lars.ellenberg at linbit.com
Wed Jun 20 08:55:32 EDT 2018
On Wed, Jun 20, 2018 at 01:58:18PM +0200, Mark Prinsloo wrote:
> Good Day,
> We had a node failure and we had to rebuild the whole server from scratch.
> After rebuilding Asterisk01 and rejoining it to the cluster something isn't
> working 100%.
> It does fail over but the corosync.log file on the second node is growing
> at 1Gb every 15min.
> I have attached a piece of the log file and you can see the logs get filled
> with the same set of messages.
> Something isn't working properly but I can't figure out what exactly.
cib_perform_op: Discarding update with feature set '3.0.14' greater than our own '3.0.10'
What that tries to say is:
"The other node runs a software version that is too recent,
I fear I might not understand it, so I reject it."
Your options, as far as I can see them:
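To make the comparison behind that message concrete, here is a minimal shell sketch. It is not pacemaker's own code; the helper name feature_set_gt is made up for illustration, and it assumes GNU sort with -V (version sort) is available.

```shell
# Return success (0) if feature set $1 is strictly greater than $2.
# Hypothetical helper; relies on GNU "sort -V" for dotted-version ordering.
feature_set_gt() {
    [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$1" ]
}

# Values taken from the log message quoted above.
if feature_set_gt 3.0.14 3.0.10; then
    echo "update would be discarded: 3.0.14 > 3.0.10"
fi
```

The same helper can be used to decide which node carries the older feature set.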
- upgrade the older pacemaker to the same version as on the other node,
- or tell your CIB on the newer version to only use the "feature set"
up to 3.0.10, and only validate with whatever the older version of
pacemaker can handle.
To do that, unless you are actually using some incompatible recent
features from the feature set denoted by 3.0.14, which I think is unlikely,
you should be able to dump the cib (cibadmin -Q > tmp.xml),
edit the "crm_feature_set=3.0.14" and the "validate-with" to something
the older version can understand (e.g. 3.0.10 and pacemaker-2.0),
and "reimport" that (cibadmin -R -x tmp.xml).
Or something like that.
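As a rough sketch of the edit step, assuming GNU sed (for -i) and that the opening <cib ...> tag sits on a single line of the dump: the helper name rewrite_cib_header is made up for illustration, and the target values are the examples given above, not verified against your installation. Only the <cib> line is touched, because crm_feature_set can also appear on other elements of the dump.

```shell
# Rewrite crm_feature_set and validate-with on the <cib ...> line of a
# dumped CIB file, in place. Hypothetical helper, GNU sed assumed.
#   $1 = file, $2 = target feature set, $3 = target schema name
rewrite_cib_header() {
    sed -i \
        -e "/<cib /s/crm_feature_set=\"[^\"]*\"/crm_feature_set=\"$2\"/" \
        -e "/<cib /s/validate-with=\"[^\"]*\"/validate-with=\"$3\"/" \
        "$1"
}

# Intended use on the node running the newer pacemaker (not executed here):
#   cibadmin -Q > tmp.xml
#   cp tmp.xml tmp.xml.orig          # keep an untouched copy
#   rewrite_cib_header tmp.xml 3.0.10 pacemaker-2.0
#   cibadmin -R -x tmp.xml
```

Check the result with a diff against tmp.xml.orig before reimporting; the only changed line should be the <cib ...> one.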
> Should I rather rebuild the cluster?
Maybe. If so, try to not mix pacemaker versions,
or at least have all nodes be present and connected first,
before you start configuring resources; if that is not possible,
activate the *oldest* pacemaker version first.
Currently you are in an "election / integration" loop.
Pacemaker should handle that situation in a more graceful way,
but unfortunately it does not, logging this in a busy loop
(info level stripped away):
> Jun 20 13:09:45 kpasterisk02 crmd: notice: do_dc_join_finalize: join-17178295: Syncing the CIB from kpasterisk01-ha to the rest of the cluster
> Jun 20 13:09:45 kpasterisk02 crmd: notice: do_dc_join_finalize: Requested version <generation_tuple crm_feature_set="3.0.14" validate-with="pacemaker-2.3" epoch="46" num_updates="1" admin_epoch="0" cib-last-written="Tue Jun 19 21:02:12 2018" update-origin="kpasterisk01-ha" update-client="crmd" update-user="hacluster" have-quorum="1"/>
> Jun 20 13:09:45 kpasterisk02 cib: error: cib_perform_op: Discarding update with feature set '3.0.14' greater than our own '3.0.10'
> Jun 20 13:09:45 kpasterisk02 cib: error: cib_process_request: Completed cib_replace operation for section 'all': Protocol not supported (rc=-93, origin=kpasterisk01-ha/crmd/###, version=0.45.44)
> Jun 20 13:09:45 kpasterisk02 crmd: error: finalize_sync_callback: Sync from kpasterisk01-ha failed: Protocol not supported
> Jun 20 13:09:45 kpasterisk02 crmd: warning: do_log: FSA: Input I_ELECTION_DC from finalize_sync_callback() received in state S_FINALIZE_JOIN
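To confirm that a log is dominated by this loop, something like the following sketch can help. The function names are made up for illustration, and the log path in the usage line is an assumption; use whatever file your corosync/pacemaker setup actually logs to.

```shell
# Count how often the rejection message occurs in a log file ($1).
count_discards() {
    grep -c "Discarding update with feature set" "$1"
}

# Summarize the distinct feature-set pairs named in those messages.
list_feature_sets() {
    grep -o "feature set '[^']*' greater than our own '[^']*'" "$1" |
        sort | uniq -c
}

# Example (path is an assumption, not executed here):
#   count_discards /var/log/cluster/corosync.log
```

If the count keeps climbing while the cluster is otherwise idle, the nodes are still stuck in the election / integration loop.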
: Lars Ellenberg
: LINBIT | Keeping the Digital World Running
: DRBD -- Heartbeat -- Corosync -- Pacemaker
: R&D, Integration, Ops, Consulting, Support
DRBD® and LINBIT® are registered trademarks of LINBIT