[ClusterLabs] cluster loses state (randomly) every few minutes.
Jan Friesse
jfriesse at redhat.com
Mon Jan 18 03:56:57 EST 2021
lejeczek,
> hi guys,
>
> I have a very basic two-node cluster, not even a single resource on it,
> but very troublesome - it keeps breaking.
> The journal for 'pacemaker' constantly shows (on both nodes):
> ...
> warning: Input I_DC_TIMEOUT received in state S_PENDING from
> crm_timer_popped
> notice: State transition S_ELECTION -> S_PENDING
> notice: State transition S_PENDING -> S_NOT_DC
> notice: Lost attribute writer swir
> notice: Node swir state is now lost
> notice: Our peer on the DC (swir) is dead
> notice: Purged 1 peer with id=2 and/or uname=swir from the membership
> cache
> notice: Node swir state is now lost
> notice: State transition S_NOT_DC -> S_ELECTION
> notice: Removing all swir attributes for peer loss
> notice: Purged 1 peer with id=2 and/or uname=swir from the membership
> cache
> notice: Node swir state is now lost
> notice: Node swir state is now lost
> notice: Recorded local node as attribute writer (was unset)
> notice: Purged 1 peer with id=2 and/or uname=swir from the membership
> cache
> notice: State transition S_ELECTION -> S_INTEGRATION
> warning: Blind faith: not fencing unseen nodes
> notice: Delaying fencing operations until there are resources to manage
> notice: Calculated transition 0, saving inputs in
> /var/lib/pacemaker/pengine/pe-input-627.bz2
> notice: Transition 0 (Complete=0, Pending=0, Fired=0, Skipped=0,
> Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-627.bz2): Complete
> notice: State transition S_TRANSITION_ENGINE -> S_IDLE
> notice: Node swir state is now member
> notice: Node swir state is now member
> notice: Node swir state is now member
> notice: Node swir state is now member
> notice: State transition S_IDLE -> S_INTEGRATION
> warning: Another DC detected: swir (op=noop)
> notice: Detected another attribute writer (swir), starting new election
> notice: Setting #attrd-protocol[swir]: (unset) -> 2
> notice: State transition S_ELECTION -> S_RELEASE_DC
> notice: State transition S_PENDING -> S_NOT_DC
> notice: Recorded local node as attribute writer (was unset)
>
Is there anything interesting in corosync.log?
> It's the same hardware on which "this same" cluster ran okay and then,
> only a couple of days ago, I upgraded CentOS on these two boxes to "Stream".
> I'm hoping it's something trivial I'm missing with the new version(s) of
> software that came with the upgrade, perhaps some (new) settings for a
> two-node cluster which I missed.
Actually, for Corosync there is one: the token timeout was increased to
3 sec. This was not a problem during my testing, but just to be sure: have
you restarted corosync on both of the nodes? Do they have the same token
timeout? You can check the token timeout in use by running
"corosync-cmapctl -g runtime.config.totem.token".
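
The check above can be scripted. What follows is only a minimal sketch, not
part of any Corosync tooling: it assumes the command prints a line of the
form "runtime.config.totem.token (u32) = <milliseconds>", and the helper
names (parse_token_ms, mismatched_nodes) and the node names in the usage
note are hypothetical. Collecting the output from each node (for example
over ssh) is left out.

```python
import re
import shutil
import subprocess

# cmap key named in the message above
CMAP_KEY = "runtime.config.totem.token"


def parse_token_ms(cmapctl_output):
    """Extract the token timeout (ms) from corosync-cmapctl output.

    Assumes a line shaped like: runtime.config.totem.token (u32) = 3000
    """
    pattern = re.escape(CMAP_KEY) + r"\s*\(\w+\)\s*=\s*(\d+)"
    match = re.search(pattern, cmapctl_output)
    if match is None:
        raise ValueError("no %s value found in %r" % (CMAP_KEY, cmapctl_output))
    return int(match.group(1))


def mismatched_nodes(outputs_by_node):
    """Return {node: token_ms} when the nodes disagree, else an empty dict.

    outputs_by_node maps a node name to the raw command output gathered
    on that node.
    """
    tokens = {node: parse_token_ms(out) for node, out in outputs_by_node.items()}
    return tokens if len(set(tokens.values())) > 1 else {}


if __name__ == "__main__":
    # Only query the local node, and only if the tool is installed.
    if shutil.which("corosync-cmapctl"):
        result = subprocess.run(
            ["corosync-cmapctl", "-g", CMAP_KEY],
            capture_output=True, text=True, check=True,
        )
        print("local token timeout (ms):", parse_token_ms(result.stdout))
    else:
        print("corosync-cmapctl not found; run this on a cluster node")
```

With the output of the command saved from each node, calling
mismatched_nodes({"node-a": out_a, "node-b": out_b}) returns an empty dict
when both nodes report the same timeout; a non-empty result suggests that
one node has not picked up the new value, for instance because corosync
was not restarted there.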
Honza
> Any suggestions greatly appreciated.
> many thanks, L.