[ClusterLabs] Why Do Nodes Leave the Cluster?
Eric Robinson
eric.robinson at psmnv.com
Wed Feb 5 15:55:25 EST 2020
> -----Original Message-----
> From: Users <users-bounces at clusterlabs.org> On Behalf Of Andrei
> Borzenkov
> Sent: Wednesday, February 5, 2020 12:14 PM
> To: users at clusterlabs.org
> Subject: Re: [ClusterLabs] Why Do Nodes Leave the Cluster?
>
> 05.02.2020 20:55, Eric Robinson wrote:
> > The two servers 001db01a and 001db01b were up and responsive. Neither
> > had been rebooted and neither was under heavy load. There's no indication
> > in the logs of loss of network connectivity. Any ideas on why both nodes
> > seem to think the other one is at fault?
>
> The very fact that the nodes lost connection to each other *is* an
> indication of network problems. Your logs start too late, after the
> problem had already happened.
>
All the log messages before those are just normal repetitive stuff that always gets logged, even during normal production. The snippet I provided shows the first indication of anything unusual. Also, there is no other indication of network connectivity loss, and both servers are in Azure.
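
For anyone who wants to check the lead-up themselves, here is a rough sketch of how the entries just before the first membership change could be pulled out of a syslog file. It is only an illustration under assumptions: the function names, the 60-second window, the assumed year, and the choice of marker strings are mine, not anything Corosync or Pacemaker provides.

```python
import re
from datetime import datetime, timedelta

# Syslog prefix as it appears in the excerpts in this thread, e.g.
# "Feb 5 08:01:02 001db01a corosync[1306]: [TOTEM ] A processor failed, ..."
LINE_RE = re.compile(
    r"^(\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) (\S+) ([\w-]+)\[\d+\]: (.*)$"
)

# Messages treated as "the first sign of trouble" (an assumption; adjust as needed).
MARKERS = ("A processor failed", "A new membership")


def parse(line, year=2020):
    """Return (timestamp, host, daemon, message), or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if not m:
        return None
    stamp, host, daemon, msg = m.groups()
    # Syslog timestamps carry no year, so one has to be assumed.
    when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
    return when, host, daemon, msg


def lead_up(lines, window_seconds=60):
    """Return parsed entries logged within window_seconds before the first marker."""
    entries = [e for e in map(parse, lines) if e]
    first = next((e for e in entries if any(k in e[3] for k in MARKERS)), None)
    if first is None:
        return []
    start = first[0] - timedelta(seconds=window_seconds)
    return [e for e in entries if start <= e[0] < first[0]]
```

Running `lead_up()` over the full log from each node, with the window widened as needed, would show whether anything other than routine messages was logged just before the membership change.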
> >
> > (Yes, it's a 2-node cluster without quorum. A 3-node cluster is not an
> > option at this time.)
> >
> > Log from 001db01a:
> >
> > Feb 5 08:01:02 001db01a corosync[1306]: [TOTEM ] A processor failed, forming new configuration.
> > Feb 5 08:01:03 001db01a corosync[1306]: [TOTEM ] A new membership (10.51.14.33:960) was formed. Members left: 2
> > Feb 5 08:01:03 001db01a corosync[1306]: [TOTEM ] Failed to receive the leave message. failed: 2
> > Feb 5 08:01:03 001db01a attrd[1525]: notice: Node 001db01b state is now lost
> > Feb 5 08:01:03 001db01a attrd[1525]: notice: Removing all 001db01b attributes for peer loss
> > Feb 5 08:01:03 001db01a cib[1522]: notice: Node 001db01b state is now lost
> > Feb 5 08:01:03 001db01a cib[1522]: notice: Purged 1 peer with id=2 and/or uname=001db01b from the membership cache
> > Feb 5 08:01:03 001db01a attrd[1525]: notice: Purged 1 peer with id=2 and/or uname=001db01b from the membership cache
> > Feb 5 08:01:03 001db01a crmd[1527]: warning: No reason to expect node 2 to be down
> > Feb 5 08:01:03 001db01a stonith-ng[1523]: notice: Node 001db01b state is now lost
> > Feb 5 08:01:03 001db01a crmd[1527]: notice: Stonith/shutdown of 001db01b not matched
> > Feb 5 08:01:03 001db01a corosync[1306]: [QUORUM] Members[1]: 1
> > Feb 5 08:01:03 001db01a corosync[1306]: [MAIN ] Completed service synchronization, ready to provide service.
> > Feb 5 08:01:03 001db01a stonith-ng[1523]: notice: Purged 1 peer with id=2 and/or uname=001db01b from the membership cache
> > Feb 5 08:01:03 001db01a pacemakerd[1491]: notice: Node 001db01b state is now lost
> > Feb 5 08:01:03 001db01a crmd[1527]: notice: State transition S_IDLE -> S_POLICY_ENGINE
> > Feb 5 08:01:03 001db01a crmd[1527]: notice: Node 001db01b state is now lost
> > Feb 5 08:01:03 001db01a crmd[1527]: warning: No reason to expect node 2 to be down
> > Feb 5 08:01:03 001db01a crmd[1527]: notice: Stonith/shutdown of 001db01b not matched
> > Feb 5 08:01:03 001db01a pengine[1526]: notice: On loss of CCM Quorum: Ignore
> >
> > From 001db01b:
> >
> > Feb 5 08:01:03 001db01b corosync[1455]: [TOTEM ] A new membership (10.51.14.34:960) was formed. Members left: 1
> > Feb 5 08:01:03 001db01b crmd[1693]: notice: Our peer on the DC (001db01a) is dead
> > Feb 5 08:01:03 001db01b stonith-ng[1689]: notice: Node 001db01a state is now lost
> > Feb 5 08:01:03 001db01b corosync[1455]: [TOTEM ] Failed to receive the leave message. failed: 1
> > Feb 5 08:01:03 001db01b corosync[1455]: [QUORUM] Members[1]: 2
> > Feb 5 08:01:03 001db01b corosync[1455]: [MAIN ] Completed service synchronization, ready to provide service.
> > Feb 5 08:01:03 001db01b stonith-ng[1689]: notice: Purged 1 peer with id=1 and/or uname=001db01a from the membership cache
> > Feb 5 08:01:03 001db01b pacemakerd[1678]: notice: Node 001db01a state is now lost
> > Feb 5 08:01:03 001db01b crmd[1693]: notice: State transition S_NOT_DC -> S_ELECTION
> > Feb 5 08:01:03 001db01b crmd[1693]: notice: Node 001db01a state is now lost
> > Feb 5 08:01:03 001db01b attrd[1691]: notice: Node 001db01a state is now lost
> > Feb 5 08:01:03 001db01b attrd[1691]: notice: Removing all 001db01a attributes for peer loss
> > Feb 5 08:01:03 001db01b attrd[1691]: notice: Lost attribute writer 001db01a
> > Feb 5 08:01:03 001db01b attrd[1691]: notice: Purged 1 peer with id=1 and/or uname=001db01a from the membership cache
> > Feb 5 08:01:03 001db01b crmd[1693]: notice: State transition S_ELECTION -> S_INTEGRATION
> > Feb 5 08:01:03 001db01b cib[1688]: notice: Node 001db01a state is now lost
> > Feb 5 08:01:03 001db01b cib[1688]: notice: Purged 1 peer with id=1 and/or uname=001db01a from the membership cache
> > Feb 5 08:01:03 001db01b stonith-ng[1689]: notice: [cib_diff_notify] Patch aborted: Application of an update diff failed (-206)
> > Feb 5 08:01:03 001db01b crmd[1693]: warning: Input I_ELECTION_DC received in state S_INTEGRATION from do_election_check
> > Feb 5 08:01:03 001db01b pengine[1692]: notice: On loss of CCM Quorum: Ignore
> >
> >
> > -Eric
> >
> >
> >
> > Disclaimer : This email and any files transmitted with it are confidential and
> intended solely for intended recipients. If you are not the named addressee
> you should not disseminate, distribute, copy or alter this email. Any views or
> opinions presented in this email are solely those of the author and might not
> represent those of Physician Select Management. Warning: Although
> Physician Select Management has taken reasonable precautions to ensure
> no viruses are present in this email, the company cannot accept responsibility
> for any loss or damage arising from the use of this email or attachments.
> >
> >
> > _______________________________________________
> > Manage your subscription:
> > https://lists.clusterlabs.org/mailman/listinfo/users
> >
> > ClusterLabs home: https://www.clusterlabs.org/
> >
>