<html><head></head><body><div class="ydp245004a8yahoo-style-wrap" style="font-family: courier new, courier, monaco, monospace, sans-serif; font-size: 16px;"><div></div>
<div dir="ltr" data-setdir="false">Hi Andrei,</div><div dir="ltr" data-setdir="false"><br></div><div dir="ltr" data-setdir="false">Don't trust Azure so much :D. I've seen far more unbelievable things.</div><div dir="ltr" data-setdir="false">Can you check whether other systems in the same subnet reported any issues? That said, pcs most probably won't report short-term issues. I have noticed that the RHEL 7 defaults for token and consensus are quite small, so any short-term disruption could cause an issue. </div><div dir="ltr" data-setdir="false">In fact, when I tested live migration on oVirt, the other hosts fenced the node that was being migrated.</div><div dir="ltr" data-setdir="false">What are your corosync config and OS version?</div><div dir="ltr" data-setdir="false"><br></div><div dir="ltr" data-setdir="false">Best Regards,</div><div dir="ltr" data-setdir="false">Strahil Nikolov</div><div><br></div>
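<div dir="ltr" data-setdir="false">As a rough illustration of the token/consensus point above, here is a minimal Python sketch of how the effective timeouts are derived. It assumes the rules described in the corosync 2.x corosync.conf(5) man page as I understand them: token defaults to 1000 ms, token_coefficient (650 ms) is only applied from the third node on, and consensus defaults to 1.2 * token when it is not set. The function name effective_totem_timeouts is made up for this example, and the defaults should be checked against the installed version.</div>
<pre>
```python
from typing import Optional


def effective_totem_timeouts(
    token_ms: int = 1000,
    consensus_ms: Optional[int] = None,
    nodes: int = 2,
    token_coefficient_ms: int = 650,
):
    """Estimate the token and consensus timeouts corosync would use.

    Assumed rules (corosync 2.x, to be verified on the real system):
      * with 3 or more nodes, token grows by token_coefficient per extra node
      * consensus falls back to 1.2 * token when not configured explicitly
    """
    # Only nodes beyond the second one add to the token timeout.
    extra_nodes = max(nodes - 2, 0)
    token = token_ms + extra_nodes * token_coefficient_ms

    # Integer arithmetic avoids floating-point rounding surprises.
    if consensus_ms is None:
        consensus = token * 12 // 10
    else:
        consensus = consensus_ms

    return {"token_ms": token, "consensus_ms": consensus}


if __name__ == "__main__":
    # 2-node cluster left at the assumed defaults.
    print(effective_totem_timeouts())
    # Same cluster with token raised to 10 seconds.
    print(effective_totem_timeouts(token_ms=10000))
```
</pre>
<div dir="ltr" data-setdir="false">If these assumptions hold, a 2-node cluster left at the defaults has only about a second of tolerance before a membership change starts, so raising token in the totem section may be worth testing.</div>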
</div><div id="ydp2ed250c4yahoo_quoted_1368837609" class="ydp2ed250c4yahoo_quoted">
<div style="font-family:'Helvetica Neue', Helvetica, Arial, sans-serif;font-size:13px;color:#26282a;">
<div>
On Thursday, February 6, 2020, 01:44:55 GMT+2, Eric Robinson <eric.robinson@psmnv.com> wrote:
</div>
<div><br></div>
<div><br></div>
<div><div id="ydp2ed250c4yiv0112698999"><div>
<div class="ydp2ed250c4yiv0112698999WordSection1">
<p class="ydp2ed250c4yiv0112698999MsoNormal">Hi Strahil –</p>
<p class="ydp2ed250c4yiv0112698999MsoNormal"> </p>
<p class="ydp2ed250c4yiv0112698999MsoNormal">I can’t prove there was no network loss, but:</p>
<p class="ydp2ed250c4yiv0112698999MsoNormal"> </p>
<ol style="margin-top:0in;" start="1" type="1"><li class="ydp2ed250c4yiv0112698999MsoListParagraph" style="margin-left:0in;">There were no dmesg indications of ethernet link loss.</li><li class="ydp2ed250c4yiv0112698999MsoListParagraph" style="margin-left:0in;">Other than corosync, there are no other log messages about connectivity issues.</li><li class="ydp2ed250c4yiv0112698999MsoListParagraph" style="margin-left:0in;">Wouldn’t pcsd say something about connectivity loss?</li><li class="ydp2ed250c4yiv0112698999MsoListParagraph" style="margin-left:0in;">Both servers are in Azure.</li><li class="ydp2ed250c4yiv0112698999MsoListParagraph" style="margin-left:0in;">There are many other servers in the same Azure subscription, including other corosync clusters, none of which had issues.</li></ol>
<p class="ydp2ed250c4yiv0112698999MsoNormal"> </p>
<p class="ydp2ed250c4yiv0112698999MsoNormal">So I guess it’s possible, but it seems unlikely. </p>
<p class="ydp2ed250c4yiv0112698999MsoNormal"> </p>
<div>
<p class="ydp2ed250c4yiv0112698999MsoNormal">--Eric</p>
</div>
<p class="ydp2ed250c4yiv0112698999MsoNormal"> </p>
<div style="border:none;border-left:solid blue 1.5pt;padding:0in 0in 0in 4.0pt;">
<div>
<div style="border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0in 0in 0in;">
<p class="ydp2ed250c4yiv0112698999MsoNormal"><b>From:</b> Users <users-bounces@clusterlabs.org> <b>On Behalf Of
</b>Strahil Nikolov<br clear="none">
<b>Sent:</b> Wednesday, February 5, 2020 3:13 PM<br clear="none">
<b>To:</b> Cluster Labs - All topics related to open-source clustering welcomed <users@clusterlabs.org>; Andrei Borzenkov <arvidjaar@gmail.com><br clear="none">
<b>Subject:</b> Re: [ClusterLabs] Why Do Nodes Leave the Cluster?</p>
</div>
</div>
<p class="ydp2ed250c4yiv0112698999MsoNormal"> </p>
<div>
<div>
<p class="ydp2ed250c4yiv0112698999MsoNormal"><span style="font-size:12.0pt;">Hi Eric,</span></p>
</div>
<div>
<p class="ydp2ed250c4yiv0112698999MsoNormal"><span style="font-size:12.0pt;"> </span></p>
</div>
<div>
<p class="ydp2ed250c4yiv0112698999MsoNormal"><span style="font-size:12.0pt;">What has led you to think that there was no network loss?</span></p>
</div>
<div>
<p class="ydp2ed250c4yiv0112698999MsoNormal"><span style="font-size:12.0pt;"> </span></p>
</div>
<div>
<p class="ydp2ed250c4yiv0112698999MsoNormal"><span style="font-size:12.0pt;">Best Regards,</span></p>
</div>
<div>
<p class="ydp2ed250c4yiv0112698999MsoNormal"><span style="font-size:12.0pt;">Strahil Nikolov</span></p>
</div>
<div>
<p class="ydp2ed250c4yiv0112698999MsoNormal"><span style="font-size:12.0pt;"> </span></p>
</div>
</div>
<div id="ydp2ed250c4yiv0112698999ydp16f6cb4ayahoo_quoted_0998543261">
<div>
<div>
<p class="ydp2ed250c4yiv0112698999MsoNormal"><span style="font-size:10.0pt;">On Wednesday, February 5, 2020, 22:59:56 GMT+2, Eric Robinson <</span><a shape="rect" href="mailto:eric.robinson@psmnv.com" rel="nofollow" target="_blank"><span style="font-size:10.0pt;">eric.robinson@psmnv.com</span></a><span style="font-size:10.0pt;">>
wrote: </span></p>
</div>
<div>
<p class="ydp2ed250c4yiv0112698999MsoNormal"><span style="font-size:10.0pt;"> </span></p>
</div>
<div>
<p class="ydp2ed250c4yiv0112698999MsoNormal"><span style="font-size:10.0pt;"> </span></p>
</div>
<div>
<div>
<p class="ydp2ed250c4yiv0112698999MsoNormal"><span style="font-size:10.0pt;"><br clear="none">
> -----Original Message-----<br clear="none">
> From: Users <</span><a shape="rect" href="mailto:users-bounces@clusterlabs.org" rel="nofollow" target="_blank"><span style="font-size:10.0pt;">users-bounces@clusterlabs.org</span></a><span style="font-size:10.0pt;">>
On Behalf Of Strahil Nikolov<br clear="none">
> Sent: Wednesday, February 5, 2020 1:59 PM<br clear="none">
> To: Andrei Borzenkov <</span><a shape="rect" href="mailto:arvidjaar@gmail.com" rel="nofollow" target="_blank"><span style="font-size:10.0pt;">arvidjaar@gmail.com</span></a><span style="font-size:10.0pt;">>;
</span><a shape="rect" href="mailto:users@clusterlabs.org" rel="nofollow" target="_blank"><span style="font-size:10.0pt;">users@clusterlabs.org</span></a><span style="font-size:10.0pt;"><br clear="none">
> Subject: Re: [ClusterLabs] Why Do Nodes Leave the Cluster?<br clear="none">
><br clear="none">
> On February 5, 2020 8:14:06 PM GMT+02:00, Andrei Borzenkov<br clear="none">
> <</span><a shape="rect" href="mailto:arvidjaar@gmail.com" rel="nofollow" target="_blank"><span style="font-size:10.0pt;">arvidjaar@gmail.com</span></a><span style="font-size:10.0pt;">> wrote:<br clear="none">
> >05.02.2020 20:55, Eric Robinson wrote:<br clear="none">
> >> The two servers 001db01a and 001db01b were up and responsive. Neither<br clear="none">
> >had been rebooted and neither were under heavy load. There's no<br clear="none">
> >indication in the logs of loss of network connectivity. Any ideas on<br clear="none">
> >why both nodes seem to think the other one is at fault?<br clear="none">
> ><br clear="none">
> >The very fact that nodes lost connection to each other *is* indication<br clear="none">
> >of network problems. Your logs start too late, after any problem<br clear="none">
> >already happened.<br clear="none">
> ><br clear="none">
> >><br clear="none">
> >> (Yes, it's a 2-node cluster without quorum. A 3-node cluster is not<br clear="none">
> >an option at this time.)<br clear="none">
> >><br clear="none">
> >> Log from 001db01a:<br clear="none">
> >><br clear="none">
> >> Feb 5 08:01:02 001db01a corosync[1306]: [TOTEM ] A processor failed,<br clear="none">
> >forming new configuration.<br clear="none">
> >> Feb 5 08:01:03 001db01a corosync[1306]: [TOTEM ] A new membership<br clear="none">
> >(10.51.14.33:960) was formed. Members left: 2<br clear="none">
> >> Feb 5 08:01:03 001db01a corosync[1306]: [TOTEM ] Failed to receive<br clear="none">
> >the leave message. failed: 2<br clear="none">
> >> Feb 5 08:01:03 001db01a attrd[1525]: notice: Node 001db01b state is<br clear="none">
> >now lost<br clear="none">
> >> Feb 5 08:01:03 001db01a attrd[1525]: notice: Removing all 001db01b<br clear="none">
> >attributes for peer loss<br clear="none">
> >> Feb 5 08:01:03 001db01a cib[1522]: notice: Node 001db01b state is<br clear="none">
> >now lost<br clear="none">
> >> Feb 5 08:01:03 001db01a cib[1522]: notice: Purged 1 peer with id=2<br clear="none">
> >and/or uname=001db01b from the membership cache<br clear="none">
> >> Feb 5 08:01:03 001db01a attrd[1525]: notice: Purged 1 peer with<br clear="none">
> >id=2 and/or uname=001db01b from the membership cache<br clear="none">
> >> Feb 5 08:01:03 001db01a crmd[1527]: warning: No reason to expect<br clear="none">
> >node 2 to be down<br clear="none">
> >> Feb 5 08:01:03 001db01a stonith-ng[1523]: notice: Node 001db01b<br clear="none">
> >state is now lost<br clear="none">
> >> Feb 5 08:01:03 001db01a crmd[1527]: notice: Stonith/shutdown of<br clear="none">
> >001db01b not matched<br clear="none">
> >> Feb 5 08:01:03 001db01a corosync[1306]: [QUORUM] Members[1]: 1<br clear="none">
> >> Feb 5 08:01:03 001db01a corosync[1306]: [MAIN ] Completed service<br clear="none">
> >synchronization, ready to provide service.<br clear="none">
> >> Feb 5 08:01:03 001db01a stonith-ng[1523]: notice: Purged 1 peer<br clear="none">
> >with id=2 and/or uname=001db01b from the membership cache<br clear="none">
> >> Feb 5 08:01:03 001db01a pacemakerd[1491]: notice: Node 001db01b<br clear="none">
> >state is now lost<br clear="none">
> >> Feb 5 08:01:03 001db01a crmd[1527]: notice: State transition S_IDLE<br clear="none">
> >-> S_POLICY_ENGINE<br clear="none">
> >> Feb 5 08:01:03 001db01a crmd[1527]: notice: Node 001db01b state is<br clear="none">
> >now lost<br clear="none">
> >> Feb 5 08:01:03 001db01a crmd[1527]: warning: No reason to expect<br clear="none">
> >node 2 to be down<br clear="none">
> >> Feb 5 08:01:03 001db01a crmd[1527]: notice: Stonith/shutdown of<br clear="none">
> >001db01b not matched<br clear="none">
> >> Feb 5 08:01:03 001db01a pengine[1526]: notice: On loss of CCM<br clear="none">
> >Quorum: Ignore<br clear="none">
> >><br clear="none">
> >> From 001db01b:<br clear="none">
> >><br clear="none">
> >> Feb 5 08:01:03 001db01b corosync[1455]: [TOTEM ] A new membership<br clear="none">
> >(10.51.14.34:960) was formed. Members left: 1<br clear="none">
> >> Feb 5 08:01:03 001db01b crmd[1693]: notice: Our peer on the DC<br clear="none">
> >(001db01a) is dead<br clear="none">
> >> Feb 5 08:01:03 001db01b stonith-ng[1689]: notice: Node 001db01a<br clear="none">
> >state is now lost<br clear="none">
> >> Feb 5 08:01:03 001db01b corosync[1455]: [TOTEM ] Failed to receive<br clear="none">
> >the leave message. failed: 1<br clear="none">
> >> Feb 5 08:01:03 001db01b corosync[1455]: [QUORUM] Members[1]: 2<br clear="none">
> >> Feb 5 08:01:03 001db01b corosync[1455]: [MAIN ] Completed service<br clear="none">
> >synchronization, ready to provide service.<br clear="none">
> >> Feb 5 08:01:03 001db01b stonith-ng[1689]: notice: Purged 1 peer<br clear="none">
> >with id=1 and/or uname=001db01a from the membership cache<br clear="none">
> >> Feb 5 08:01:03 001db01b pacemakerd[1678]: notice: Node 001db01a<br clear="none">
> >state is now lost<br clear="none">
> >> Feb 5 08:01:03 001db01b crmd[1693]: notice: State transition<br clear="none">
> >S_NOT_DC -> S_ELECTION<br clear="none">
> >> Feb 5 08:01:03 001db01b crmd[1693]: notice: Node 001db01a state is<br clear="none">
> >now lost<br clear="none">
> >> Feb 5 08:01:03 001db01b attrd[1691]: notice: Node 001db01a state is<br clear="none">
> >now lost<br clear="none">
> >> Feb 5 08:01:03 001db01b attrd[1691]: notice: Removing all 001db01a<br clear="none">
> >attributes for peer loss<br clear="none">
> >> Feb 5 08:01:03 001db01b attrd[1691]: notice: Lost attribute writer<br clear="none">
> >001db01a<br clear="none">
> >> Feb 5 08:01:03 001db01b attrd[1691]: notice: Purged 1 peer with<br clear="none">
> >id=1 and/or uname=001db01a from the membership cache<br clear="none">
> >> Feb 5 08:01:03 001db01b crmd[1693]: notice: State transition<br clear="none">
> >S_ELECTION -> S_INTEGRATION<br clear="none">
> >> Feb 5 08:01:03 001db01b cib[1688]: notice: Node 001db01a state is<br clear="none">
> >now lost<br clear="none">
> >> Feb 5 08:01:03 001db01b cib[1688]: notice: Purged 1 peer with id=1<br clear="none">
> >and/or uname=001db01a from the membership cache<br clear="none">
> >> Feb 5 08:01:03 001db01b stonith-ng[1689]: notice: [cib_diff_notify]<br clear="none">
> >Patch aborted: Application of an update diff failed (-206)<br clear="none">
> >> Feb 5 08:01:03 001db01b crmd[1693]: warning: Input I_ELECTION_DC<br clear="none">
> >received in state S_INTEGRATION from do_election_check<br clear="none">
> >> Feb 5 08:01:03 001db01b pengine[1692]: notice: On loss of CCM<br clear="none">
> >Quorum: Ignore<br clear="none">
> >><br clear="none">
> >><br clear="none">
> >> -Eric<br clear="none">
> >><br clear="none">
> >><br clear="none">
> >><br clear="none">
> >> Disclaimer : This email and any files transmitted with it are<br clear="none">
> >confidential and intended solely for intended recipients. If you are<br clear="none">
> >not the named addressee you should not disseminate, distribute, copy or<br clear="none">
> >alter this email. Any views or opinions presented in this email are<br clear="none">
> >solely those of the author and might not represent those of Physician<br clear="none">
> >Select Management. Warning: Although Physician Select Management has<br clear="none">
> >taken reasonable precautions to ensure no viruses are present in this<br clear="none">
> >email, the company cannot accept responsibility for any loss or damage<br clear="none">
> >arising from the use of this email or attachments.<br clear="none">
> >><br clear="none">
> >><br clear="none">
> >> _______________________________________________<br clear="none">
> >> Manage your subscription:<br clear="none">
> >> </span><a shape="rect" href="https://lists.clusterlabs.org/mailman/listinfo/users" rel="nofollow" target="_blank"><span style="font-size:10.0pt;">https://lists.clusterlabs.org/mailman/listinfo/users</span></a><span style="font-size:10.0pt;"><br clear="none">
> >><br clear="none">
> >> ClusterLabs home: </span><a shape="rect" href="https://www.clusterlabs.org/" rel="nofollow" target="_blank"><span style="font-size:10.0pt;">https://www.clusterlabs.org/</span></a><span style="font-size:10.0pt;"><br clear="none">
> >><br clear="none">
> ><br clear="none">
><br clear="none">
> Hi Eric,<br clear="none">
> Do you use 2 corosync rings (routed via separate switches)?<br clear="none">
><br clear="none">
<br clear="none">
I've done that with all my other clusters, but these two servers are in Azure, so the network is out of our control.<br clear="none">
<br clear="none">
> If not, you can easily set them up without downtime.<br clear="none">
><br clear="none">
> Also, are you using multicast or unicast ?<br clear="none">
><br clear="none">
<br clear="none">
Unicast, as Azure does not support multicast.<br clear="none">
<br clear="none">
> If a 3rd node is not an option, you can check whether your version supports<br clear="none">
> 'qdevice', which can run on a separate network and needs very few<br clear="none">
> resources - a simple VM will be enough.<br clear="none">
<br clear="none">
Thanks for the tip. I looked into qdevice years ago but it didn't seem mature at the time. I appreciate the reminder. I will pop over there and investigate!<br clear="none">
<br clear="none">
><br clear="none">
> Best Regards,<br clear="none">
> Strahil Nikolov<br clear="none">
</span></p><div class="ydp2ed250c4yiv0112698999yqt2147044959" id="ydp2ed250c4yiv0112698999yqtfd69893"></div><div class="ydp2ed250c4yiv0112698999yqt2147044959" id="ydp2ed250c4yiv0112698999yqtfd85843">
<div id="ydp2ed250c4yiv0112698999ydp16f6cb4ayqtfd76253">
</div>
</div></div><div class="ydp2ed250c4yiv0112698999yqt2147044959" id="ydp2ed250c4yiv0112698999yqtfd79948">
</div></div><div class="ydp2ed250c4yiv0112698999yqt2147044959" id="ydp2ed250c4yiv0112698999yqtfd99061">
</div></div><div class="ydp2ed250c4yiv0112698999yqt2147044959" id="ydp2ed250c4yiv0112698999yqtfd34322">
</div></div><div class="ydp2ed250c4yiv0112698999yqt2147044959" id="ydp2ed250c4yiv0112698999yqtfd31251">
</div></div><div class="ydp2ed250c4yiv0112698999yqt2147044959" id="ydp2ed250c4yiv0112698999yqtfd24205">
</div></div><div class="ydp2ed250c4yiv0112698999yqt2147044959" id="ydp2ed250c4yiv0112698999yqtfd03813">
</div></div></div><div class="ydp2ed250c4yqt2147044959" id="ydp2ed250c4yqtfd60317"></div></div>
</div>
</div></body></html>