<div dir="ltr">I can confirm that doing an ifdown is not the source of my corosync issues. My cluster is in another state, so I can't pull a cable, but I can down a port on a switch. That had the exact same effects as doing an ifdown. Two machines got fenced when it should have only been one.</div><div class="gmail_extra"><br clear="all"><div><div class="gmail_signature" data-smartmail="gmail_signature"><div dir="ltr"><div>-------</div>Seth Reid<div>System Operations Engineer</div><div>Vendini, Inc.<br></div><div>415.349.7736</div><div><a href="mailto:sreid@vendini.com" target="_blank">sreid@vendini.com</a></div><div><a href="http://www.vendini.com" target="_blank">www.vendini.com</a></div><div><br></div></div></div></div>

<br><div class="gmail_quote">On Fri, Mar 31, 2017 at 4:12 AM, Dejan Muhamedagic <span dir="ltr"><<a href="mailto:dejanmm@fastmail.fm" target="_blank">dejanmm@fastmail.fm</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi,<br>

<div><div class="h5"><br>

On Fri, Mar 31, 2017 at 02:39:02AM -0400, Digimer wrote:<br>

> On 31/03/17 02:32 AM, Jan Friesse wrote:<br>

> >> The original message has the logs from nodes 1 and 3. Node 2, the one<br>

> >> that<br>

> >> got fenced in this test, doesn't really show much. Here are the logs from<br>

> >> it:<br>

> >><br>

> >> Mar 24 16:35:10 b014 ntpd[2318]: Deleting interface #5 enp6s0f0,<br>

> >> 192.168.100.14#123, interface stats: received=0, sent=0, dropped=0,<br>

> >> active_time=3253 secs<br>

> >> Mar 24 16:35:10 b014 ntpd[2318]: Deleting interface #7 enp6s0f0,<br>

> >> fe80::a236:9fff:fe8a:6500%6#<wbr>123, interface stats: received=0, sent=0,<br>

> >> dropped=0, active_time=3253 secs<br>

> >> Mar 24 16:35:13 b014 corosync[2166]: notice  [TOTEM ] A processor failed,<br>

> >> forming new configuration.<br>

> >> Mar 24 16:35:13 b014 corosync[2166]:  [TOTEM ] A processor failed,<br>

> >> forming<br>

> >> new configuration.<br>

> >> Mar 24 16:35:13 b014 corosync[2166]: notice  [TOTEM ] The network<br>

> >> interface<br>

> >> is down.<br>

> ><br>

> > This is a problem. Corosync handles ifdown really badly. If this was not<br>

> > intentional, it may be caused by NetworkManager. In that case, please install the<br>

> > equivalent of the NetworkManager-config-server package (it's actually one<br>

> > file called 00-server.conf so you can extract it from, for example,<br>

> > Fedora package<br>

> > <a href="https://www.rpmfind.net/linux/RPM/fedora/devel/rawhide/x86_64/n/NetworkManager-config-server-1.8.0-0.1.fc27.noarch.html" rel="noreferrer" target="_blank">https://www.rpmfind.net/linux/<wbr>RPM/fedora/devel/rawhide/x86_<wbr>64/n/NetworkManager-config-<wbr>server-1.8.0-0.1.fc27.noarch.<wbr>html</a>)<br>

><br>

> ifdown'ing corosync's interface happens a lot, intentionally or<br>

> otherwise.<br>

<br>

</div></div>I'm not sure, but I think that it can happen only intentionally,<br>

i.e. through human intervention. If there's another problem<br>

with the interface it doesn't disappear from the system.<br>

<br>

Thanks,<br>

<br>

Dejan<br>

<div class="HOEnZb"><div class="h5"><br>

> I think it is reasonable to expect corosync to handle this<br>

> properly. How hard would it be to make corosync resilient to this fault<br>

> case?<br>

><br>

> --<br>

> Digimer<br>

> Papers and Projects: <a href="https://alteeve.com/w/" rel="noreferrer" target="_blank">https://alteeve.com/w/</a><br>

> "I am, somehow, less interested in the weight and convolutions of<br>

> Einstein’s brain than in the near certainty that people of equal talent<br>

> have lived and died in cotton fields and sweatshops." - Stephen Jay Gould<br>

><br>

> ______________________________<wbr>_________________<br>

> Users mailing list: <a href="mailto:Users@clusterlabs.org">Users@clusterlabs.org</a><br>

> <a href="http://lists.clusterlabs.org/mailman/listinfo/users" rel="noreferrer" target="_blank">http://lists.clusterlabs.org/<wbr>mailman/listinfo/users</a><br>

><br>

> Project Home: <a href="http://www.clusterlabs.org" rel="noreferrer" target="_blank">http://www.clusterlabs.org</a><br>

> Getting started: <a href="http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf" rel="noreferrer" target="_blank">http://www.clusterlabs.org/<wbr>doc/Cluster_from_Scratch.pdf</a><br>

> Bugs: <a href="http://bugs.clusterlabs.org" rel="noreferrer" target="_blank">http://bugs.clusterlabs.org</a><br>

<br>


</div></div></blockquote></div><br></div>