<div dir="ltr">Hi Christine - Thanks for looking into the logs.<div>I also see that the node eventually comes out of GATHER state here:</div><div>
<pre style="color:rgb(0,0,0);text-decoration-style:initial;text-decoration-color:initial;word-wrap:break-word;white-space:pre-wrap">Jun 07 16:56:10 corosync [TOTEM ] entering GATHER state from 0.
Jun 07 16:56:10 corosync [TOTEM ] Creating commit token because I am the rep.</pre>
Does this mean it timed out or gave up, and then left the GATHER state?</div><div><br></div><div>Second point: I saw some unexpected entries when I ran tcpdump on the node coro.4 (also pasted in one of the earlier threads). You can see that it was receiving messages like:</div><div>
<pre style="color:rgb(0,0,0);text-decoration-style:initial;text-decoration-color:initial;word-wrap:break-word;white-space:pre-wrap"><pre style="text-decoration-style:initial;text-decoration-color:initial">10:23:17.117347 IP 172.22.0.13.50468 > 172.22.0.4.netsupport: UDP, length 332
10:23:17.140960 IP 172.22.0.8.50438 > 172.22.0.4.netsupport: UDP, length 82
10:23:17.141319 IP 172.22.0.6.38535 > 172.22.0.4.netsupport: UDP, length 156</pre>
Please note that 172.22.0.8 and 172.22.0.6 are not part of my group, so why are these messages arriving on this node?</pre><div class="gmail_extra">Thanks!</div><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Jun 8, 2018 at 2:34 PM, Christine Caulfield <span dir="ltr"><<a href="mailto:ccaulfie@redhat.com" target="_blank">ccaulfie@redhat.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">On 07/06/18 18:32, Prasad Nagaraj wrote:<br>
> Hi Christine - Got it:)<br>
> <br>
> I have collected few seconds of debug logs from all nodes after startup.<br>
> Please find them attached.<br>
> Please let me know if this will help us to identify rootcause.<br>
> <br>
<br>
</span>The problem is on the node coro.4 - it never gets out of the JOIN<br>
<br>
"Jun 07 16:55:37 corosync [TOTEM ] entering GATHER state from 11."<br>
<br>
process so something is wrong on that node, either a rogue routing table<br>
entry, dangling iptables rule or even a broken NIC.<br>
<br>
Chrissie<br>
<span class=""><br></span></blockquote></div></div></div></div>