[ClusterLabs] corosync not able to form cluster
Christine Caulfield
ccaulfie at redhat.com
Fri Jun 8 05:04:23 EDT 2018
On 07/06/18 18:32, Prasad Nagaraj wrote:
> Hi Christine - Got it:)
>
> I have collected a few seconds of debug logs from all nodes after startup.
> Please find them attached.
> Please let me know if this will help us to identify the root cause.
>
The problem is on the node coro.4 - it never gets out of the JOIN
process:

"Jun 07 16:55:37 corosync [TOTEM ] entering GATHER state from 11."

so something is wrong on that node, either a rogue routing table
entry, a dangling iptables rule or even a broken NIC.
Chrissie
> Thanks!
>
> On Thu, Jun 7, 2018 at 8:43 PM, Christine Caulfield <ccaulfie at redhat.com> wrote:
>
> On 07/06/18 15:53, Prasad Nagaraj wrote:
> > Hi - As you can see in the corosync.conf details, I have already set
> > debug: on
> >
>
> But only in the (disabled) AMF subsystem, not for corosync as a whole :)
>
> logger_subsys {
> subsys: AMF
> debug: on
> }
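>
> A minimal sketch of the difference, assuming the corosync 1.x logging
> syntax used in this thread (the to_syslog line and the AMF "debug: off"
> are illustrative, not taken from the poster's file): debug has to sit
> directly inside the logging stanza to apply to corosync as a whole,
> rather than inside a logger_subsys block.
>
> ```
> logging {
>         to_syslog: yes
>         debug: on
>         logger_subsys {
>                 subsys: AMF
>                 debug: off
>         }
> }
> ```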
>
>
> Chrissie
>
>
> >
> > On Thu, 7 Jun 2018, 8:03 pm Christine Caulfield, <ccaulfie at redhat.com> wrote:
> >
> > On 07/06/18 15:24, Prasad Nagaraj wrote:
> > >
> > > > No iptables rules or other firewalls are set up on these nodes.
> > > >
> > > > One observation is that each node sends messages with its own ring
> > > > sequence number, which is not converging. I have seen that in a good
> > > > cluster, when nodes respond with the same sequence number, the
> > > > membership is automatically formed. But in our case, that does not
> > > > happen.
> > >
> >
> > > That's just a side-effect of the cluster not forming. It's not causing
> > > it. Can you enable full corosync debugging (just add debug:on to the end
> > > of the logging {} stanza) and see if that has any more useful
> > > information (I only need the corosync bits, not the pcmk ones)
> >
> > Chrissie
> >
> > > > Example: we can see that one node sends
> > > > Jun 07 07:55:04 corosync [pcmk ] notice: pcmk_peer_update: Transitional membership event on ring 71084: memb=1, new=0, lost=0
> > > > .....
> > > > Jun 07 07:55:16 corosync [pcmk ] notice: pcmk_peer_update: Transitional membership event on ring 71096: memb=1, new=0, lost=0
> > > > Jun 07 07:55:16 corosync [pcmk ] notice: pcmk_peer_update: Stable membership event on ring 71096: memb=1, new=0, lost=0
> > > >
> > > > other node sends messages with its own numbers
> > > > Jun 07 07:55:12 corosync [pcmk ] notice: pcmk_peer_update: Transitional membership event on ring 71088: memb=1, new=0, lost=0
> > > > Jun 07 07:55:12 corosync [pcmk ] notice: pcmk_peer_update: Stable membership event on ring 71088: memb=1, new=0, lost=0
> > > > .......
> > > > Jun 07 07:55:24 corosync [pcmk ] notice: pcmk_peer_update: Transitional membership event on ring 71100: memb=1, new=0, lost=0
> > > > Jun 07 07:55:24 corosync [pcmk ] notice: pcmk_peer_update: Stable membership event on ring 71100: memb=1, new=0, lost=0
> > > >
> > > > Any idea why this happens, and why the seq. numbers from different
> > > > nodes are not converging?
> > >
> > > Thanks!
> > >
> > >
> > >
> > >
> > >
> > > > _______________________________________________
> > > > Users mailing list: Users at clusterlabs.org
> > > > https://lists.clusterlabs.org/mailman/listinfo/users
> > > >
> > > > Project Home: http://www.clusterlabs.org
> > > > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> > > > Bugs: http://bugs.clusterlabs.org
> > > >
> >
>