[ClusterLabs] corosync not able to form cluster

Prasad Nagaraj prasad.nagaraj76 at gmail.com
Thu Jun 7 17:32:19 UTC 2018


Hi Christine - Got it:)

I have collected a few seconds of debug logs from all nodes after startup.
Please find them attached.
Please let me know if these help identify the root cause.

Thanks!

On Thu, Jun 7, 2018 at 8:43 PM, Christine Caulfield <ccaulfie at redhat.com>
wrote:

> On 07/06/18 15:53, Prasad Nagaraj wrote:
> > Hi - As you can see in the corosync.conf details, I have already set
> > debug: on
> >
>
> But only in the (disabled) AMF subsystem, not for corosync as a whole :)
>
>     logger_subsys {
>         subsys: AMF
>         debug: on
>     }
>
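> As a rough sketch of what enabling it for corosync as a whole might look
> like (the to_syslog and timestamp lines are assumed placeholders, not
> taken from the posted config), debug: on goes at the end of the
> logging {} stanza itself, outside any logger_subsys block:
>
>     logging {
>         to_syslog: yes
>         timestamp: on
>         logger_subsys {
>             subsys: AMF
>             debug: on
>         }
>         # top-level setting, applies to corosync as a whole
>         debug: on
>     }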
>
> Chrissie
>
>
> >
> > On Thu, 7 Jun 2018, 8:03 pm Christine Caulfield <ccaulfie at redhat.com> wrote:
> >
> >     On 07/06/18 15:24, Prasad Nagaraj wrote:
> >     >
> >     > No iptables rules or other firewalls are set up on these nodes.
> >     >
> >     > One observation is that each node sends messages with its own ring
> >     > sequence number, and these numbers are not converging. I have seen
> >     > that in a good cluster, when nodes respond with the same sequence
> >     > number, the membership is formed automatically. That is not
> >     > happening in our case.
> >     >
> >
> >     That's just a side-effect of the cluster not forming. It's not causing
> >     it. Can you enable full corosync debugging (just add debug:on to the end
> >     of the logging {} stanza) and see if that has any more useful
> >     information (I only need the corosync bits, not the pcmk ones)
> >
> >     Chrissie
> >
> >     > Example: we can see that one node sends
> >     > Jun 07 07:55:04 corosync [pcmk  ] notice: pcmk_peer_update: Transitional membership event on ring 71084: memb=1, new=0, lost=0
> >     > .....
> >     > Jun 07 07:55:16 corosync [pcmk  ] notice: pcmk_peer_update: Transitional membership event on ring 71096: memb=1, new=0, lost=0
> >     > Jun 07 07:55:16 corosync [pcmk  ] notice: pcmk_peer_update: Stable membership event on ring 71096: memb=1, new=0, lost=0
> >     >
> >     > other node sends messages with its own numbers
> >     > Jun 07 07:55:12 corosync [pcmk  ] notice: pcmk_peer_update: Transitional membership event on ring 71088: memb=1, new=0, lost=0
> >     > Jun 07 07:55:12 corosync [pcmk  ] notice: pcmk_peer_update: Stable membership event on ring 71088: memb=1, new=0, lost=0
> >     > .......
> >     > Jun 07 07:55:24 corosync [pcmk  ] notice: pcmk_peer_update: Transitional membership event on ring 71100: memb=1, new=0, lost=0
> >     > Jun 07 07:55:24 corosync [pcmk  ] notice: pcmk_peer_update: Stable membership event on ring 71100: memb=1, new=0, lost=0
> >     >
> >     > Any idea why this happens, and why the seq. numbers from different
> >     > nodes are not converging?
> >     >
> >     > Thanks!
> >     >
> >     >
> >     >
> >     >
> >     >
> >     > _______________________________________________
> >     > Users mailing list: Users at clusterlabs.org
> >     > https://lists.clusterlabs.org/mailman/listinfo/users
> >     >
> >     > Project Home: http://www.clusterlabs.org
> >     > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> >     > Bugs: http://bugs.clusterlabs.org
> >     >
> >
> >
> >
> >
> >
>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: coro.13.log
Type: application/octet-stream
Size: 212194 bytes
Desc: not available
URL: <https://lists.clusterlabs.org/pipermail/users/attachments/20180607/a4b862a1/attachment-0003.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: coro.11.log
Type: application/octet-stream
Size: 229922 bytes
Desc: not available
URL: <https://lists.clusterlabs.org/pipermail/users/attachments/20180607/a4b862a1/attachment-0004.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: coro.4.log
Type: application/octet-stream
Size: 174529 bytes
Desc: not available
URL: <https://lists.clusterlabs.org/pipermail/users/attachments/20180607/a4b862a1/attachment-0005.obj>

