[ClusterLabs] [ClusterLab] : Corosync not initializing successfully

Dejan Muhamedagic dejanmm at fastmail.fm
Tue May 3 11:56:06 UTC 2016


Hi,

On Mon, May 02, 2016 at 08:54:09AM +0200, Jan Friesse wrote:
> >As your hardware is probably capable of running ppcle, and if you have an
> >environment at hand without too much effort, it might pay off to try that.
> >There are of course distributions out there that support corosync on
> >big-endian architectures, but I don't know if there is an automated
> >regression test for corosync on big-endian that would catch big-endian
> >issues right away with something as current as your 2.3.5.
> 
> No we are not testing big-endian.
> 
> So I totally agree with Klaus. Give ppcle a try. Also make sure all
> nodes are little-endian. Corosync should work in a mixed BE/LE
> environment, but because it's not tested, it may not work (and that's a
> bug, so if ppcle works I will try to fix BE).

I tested a cluster consisting of big-endian and little-endian nodes
(s390 and x86-64), but that was a while ago. IIRC, all relevant
bugs in corosync got fixed at that time. I don't know what the
situation is with the latest version.
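
For anyone wondering why mixed BE/LE needs explicit handling at all, here
is a minimal Python sketch. It is only an illustration of the general
problem, not corosync code, and the function names are made up for the
example. It shows what happens to a 32-bit integer when one node writes
it in its native byte order and a node of the other endianness reads the
raw bytes without converting them.

```python
import struct
import sys

# Hypothetical helpers for illustration only; none of this is corosync code.
_FORMATS = {"big": ">I", "little": "<I"}


def pack_u32(value, byte_order):
    """Serialize an unsigned 32-bit integer with an explicit byte order."""
    return struct.pack(_FORMATS[byte_order], value)


def unpack_u32(data, byte_order):
    """Deserialize 4 bytes as an unsigned 32-bit integer."""
    return struct.unpack(_FORMATS[byte_order], data)[0]


def swap_u32(value):
    """Reverse the byte order of a 32-bit value (what a receiver must do
    when the sender used the opposite endianness)."""
    return int.from_bytes(value.to_bytes(4, "big"), "little")


if __name__ == "__main__":
    print("this host is", sys.byteorder, "endian")

    # A big-endian node (e.g. s390) writes nodeid 1 in its native order.
    wire = pack_u32(1, "big")

    # A little-endian node (e.g. x86-64) reading the raw bytes sees garbage.
    misread = unpack_u32(wire, "little")
    print("misread value:", misread)

    # Swapping the bytes back recovers the original value.
    print("after swap:", swap_u32(misread))
```

Any field that gets missed in such a conversion shows up as a wildly
wrong value on the other side, which is the kind of bug I mean above.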

Thanks,

Dejan

> Regards,
>   Honza
> 
> >
> >Regards,
> >Klaus
> >
> >On 05/02/2016 06:44 AM, Nikhil Utane wrote:
> >>Re-sending as I don't see my post on the thread.
> >>
> >>On Sun, May 1, 2016 at 4:21 PM, Nikhil Utane
> >><nikhil.subscribed at gmail.com> wrote:
> >>
> >>     Hi,
> >>
> >>     Looking for some guidance here as we are completely blocked
> >>     otherwise :(.
> >>
> >>     -Regards
> >>     Nikhil
> >>
> >>     On Fri, Apr 29, 2016 at 6:11 PM, Sriram <sriram.ec at gmail.com> wrote:
> >>
> >>         Corrected the subject.
> >>
> >>         We went ahead and captured corosync debug logs for our ppc board.
> >>         After analysing the logs and comparing them with the successful
> >>         logs (from the x86 machine), we didn't find
> >>         *"[ MAIN  ] Completed service synchronization,
> >>         ready to provide service."* in the ppc logs.
> >>         So it looks like corosync is not in a position to accept
> >>         connections from Pacemaker.
> >>         I also tried with the new corosync.conf, with no success.
> >>
> >>         Any hints on this issue would be really helpful.
> >>
> >>         Attaching ppc_notworking.log, x86_working.log, corosync.conf.
> >>
> >>         Regards,
> >>         Sriram
> >>
> >>
> >>
> >>         On Fri, Apr 29, 2016 at 2:44 PM, Sriram <sriram.ec at gmail.com> wrote:
> >>
> >>             Hi,
> >>
> >>             I went ahead and made some changes to the file system (I
> >>             brought in /etc/init.d/corosync, /etc/init.d/pacemaker and
> >>             /etc/sysconfig). After that I was able to run "pcs
> >>             cluster start".
> >>             But it failed with the following error:
> >>              # pcs cluster start
> >>             Starting Cluster...
> >>             Starting Pacemaker Cluster Manager[FAILED]
> >>             Error: unable to start pacemaker
> >>
> >>             And in the /var/log/pacemaker.log, I saw these errors
> >>             pacemakerd:     info: mcp_read_config:  cmap connection
> >>             setup failed: CS_ERR_TRY_AGAIN.  Retrying in 4s
> >>             Apr 29 08:53:47 [15863] node_cu pacemakerd:     info:
> >>             mcp_read_config:  cmap connection setup failed:
> >>             CS_ERR_TRY_AGAIN.  Retrying in 5s
> >>             Apr 29 08:53:52 [15863] node_cu pacemakerd:  warning:
> >>             mcp_read_config:  Could not connect to Cluster
> >>             Configuration Database API, error 6
> >>             Apr 29 08:53:52 [15863] node_cu pacemakerd:   notice:
> >>             main:     Could not obtain corosync config data, exiting
> >>             Apr 29 08:53:52 [15863] node_cu pacemakerd:     info:
> >>             crm_xml_cleanup:  Cleaning up memory from libxml2
> >>
> >>
> >>             And in the /var/log/Debuglog, I saw these errors coming
> >>             from corosync
> >>             20160429 085347.487050 airv_cu
> >>             daemon.warn corosync[12857]:   [QB    ] Denied connection,
> >>             is not ready (12857-15863-14)
> >>             20160429 085347.487067 airv_cu
> >>             daemon.info corosync[12857]:   [QB    ] Denied connection,
> >>             is not ready (12857-15863-14)
> >>
> >>
> >>             I browsed the code of libqb to find that it is failing in
> >>
> >>             https://github.com/ClusterLabs/libqb/blob/master/lib/ipc_setup.c
> >>
> >>             Line 600 :
> >>             handle_new_connection function
> >>
> >>             Line 637:
> >>                 if (auth_result == 0 &&
> >>                     c->service->serv_fns.connection_accept) {
> >>                     res = c->service->serv_fns.connection_accept(c,
> >>                                              c->euid, c->egid);
> >>                 }
> >>                 if (res != 0) {
> >>                     goto send_response;
> >>                 }
> >>
> >>             Any hints on this issue would be really helpful for me to
> >>             go ahead.
> >>             Please let me know if any logs are required,
> >>
> >>             Regards,
> >>             Sriram
> >>
> >>             On Thu, Apr 28, 2016 at 2:42 PM, Sriram
> >>             <sriram.ec at gmail.com> wrote:
> >>
> >>                 Thanks Ken and Emmanuel.
> >>                 It's a big-endian machine. I will try running "pcs
> >>                 cluster setup" and "pcs cluster start".
> >>                 Inside cluster.py, "service pacemaker start" and
> >>                 "service corosync start" are executed to bring up
> >>                 pacemaker and corosync.
> >>                 Those service scripts, and the infrastructure needed to
> >>                 bring up the processes in that manner,
> >>                 don't exist on my board.
> >>                 As it is an embedded board with limited memory,
> >>                 a full-fledged Linux is not installed.
> >>                 I'm just curious to know what could be the reason
> >>                 pacemaker throws this error:
> >>
> >>                 "cmap connection setup failed: CS_ERR_TRY_AGAIN.
> >>                 Retrying in 1s"
> >>
> >>                 Thanks for the response.
> >>
> >>                 Regards,
> >>                 Sriram.
> >>
> >>                 On Thu, Apr 28, 2016 at 8:55 AM, Ken Gaillot
> >>                 <kgaillot at redhat.com> wrote:
> >>
> >>                     On 04/27/2016 11:25 AM, emmanuel segura wrote:
> >>                     > you need to use pcs to do everything, pcs cluster setup
> >>                     > and pcs cluster start, try to use the redhat docs for
> >>                     > more information.
> >>
> >>                     Agreed -- pcs cluster setup will create a proper
> >>                     corosync.conf for you.
> >>                     Your corosync.conf below uses corosync 1 syntax, and there
> >>                     were significant changes in corosync 2. In particular, you
> >>                     don't need the file created in step 4, because pacemaker is
> >>                     no longer launched via a corosync plugin.
> >>
> >>                     > 2016-04-27 17:28 GMT+02:00 Sriram <sriram.ec at gmail.com>:
> >>                     >> Dear All,
> >>                     >>
> >>                     >> I'm trying to use pacemaker and corosync for the
> >>                     >> clustering requirement that came up recently.
> >>                     >> We have cross-compiled corosync, pacemaker and pcs (python)
> >>                     >> for a ppc environment (the target board where pacemaker and
> >>                     >> corosync are supposed to run).
> >>                     >> I'm having trouble bringing up pacemaker in that
> >>                     >> environment, though I could successfully bring up corosync.
> >>                     >> Any help is welcome.
> >>                     >>
> >>                     >> I'm using these versions of pacemaker and corosync
> >>                     >> [root at node_cu pacemaker]# corosync -v
> >>                     >> Corosync Cluster Engine, version '2.3.5'
> >>                     >> Copyright (c) 2006-2009 Red Hat, Inc.
> >>                     >> [root at node_cu pacemaker]# pacemakerd -$
> >>                     >> Pacemaker 1.1.14
> >>                     >> Written by Andrew Beekhof
> >>                     >>
> >>                     >> For running corosync, I did the following.
> >>                     >> 1. Created the following directories,
> >>                     >>     /var/lib/pacemaker
> >>                     >>     /var/lib/corosync
> >>                     >>     /var/lib/pacemaker/cores
> >>                     >>     /var/lib/pacemaker/pengine
> >>                     >>     /var/lib/pacemaker/blackbox
> >>                     >>     /var/lib/pacemaker/cib
> >>                     >>
> >>                     >>
> >>                     >> 2. Created a file called corosync.conf under the
> >>                     >> /etc/corosync folder with the following contents
> >>                     >>
> >>                     >> totem {
> >>                     >>
> >>                     >>         version: 2
> >>                     >>         token:          5000
> >>                     >>         token_retransmits_before_loss_const: 20
> >>                     >>         join:           1000
> >>                     >>         consensus:      7500
> >>                     >>         vsftype:        none
> >>                     >>         max_messages:   20
> >>                     >>         secauth:        off
> >>                     >>         cluster_name:   mycluster
> >>                     >>         transport:      udpu
> >>                     >>         threads:        0
> >>                     >>         clear_node_high_bit: yes
> >>                     >>
> >>                     >>         interface {
> >>                     >>                 ringnumber: 0
> >>                     >>                 # The following three values need to be set based on your environment
> >>                     >>                 bindnetaddr: 10.x.x.x
> >>                     >>                 mcastaddr: 226.94.1.1
> >>                     >>                 mcastport: 5405
> >>                     >>         }
> >>                     >>  }
> >>                     >>
> >>                     >>  logging {
> >>                     >>         fileline: off
> >>                     >>         to_syslog: yes
> >>                     >>         to_stderr: no
> >>                     >>         to_syslog: yes
> >>                     >>         logfile: /var/log/corosync.log
> >>                     >>         syslog_facility: daemon
> >>                     >>         debug: on
> >>                     >>         timestamp: on
> >>                     >>  }
> >>                     >>
> >>                     >>  amf {
> >>                     >>         mode: disabled
> >>                     >>  }
> >>                     >>
> >>                     >>  quorum {
> >>                     >>         provider: corosync_votequorum
> >>                     >>  }
> >>                     >>
> >>                     >> nodelist {
> >>                     >>   node {
> >>                     >>         ring0_addr: node_cu
> >>                     >>         nodeid: 1
> >>                     >>        }
> >>                     >> }
> >>                     >>
> >>                     >> 3.  Created authkey under /etc/corosync
> >>                     >>
> >>                     >> 4.  Created a file called pcmk under /etc/corosync/service.d
> >>                     >> with the contents below,
> >>                     >>       cat pcmk
> >>                     >>       service {
> >>                     >>          # Load the Pacemaker Cluster Resource Manager
> >>                     >>          name: pacemaker
> >>                     >>          ver:  1
> >>                     >>       }
> >>                     >>
> >>                     >> 5. Added the node name "node_cu" in /etc/hosts with 10.X.X.X ip
> >>                     >>
> >>                     >> 6. ./corosync -f -p & --> this step started corosync
> >>                     >>
> >>                     >> [root at node_cu pacemaker]# netstat -alpn | grep -i coros
> >>                     >> udp        0      0 10.X.X.X:61841     0.0.0.0:*     9133/corosync
> >>                     >> udp        0      0 10.X.X.X:5405      0.0.0.0:*     9133/corosync
> >>                     >> unix  2      [ ACC ]     STREAM     LISTENING     148888 9133/corosync @quorum
> >>                     >> unix  2      [ ACC ]     STREAM     LISTENING     148884 9133/corosync @cmap
> >>                     >> unix  2      [ ACC ]     STREAM     LISTENING     148887 9133/corosync @votequorum
> >>                     >> unix  2      [ ACC ]     STREAM     LISTENING     148885 9133/corosync @cfg
> >>                     >> unix  2      [ ACC ]     STREAM     LISTENING     148886 9133/corosync @cpg
> >>                     >> unix  2      [ ]         DGRAM                    148840 9133/corosync
> >>                     >>
> >>                     >> 7. ./pacemakerd -f & gives the following error and exits.
> >>                     >> [root at node_cu pacemaker]# pacemakerd -f
> >>                     >> cmap connection setup failed: CS_ERR_TRY_AGAIN.  Retrying in 1s
> >>                     >> cmap connection setup failed: CS_ERR_TRY_AGAIN.  Retrying in 2s
> >>                     >> cmap connection setup failed: CS_ERR_TRY_AGAIN.  Retrying in 3s
> >>                     >> cmap connection setup failed: CS_ERR_TRY_AGAIN.  Retrying in 4s
> >>                     >> cmap connection setup failed: CS_ERR_TRY_AGAIN.  Retrying in 5s
> >>                     >> Could not connect to Cluster Configuration Database API, error 6
> >>                     >>
> >>                     >> Can you please point out what is missing in these steps?
> >>                     >>
> >>                     >> Before trying these steps, I tried running "pcs cluster
> >>                     >> start", but that command fails with "service" script not
> >>                     >> found, as the root filesystem doesn't contain either
> >>                     >> /etc/init.d/ or /sbin/service.
> >>                     >>
> >>                     >> So the plan is to bring up corosync and pacemaker manually,
> >>                     >> and later do the cluster configuration using "pcs" commands.
> >>                     >>
> >>                     >> Regards,
> >>                     >> Sriram
> >>                     >>
> >>                     >> _______________________________________________
> >>                     >> Users mailing list: Users at clusterlabs.org
> >>                     <mailto:Users at clusterlabs.org>
> >>                     >> http://clusterlabs.org/mailman/listinfo/users
> >>                     >>
> >>                     >> Project Home: http://www.clusterlabs.org
> >>                     >> Getting started:
> >>                     http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> >>                     >> Bugs: http://bugs.clusterlabs.org
> >>                     >>
> >>                     >
> >>                     >
> >>                     >



