[ClusterLabs] [ClusterLab] : Corosync not initializing successfully

Nikhil Utane nikhil.subscribed at gmail.com
Mon May 2 09:30:25 CEST 2016


So, as I understand it, you are saying that if the HW is bi-endian, we
should enable LE on PPC. Is that right?
I need to check on that.
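
One quick way to confirm which byte order a given board or build actually
runs in is a small probe like the one below. This is only a sketch: it
assumes a Python interpreter is available on the target (pcs already needs
one), and host_byte_order is a name made up here for illustration, not
something from corosync or pacemaker.

```python
import struct
import sys


def host_byte_order() -> str:
    """Return 'big' or 'little' by looking at how the host stores an integer."""
    # Pack the value 1 as a native-order unsigned 16-bit integer.
    # A little-endian host stores the low byte first (0x01 0x00),
    # a big-endian host stores the high byte first (0x00 0x01).
    first_byte = struct.pack("=H", 1)[0]
    return "little" if first_byte == 1 else "big"


if __name__ == "__main__":
    # sys.byteorder is what the interpreter itself reports;
    # the struct probe is an independent cross-check.
    print("sys.byteorder:", sys.byteorder)
    print("struct probe :", host_byte_order())
```

Running it on each node and comparing the output shows whether all nodes
really use the same byte order.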

On Mon, May 2, 2016 at 12:49 PM, Nikhil Utane <nikhil.subscribed at gmail.com>
wrote:

> Sorry about my ignorance, but could you please elaborate on what you mean
> by "try to ppcle"?
>
> Our target platform is ppc, so it is BE. We have to get it running on
> that only.
> How do we know this is an LE/BE issue and nothing else?
>
> -Thanks
> Nikhil
>
>
> On Mon, May 2, 2016 at 12:24 PM, Jan Friesse <jfriesse at redhat.com> wrote:
>
>>> As your hardware is probably capable of running ppcle, and if you have an
>>> environment at hand without too much effort, it might pay off to try that.
>>> There are of course distributions out there that support corosync on
>>> big-endian architectures, but I don't know if there is an automated
>>> regression test for corosync on big-endian that would catch big-endian
>>> issues right away with something as current as your 2.3.5.
>>>
>>
>> No, we are not testing big-endian.
>>
>> So I totally agree with Klaus. Give ppcle a try. Also make sure all
>> nodes are little-endian. Corosync should work in a mixed BE/LE environment,
>> but because that is not tested, it may not work (and that is a bug, so if
>> ppcle works I will try to fix BE).
>>
>> Regards,
>>   Honza
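
To make the BE/LE concern concrete, here is a small sketch of how the same
four bytes turn into a different number when the reader assumes the wrong
byte order. It does not reproduce corosync's actual wire or IPC format; the
helper names encode_u32 and decode_u32 are made up for illustration, and
5405 is simply the mcastport value from the config in this thread, used as
sample data.

```python
import struct


def encode_u32(value: int, byte_order: str) -> bytes:
    """Serialize an unsigned 32-bit integer in the given byte order."""
    fmt = ">I" if byte_order == "big" else "<I"
    return struct.pack(fmt, value)


def decode_u32(data: bytes, byte_order: str) -> int:
    """Deserialize an unsigned 32-bit integer assuming the given byte order."""
    fmt = ">I" if byte_order == "big" else "<I"
    return struct.unpack(fmt, data)[0]


if __name__ == "__main__":
    # A big-endian writer puts the value on the wire ...
    wire = encode_u32(5405, "big")
    # ... a reader that agrees on the byte order recovers it,
    print("same order :", decode_u32(wire, "big"))
    # ... while a reader that assumes little-endian sees a different number.
    print("wrong order:", decode_u32(wire, "little"))
```

Code that fixes one byte order for everything it exchanges (or swaps on
receipt) avoids this; code that copies native structs around only works
when both sides happen to match.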
>>
>>
>>
>>> Regards,
>>> Klaus
>>>
>>> On 05/02/2016 06:44 AM, Nikhil Utane wrote:
>>>
>>>> Re-sending as I don't see my post on the thread.
>>>>
>>>> On Sun, May 1, 2016 at 4:21 PM, Nikhil Utane
>>>> <nikhil.subscribed at gmail.com> wrote:
>>>>
>>>>      Hi,
>>>>
>>>>      Looking for some guidance here as we are completely blocked
>>>>      otherwise :(.
>>>>
>>>>      -Regards
>>>>      Nikhil
>>>>
>>>>      On Fri, Apr 29, 2016 at 6:11 PM, Sriram <sriram.ec at gmail.com> wrote:
>>>>
>>>>          Corrected the subject.
>>>>
>>>>          We went ahead and captured corosync debug logs for our ppc board.
>>>>          After analysing the logs and comparing them with the successful
>>>>          logs (from the x86 machine), we didn't find
>>>>          "[ MAIN  ] Completed service synchronization, ready to provide service."
>>>>          in the ppc logs.
>>>>          So it looks like corosync is not in a position to accept
>>>>          connections from Pacemaker.
>>>>          I even tried with the new corosync.conf, with no success.
>>>>
>>>>          Any hints on this issue would be really helpful.
>>>>
>>>>          Attaching ppc_notworking.log, x86_working.log, corosync.conf.
>>>>
>>>>          Regards,
>>>>          Sriram
>>>>
>>>>
>>>>
>>>>          On Fri, Apr 29, 2016 at 2:44 PM, Sriram <sriram.ec at gmail.com> wrote:
>>>>
>>>>              Hi,
>>>>
>>>>              I went ahead and made some changes to the file system (I
>>>>              brought in /etc/init.d/corosync, /etc/init.d/pacemaker and
>>>>              /etc/sysconfig). After that I was able to run "pcs
>>>>              cluster start".
>>>>              But it failed with the following error:
>>>>               # pcs cluster start
>>>>              Starting Cluster...
>>>>              Starting Pacemaker Cluster Manager[FAILED]
>>>>              Error: unable to start pacemaker
>>>>
>>>>              And in the /var/log/pacemaker.log, I saw these errors
>>>>              pacemakerd:     info: mcp_read_config:  cmap connection
>>>>              setup failed: CS_ERR_TRY_AGAIN.  Retrying in 4s
>>>>              Apr 29 08:53:47 [15863] node_cu pacemakerd:     info:
>>>>              mcp_read_config:  cmap connection setup failed:
>>>>              CS_ERR_TRY_AGAIN.  Retrying in 5s
>>>>              Apr 29 08:53:52 [15863] node_cu pacemakerd:  warning:
>>>>              mcp_read_config:  Could not connect to Cluster
>>>>              Configuration Database API, error 6
>>>>              Apr 29 08:53:52 [15863] node_cu pacemakerd:   notice:
>>>>              main:     Could not obtain corosync config data, exiting
>>>>              Apr 29 08:53:52 [15863] node_cu pacemakerd:     info:
>>>>              crm_xml_cleanup:  Cleaning up memory from libxml2
>>>>
>>>>
>>>>              And in the /var/log/Debuglog, I saw these errors coming
>>>>              from corosync
>>>>              20160429 085347.487050 airv_cu
>>>>              daemon.warn corosync[12857]:   [QB    ] Denied connection,
>>>>              is not ready (12857-15863-14)
>>>>              20160429 085347.487067 airv_cu
>>>>              daemon.info corosync[12857]:   [QB    ] Denied connection,
>>>>              is not ready (12857-15863-14)
>>>>
>>>>
>>>>              I browsed the libqb code and found that it is failing in
>>>>
>>>>              https://github.com/ClusterLabs/libqb/blob/master/lib/ipc_setup.c
>>>>
>>>>              Line 600:
>>>>              handle_new_connection function
>>>>
>>>>              Line 637:
>>>>              if (auth_result == 0 &&
>>>>                  c->service->serv_fns.connection_accept) {
>>>>                      res = c->service->serv_fns.connection_accept(c,
>>>>                                               c->euid, c->egid);
>>>>              }
>>>>              if (res != 0) {
>>>>                      goto send_response;
>>>>              }
>>>>
>>>>              Any hints on this issue would be really helpful for me to
>>>>              move forward.
>>>>              Please let me know if any logs are required.
>>>>
>>>>              Regards,
>>>>              Sriram
>>>>
>>>>              On Thu, Apr 28, 2016 at 2:42 PM, Sriram
>>>>              <sriram.ec at gmail.com> wrote:
>>>>
>>>>                  Thanks Ken and Emmanuel.
>>>>                  It's a big-endian machine. I will try running "pcs
>>>>                  cluster setup" and "pcs cluster start".
>>>>                  Inside cluster.py, "service pacemaker start" and
>>>>                  "service corosync start" are executed to bring up
>>>>                  pacemaker and corosync.
>>>>                  Those service scripts, and the infrastructure needed to
>>>>                  bring up the processes in that manner, don't exist on
>>>>                  my board.
>>>>                  As it is an embedded board with limited memory, a
>>>>                  full-fledged Linux is not installed.
>>>>                  I'm just curious to know what could be the reason
>>>>                  pacemaker throws this error:
>>>>
>>>>                  "cmap connection setup failed: CS_ERR_TRY_AGAIN.
>>>>                  Retrying in 1s"
>>>>
>>>>                  Thanks for the response.
>>>>
>>>>                  Regards,
>>>>                  Sriram.
>>>>
>>>>                  On Thu, Apr 28, 2016 at 8:55 AM, Ken Gaillot
>>>>                  <kgaillot at redhat.com> wrote:
>>>>
>>>>                      On 04/27/2016 11:25 AM, emmanuel segura wrote:
>>>>                      > You need to use pcs to do everything: pcs cluster setup and
>>>>                      > pcs cluster start. Try the Red Hat docs for more information.
>>>>
>>>>                      Agreed -- pcs cluster setup will create a proper
>>>>                      corosync.conf for you.
>>>>                      Your corosync.conf below uses corosync 1 syntax,
>>>>                      and there were
>>>>                      significant changes in corosync 2. In particular,
>>>>                      you don't need the
>>>>                      file created in step 4, because pacemaker is no
>>>>                      longer launched via a
>>>>                      corosync plugin.
>>>>
>>>>                      > 2016-04-27 17:28 GMT+02:00 Sriram <sriram.ec at gmail.com>:
>>>>                      >> Dear All,
>>>>                      >>
>>>>                      >> I'm trying to use pacemaker and corosync for a clustering
>>>>                      >> requirement that came up recently.
>>>>                      >> We have cross-compiled corosync, pacemaker and pcs (python) for a
>>>>                      >> ppc environment (the target board where pacemaker and corosync
>>>>                      >> are supposed to run).
>>>>                      >> I'm having trouble bringing up pacemaker in that environment,
>>>>                      >> though I could successfully bring up corosync.
>>>>                      >> Any help is welcome.
>>>>                      >>
>>>>                      >> I'm using these versions of pacemaker and corosync:
>>>>                      >> [root at node_cu pacemaker]# corosync -v
>>>>                      >> Corosync Cluster Engine, version '2.3.5'
>>>>                      >> Copyright (c) 2006-2009 Red Hat, Inc.
>>>>                      >> [root at node_cu pacemaker]# pacemakerd -$
>>>>                      >> Pacemaker 1.1.14
>>>>                      >> Written by Andrew Beekhof
>>>>                      >>
>>>>                      >> For running corosync, I did the following.
>>>>                      >> 1. Created the following directories,
>>>>                      >>     /var/lib/pacemaker
>>>>                      >>     /var/lib/corosync
>>>>                      >>     /var/lib/pacemaker
>>>>                      >>     /var/lib/pacemaker/cores
>>>>                      >>     /var/lib/pacemaker/pengine
>>>>                      >>     /var/lib/pacemaker/blackbox
>>>>                      >>     /var/lib/pacemaker/cib
>>>>                      >>
>>>>                      >>
>>>>                      >> 2. Created a file called corosync.conf under the /etc/corosync
>>>>                      >> folder with the following contents
>>>>                      >>
>>>>                      >> totem {
>>>>                      >>
>>>>                      >>         version: 2
>>>>                      >>         token:          5000
>>>>                      >>         token_retransmits_before_loss_const: 20
>>>>                      >>         join:           1000
>>>>                      >>         consensus:      7500
>>>>                      >>         vsftype:        none
>>>>                      >>         max_messages:   20
>>>>                      >>         secauth:        off
>>>>                      >>         cluster_name:   mycluster
>>>>                      >>         transport:      udpu
>>>>                      >>         threads:        0
>>>>                      >>         clear_node_high_bit: yes
>>>>                      >>
>>>>                      >>         interface {
>>>>                      >>                 ringnumber: 0
>>>>                      >>                 # The following three values need to be set
>>>>                      >>                 # based on your environment
>>>>                      >>                 bindnetaddr: 10.x.x.x
>>>>                      >>                 mcastaddr: 226.94.1.1
>>>>                      >>                 mcastport: 5405
>>>>                      >>         }
>>>>                      >>  }
>>>>                      >>
>>>>                      >>  logging {
>>>>                      >>         fileline: off
>>>>                      >>         to_syslog: yes
>>>>                      >>         to_stderr: no
>>>>                      >>         to_syslog: yes
>>>>                      >>         logfile: /var/log/corosync.log
>>>>                      >>         syslog_facility: daemon
>>>>                      >>         debug: on
>>>>                      >>         timestamp: on
>>>>                      >>  }
>>>>                      >>
>>>>                      >>  amf {
>>>>                      >>         mode: disabled
>>>>                      >>  }
>>>>                      >>
>>>>                      >>  quorum {
>>>>                      >>         provider: corosync_votequorum
>>>>                      >>  }
>>>>                      >>
>>>>                      >> nodelist {
>>>>                      >>   node {
>>>>                      >>         ring0_addr: node_cu
>>>>                      >>         nodeid: 1
>>>>                      >>        }
>>>>                      >> }
>>>>                      >>
>>>>                      >> 3.  Created authkey under /etc/corosync
>>>>                      >>
>>>>                      >> 4.  Created a file called pcmk under /etc/corosync/service.d
>>>>                      >> with the contents below,
>>>>                      >>       cat pcmk
>>>>                      >>       service {
>>>>                      >>          # Load the Pacemaker Cluster Resource Manager
>>>>                      >>          name: pacemaker
>>>>                      >>          ver:  1
>>>>                      >>       }
>>>>                      >>
>>>>                      >> 5. Added the node name "node_cu" in /etc/hosts with 10.X.X.X ip
>>>>                      >>
>>>>                      >> 6. ./corosync -f -p & --> this step started corosync
>>>>                      >>
>>>>                      >> [root at node_cu pacemaker]# netstat -alpn | grep -i coros
>>>>                      >> udp        0      0 10.X.X.X:61841     0.0.0.0:*                  9133/corosync
>>>>                      >> udp        0      0 10.X.X.X:5405      0.0.0.0:*                  9133/corosync
>>>>                      >> unix  2      [ ACC ]     STREAM     LISTENING     148888 9133/corosync @quorum
>>>>                      >> unix  2      [ ACC ]     STREAM     LISTENING     148884 9133/corosync @cmap
>>>>                      >> unix  2      [ ACC ]     STREAM     LISTENING     148887 9133/corosync @votequorum
>>>>                      >> unix  2      [ ACC ]     STREAM     LISTENING     148885 9133/corosync @cfg
>>>>                      >> unix  2      [ ACC ]     STREAM     LISTENING     148886 9133/corosync @cpg
>>>>                      >> unix  2      [ ]         DGRAM                    148840 9133/corosync
>>>>                      >>
>>>>                      >> 7. ./pacemakerd -f & gives the following error and exits.
>>>>                      >> [root at node_cu pacemaker]# pacemakerd -f
>>>>                      >> cmap connection setup failed: CS_ERR_TRY_AGAIN.  Retrying in 1s
>>>>                      >> cmap connection setup failed: CS_ERR_TRY_AGAIN.  Retrying in 2s
>>>>                      >> cmap connection setup failed: CS_ERR_TRY_AGAIN.  Retrying in 3s
>>>>                      >> cmap connection setup failed: CS_ERR_TRY_AGAIN.  Retrying in 4s
>>>>                      >> cmap connection setup failed: CS_ERR_TRY_AGAIN.  Retrying in 5s
>>>>                      >> Could not connect to Cluster Configuration Database API, error 6
>>>>                      >>
>>>>                      >> Can you please point out what is missing in these steps?
>>>>                      >>
>>>>                      >> Before trying these steps, I tried running "pcs cluster start",
>>>>                      >> but that command fails with "service" script not found, as the
>>>>                      >> root filesystem contains neither /etc/init.d/ nor /sbin/service.
>>>>                      >>
>>>>                      >> So the plan is to bring up corosync and pacemaker manually, and
>>>>                      >> later do the cluster configuration using "pcs" commands.
>>>>                      >>
>>>>                      >> Regards,
>>>>                      >> Sriram
>>>>                      >>
>>>>                      >> _______________________________________________
>>>>                      >> Users mailing list: Users at clusterlabs.org
>>>>                      >> http://clusterlabs.org/mailman/listinfo/users
>>>>                      >>
>>>>                      >> Project Home: http://www.clusterlabs.org
>>>>                      >> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>>                      >> Bugs: http://bugs.clusterlabs.org
>>>>                      >>
>>>>                      >
>>>>                      >
>>>>                      >
>
>