[ClusterLabs] [ClusterLab] : Corosync not initializing successfully

Jan Friesse jfriesse at redhat.com
Mon May 2 02:54:09 EDT 2016


> As your hardware is probably capable of running ppcle, it might pay off to
> try that if you can set up such an environment without too much effort.
> There are of course distributions out there that support corosync on
> big-endian architectures, but I don't know if there is an automated
> regression test for corosync on big-endian that would catch big-endian
> issues right away with something as current as your 2.3.5.

No, we are not testing big-endian.

So I totally agree with Klaus. Give ppcle a try. Also make sure all
nodes are little-endian. Corosync should work in a mixed BE/LE environment,
but because that is not tested, it may not work (which would be a bug, so if
ppcle works I will try to fix BE).
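
To check which byte order each node actually uses, something like the
following Python sketch could be run on every node (Python should already be
there if pcs is installed). The helper names are made up for illustration and
are not part of corosync or pcs. The last helper only illustrates why an
unswapped 32-bit value, such as a nodeid, would be misread between BE and LE
hosts; it does not reproduce what corosync does on the wire.

```python
import struct
import sys


def host_byte_order() -> str:
    """Return 'little' or 'big' for the machine running this script."""
    return sys.byteorder


def pack_u32(value: int, order: str) -> bytes:
    """Pack a 32-bit unsigned integer in the given byte order."""
    fmt = "<I" if order == "little" else ">I"
    return struct.pack(fmt, value)


def misread_u32(value: int) -> int:
    """Return what a little-endian reader sees when a big-endian writer
    sends a 32-bit value without any byte swapping."""
    return struct.unpack("<I", struct.pack(">I", value))[0]


if __name__ == "__main__":
    print("this node is " + host_byte_order() + "-endian")
    print("value 1 read with the wrong byte order:", misread_u32(1))
```

If every node reports the same byte order, a mixed BE/LE setup can be ruled
out as the cause.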

Regards,
   Honza

>
> Regards,
> Klaus
>
> On 05/02/2016 06:44 AM, Nikhil Utane wrote:
>> Re-sending as I don't see my post on the thread.
>>
>> On Sun, May 1, 2016 at 4:21 PM, Nikhil Utane
>> <nikhil.subscribed at gmail.com> wrote:
>>
>>      Hi,
>>
>>      Looking for some guidance here as we are completely blocked
>>      otherwise :(.
>>
>>      -Regards
>>      Nikhil
>>
>>      On Fri, Apr 29, 2016 at 6:11 PM, Sriram <sriram.ec at gmail.com> wrote:
>>
>>          Corrected the subject.
>>
>>          We went ahead and captured corosync debug logs for our ppc board.
>>          After analyzing the logs and comparing them with the successful logs
>>          (from the x86 machine), we didn't find "[ MAIN  ] Completed service
>>          synchronization, ready to provide service." in the ppc logs.
>>          So it looks like corosync is not in a position to accept
>>          connections from Pacemaker.
>>          I even tried a new corosync.conf, with no success.
>>
>>          Any hints on this issue would be really helpful.
>>
>>          Attaching ppc_notworking.log, x86_working.log, corosync.conf.
>>
>>          Regards,
>>          Sriram
>>
>>
>>
>>          On Fri, Apr 29, 2016 at 2:44 PM, Sriram <sriram.ec at gmail.com> wrote:
>>
>>              Hi,
>>
>>              I went ahead and made some changes to the file system (I
>>              brought in /etc/init.d/corosync, /etc/init.d/pacemaker and
>>              /etc/sysconfig). After that I was able to run "pcs
>>              cluster start".
>>              But it failed with the following error:
>>               # pcs cluster start
>>              Starting Cluster...
>>              Starting Pacemaker Cluster Manager[FAILED]
>>              Error: unable to start pacemaker
>>
>>              And in the /var/log/pacemaker.log, I saw these errors
>>              pacemakerd:     info: mcp_read_config:  cmap connection setup failed: CS_ERR_TRY_AGAIN.  Retrying in 4s
>>              Apr 29 08:53:47 [15863] node_cu pacemakerd:     info: mcp_read_config:  cmap connection setup failed: CS_ERR_TRY_AGAIN.  Retrying in 5s
>>              Apr 29 08:53:52 [15863] node_cu pacemakerd:  warning: mcp_read_config:  Could not connect to Cluster Configuration Database API, error 6
>>              Apr 29 08:53:52 [15863] node_cu pacemakerd:   notice: main:     Could not obtain corosync config data, exiting
>>              Apr 29 08:53:52 [15863] node_cu pacemakerd:     info: crm_xml_cleanup:  Cleaning up memory from libxml2
>>
>>
>>              And in the /var/log/Debuglog, I saw these errors coming
>>              from corosync
>>              20160429 085347.487050 airv_cu daemon.warn corosync[12857]:   [QB    ] Denied connection, is not ready (12857-15863-14)
>>              20160429 085347.487067 airv_cu daemon.info corosync[12857]:   [QB    ] Denied connection, is not ready (12857-15863-14)
>>
>>
>>              I browsed the code of libqb to find that it is failing in
>>
>>              https://github.com/ClusterLabs/libqb/blob/master/lib/ipc_setup.c
>>
>>              Line 600 :
>>              handle_new_connection function
>>
>>              Line 637:
>>              if (auth_result == 0 &&
>>                  c->service->serv_fns.connection_accept) {
>>                      res = c->service->serv_fns.connection_accept(c,
>>                                               c->euid, c->egid);
>>              }
>>              if (res != 0) {
>>                      goto send_response;
>>              }
>>
>>              Any hints on this issue would be really helpful for me to
>>              go ahead.
>>              Please let me know if any logs are required,
>>
>>              Regards,
>>              Sriram
>>
>>              On Thu, Apr 28, 2016 at 2:42 PM, Sriram
>>              <sriram.ec at gmail.com> wrote:
>>
>>                  Thanks Ken and Emmanuel.
>>                  It's a big-endian machine. I will try running "pcs
>>                  cluster setup" and "pcs cluster start".
>>                  Inside cluster.py, "service pacemaker start" and
>>                  "service corosync start" are executed to bring up
>>                  pacemaker and corosync.
>>                  Those service scripts, and the infrastructure needed to
>>                  bring up the processes in that manner,
>>                  don't exist on my board.
>>                  As it is an embedded board with limited memory,
>>                  a full-fledged Linux is not installed.
>>                  Just curious to know what could be the reason
>>                  pacemaker throws this error:
>>
>>                  "cmap connection setup failed: CS_ERR_TRY_AGAIN.
>>                  Retrying in 1s"
>>
>>                  Thanks for the response.
>>
>>                  Regards,
>>                  Sriram.
>>
>>                  On Thu, Apr 28, 2016 at 8:55 AM, Ken Gaillot
>>                  <kgaillot at redhat.com> wrote:
>>
>>                      On 04/27/2016 11:25 AM, emmanuel segura wrote:
>>                      > you need to use pcs to do everything, pcs cluster setup and pcs
>>                      > cluster start, try to use the redhat docs for more information.
>>
>>                      Agreed -- pcs cluster setup will create a proper
>>                      corosync.conf for you. Your corosync.conf below uses
>>                      corosync 1 syntax, and there were significant changes
>>                      in corosync 2. In particular, you don't need the file
>>                      created in step 4, because pacemaker is no longer
>>                      launched via a corosync plugin.
>>
>>                      > 2016-04-27 17:28 GMT+02:00 Sriram <sriram.ec at gmail.com>:
>>                      >> Dear All,
>>                      >>
>>                      >> I'm trying to use pacemaker and corosync for the clustering requirement that
>>                      >> came up recently.
>>                      >> We have cross-compiled corosync, pacemaker and pcs (python) for a ppc
>>                      >> environment (the target board where pacemaker and corosync are supposed to run).
>>                      >> I'm having trouble bringing up pacemaker in that environment, though I could
>>                      >> successfully bring up corosync.
>>                      >> Any help is welcome.
>>                      >>
>>                      >> I'm using these versions of pacemaker and corosync:
>>                      >> [root at node_cu pacemaker]# corosync -v
>>                      >> Corosync Cluster Engine, version '2.3.5'
>>                      >> Copyright (c) 2006-2009 Red Hat, Inc.
>>                      >> [root at node_cu pacemaker]# pacemakerd -$
>>                      >> Pacemaker 1.1.14
>>                      >> Written by Andrew Beekhof
>>                      >>
>>                      >> For running corosync, I did the following.
>>                      >> 1. Created the following directories,
>>                      >>     /var/lib/pacemaker
>>                      >>     /var/lib/corosync
>>                      >>     /var/lib/pacemaker
>>                      >>     /var/lib/pacemaker/cores
>>                      >>     /var/lib/pacemaker/pengine
>>                      >>     /var/lib/pacemaker/blackbox
>>                      >>     /var/lib/pacemaker/cib
>>                      >>
>>                      >>
>>                      >> 2. Created a file called corosync.conf under /etc/corosync folder with the
>>                      >> following contents
>>                      >>
>>                      >> totem {
>>                      >>
>>                      >>         version: 2
>>                      >>         token:          5000
>>                      >>         token_retransmits_before_loss_const: 20
>>                      >>         join:           1000
>>                      >>         consensus:      7500
>>                      >>         vsftype:        none
>>                      >>         max_messages:   20
>>                      >>         secauth:        off
>>                      >>         cluster_name:   mycluster
>>                      >>         transport:      udpu
>>                      >>         threads:        0
>>                      >>         clear_node_high_bit: yes
>>                      >>
>>                      >>         interface {
>>                      >>                 ringnumber: 0
>>                      >>                 # The following three values need to be set based on your environment
>>                      >>                 bindnetaddr: 10.x.x.x
>>                      >>                 mcastaddr: 226.94.1.1
>>                      >>                 mcastport: 5405
>>                      >>         }
>>                      >>  }
>>                      >>
>>                      >>  logging {
>>                      >>         fileline: off
>>                      >>         to_syslog: yes
>>                      >>         to_stderr: no
>>                      >>         to_syslog: yes
>>                      >>         logfile: /var/log/corosync.log
>>                      >>         syslog_facility: daemon
>>                      >>         debug: on
>>                      >>         timestamp: on
>>                      >>  }
>>                      >>
>>                      >>  amf {
>>                      >>         mode: disabled
>>                      >>  }
>>                      >>
>>                      >>  quorum {
>>                      >>         provider: corosync_votequorum
>>                      >>  }
>>                      >>
>>                      >> nodelist {
>>                      >>   node {
>>                      >>         ring0_addr: node_cu
>>                      >>         nodeid: 1
>>                      >>        }
>>                      >> }
>>                      >>
>>                      >> 3.  Created authkey under /etc/corosync
>>                      >>
>>                      >> 4.  Created a file called pcmk under /etc/corosync/service.d and contents as
>>                      >> below,
>>                      >>       cat pcmk
>>                      >>       service {
>>                      >>          # Load the Pacemaker Cluster Resource Manager
>>                      >>          name: pacemaker
>>                      >>          ver:  1
>>                      >>       }
>>                      >>
>>                      >> 5. Added the node name "node_cu" in /etc/hosts with 10.X.X.X ip
>>                      >>
>>                      >> 6. ./corosync -f -p & --> this step started corosync
>>                      >>
>>                      >> [root at node_cu pacemaker]# netstat -alpn | grep -i coros
>>                      >> udp        0      0 10.X.X.X:61841     0.0.0.0:*                  9133/corosync
>>                      >> udp        0      0 10.X.X.X:5405      0.0.0.0:*                  9133/corosync
>>                      >> unix  2      [ ACC ]     STREAM     LISTENING     148888 9133/corosync @quorum
>>                      >> unix  2      [ ACC ]     STREAM     LISTENING     148884 9133/corosync @cmap
>>                      >> unix  2      [ ACC ]     STREAM     LISTENING     148887 9133/corosync @votequorum
>>                      >> unix  2      [ ACC ]     STREAM     LISTENING     148885 9133/corosync @cfg
>>                      >> unix  2      [ ACC ]     STREAM     LISTENING     148886 9133/corosync @cpg
>>                      >> unix  2      [ ]         DGRAM                    148840 9133/corosync
>>                      >>
>>                      >> 7. ./pacemakerd -f & gives the following error and exits.
>>                      >> [root at node_cu pacemaker]# pacemakerd -f
>>                      >> cmap connection setup failed: CS_ERR_TRY_AGAIN.  Retrying in 1s
>>                      >> cmap connection setup failed: CS_ERR_TRY_AGAIN.  Retrying in 2s
>>                      >> cmap connection setup failed: CS_ERR_TRY_AGAIN.  Retrying in 3s
>>                      >> cmap connection setup failed: CS_ERR_TRY_AGAIN.  Retrying in 4s
>>                      >> cmap connection setup failed: CS_ERR_TRY_AGAIN.  Retrying in 5s
>>                      >> Could not connect to Cluster Configuration Database API, error 6
>>                      >>
>>                      >> Can you please point out what is missing in these steps?
>>                      >>
>>                      >> Before trying these steps, I tried running "pcs cluster start", but that
>>                      >> command fails with "service" script not found, as the root filesystem
>>                      >> doesn't contain either /etc/init.d/ or /sbin/service.
>>                      >>
>>                      >> So the plan is to bring up corosync and pacemaker manually, and later do the
>>                      >> cluster configuration using "pcs" commands.
>>                      >>
>>                      >> Regards,
>>                      >> Sriram
>>                      >>
>>                      >> _______________________________________________
>>                      >> Users mailing list: Users at clusterlabs.org
>>                      <mailto:Users at clusterlabs.org>
>>                      >> http://clusterlabs.org/mailman/listinfo/users
>>                      >>
>>                      >> Project Home: http://www.clusterlabs.org
>>                      >> Getting started:
>>                      http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>                      >> Bugs: http://bugs.clusterlabs.org
>>                      >>
>>                      >
>>                      >
>>                      >
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>
>
>




