[ClusterLabs] [ClusterLab] : Corosync not initializing successfully

Fri Apr 29 12:41:43 UTC 2016

Corrected the subject.

We went ahead and captured corosync debug logs for our ppc board.
After analysing them and comparing them with the successful logs (from an
x86 machine), we did not find "[ MAIN  ] Completed service synchronization,
ready to provide service." in the ppc logs.
So it looks like corosync is not in a position to accept connections from
Pacemaker.
I also tried with the new corosync.conf, with no success.
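
For anyone who wants to repeat the comparison on their own logs, here is a minimal sketch of the check described above. It only looks for the two strings quoted in this thread (the readiness line and the libqb "Denied connection, is not ready" line). The function names are hypothetical, and it assumes each log message sits on a single line in the file.

```python
# Marker strings copied from the log excerpts quoted in this thread.
READY_MARKER = "Completed service synchronization, ready to provide service."
DENIED_MARKER = "Denied connection, is not ready"


def summarize_corosync_log(lines):
    """Return (ready_seen, denied_count) for an iterable of log lines.

    Hypothetical helper: ready_seen is True if the readiness line appears,
    denied_count is the number of refused IPC connection messages.
    """
    ready_seen = False
    denied_count = 0
    for line in lines:
        # Collapse runs of whitespace so padded log fields still match.
        normalized = " ".join(line.split())
        if READY_MARKER in normalized:
            ready_seen = True
        if DENIED_MARKER in normalized:
            denied_count += 1
    return ready_seen, denied_count


def summarize_corosync_log_file(path):
    """Same check, reading the log from a file path."""
    with open(path, errors="replace") as handle:
        return summarize_corosync_log(handle)
```

Running summarize_corosync_log_file() on the ppc and x86 logs should make the difference visible at a glance: a log without the readiness line, together with a non-zero denied count, matches the failure described here.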

Any hints on this issue would be really helpful.

Attaching ppc_notworking.log, x86_working.log, corosync.conf.

Regards,
Sriram

On Fri, Apr 29, 2016 at 2:44 PM, Sriram <sriram.ec at gmail.com> wrote:

> Hi,
>
> I went ahead and made some changes to the file system (I brought in
> /etc/init.d/corosync, /etc/init.d/pacemaker and /etc/sysconfig). After
> that I was able to run "pcs cluster start",
> but it failed with the following error:
>  # pcs cluster start
> Starting Cluster...
> Starting Pacemaker Cluster Manager[FAILED]
> Error: unable to start pacemaker
>
> And in the /var/log/pacemaker.log, I saw these errors
> pacemakerd:     info: mcp_read_config:  cmap connection setup failed:
> CS_ERR_TRY_AGAIN.  Retrying in 4s
> Apr 29 08:53:47 [15863] node_cu pacemakerd:     info: mcp_read_config:
> cmap connection setup failed: CS_ERR_TRY_AGAIN.  Retrying in 5s
> Apr 29 08:53:52 [15863] node_cu pacemakerd:  warning: mcp_read_config:
> Could not connect to Cluster Configuration Database API, error 6
> Apr 29 08:53:52 [15863] node_cu pacemakerd:   notice: main:     Could not
> obtain corosync config data, exiting
> Apr 29 08:53:52 [15863] node_cu pacemakerd:     info: crm_xml_cleanup:
> Cleaning up memory from libxml2
>
>
> And in the /var/log/Debuglog, I saw these errors coming from corosync
> 20160429 085347.487050 airv_cu daemon.warn corosync[12857]:   [QB    ]
> Denied connection, is not ready (12857-15863-14)
> 20160429 085347.487067 airv_cu daemon.info corosync[12857]:   [QB    ]
> Denied connection, is not ready (12857-15863-14)
>
>
> I browsed the code of libqb to find that it is failing in
>
> https://github.com/ClusterLabs/libqb/blob/master/lib/ipc_setup.c
>
> Line 600 :
> handle_new_connection function
>
> Line 637:
>     if (auth_result == 0 && c->service->serv_fns.connection_accept) {
>         res = c->service->serv_fns.connection_accept(c,
>                                  c->euid, c->egid);
>     }
>     if (res != 0) {
>         goto send_response;
>     }
>
> Any hints on this issue would be really helpful for me to go ahead.
> Please let me know if any logs are required,
>
> Regards,
> Sriram
>
> On Thu, Apr 28, 2016 at 2:42 PM, Sriram <sriram.ec at gmail.com> wrote:
>
>> Thanks Ken and Emmanuel.
>> It's a big-endian machine. I will try running "pcs cluster setup" and
>> "pcs cluster start".
>> Inside cluster.py, "service pacemaker start" and "service corosync start"
>> are executed to bring up pacemaker and corosync.
>> Those service scripts, and the infrastructure needed to bring up the
>> processes in that manner, don't exist on my board.
>> As it is an embedded board with limited memory, a full-fledged Linux is
>> not installed.
>> Just curious to know what could be the reason pacemaker throws this
>> error:
>>
>> *"cmap connection setup failed: CS_ERR_TRY_AGAIN.  Retrying in 1s"*
>>
>> Thanks for the response.
>>
>> Regards,
>> Sriram.
>>
>> On Thu, Apr 28, 2016 at 8:55 AM, Ken Gaillot <kgaillot at redhat.com> wrote:
>>
>>> On 04/27/2016 11:25 AM, emmanuel segura wrote:
>>> > you need to use pcs to do everything, pcs cluster setup and pcs
>>> > cluster start, try to use the redhat docs for more information.
>>>
>>> Agreed -- pcs cluster setup will create a proper corosync.conf for you.
>>> Your corosync.conf below uses corosync 1 syntax, and there were
>>> significant changes in corosync 2. In particular, you don't need the
>>> file created in step 4, because pacemaker is no longer launched via a
>>> corosync plugin.
>>>
>>> > 2016-04-27 17:28 GMT+02:00 Sriram <sriram.ec at gmail.com>:
>>> >> Dear All,
>>> >>
>>> >> I'm trying to use pacemaker and corosync for a clustering requirement
>>> >> that came up recently.
>>> >> We have cross-compiled corosync, pacemaker and pcs (python) for a ppc
>>> >> environment (the target board where pacemaker and corosync are supposed
>>> >> to run).
>>> >> I'm having trouble bringing up pacemaker in that environment, though I
>>> >> could successfully bring up corosync.
>>> >> Any help is welcome.
>>> >>
>>> >> I'm using these versions of pacemaker and corosync:
>>> >> [root at node_cu pacemaker]# corosync -v
>>> >> Corosync Cluster Engine, version '2.3.5'
>>> >> Copyright (c) 2006-2009 Red Hat, Inc.
>>> >> [root at node_cu pacemaker]# pacemakerd -$
>>> >> Pacemaker 1.1.14
>>> >> Written by Andrew Beekhof
>>> >>
>>> >> For running corosync, I did the following.
>>> >> 1. Created the following directories:
>>> >>     /var/lib/corosync
>>> >>     /var/lib/pacemaker
>>> >>     /var/lib/pacemaker/cores
>>> >>     /var/lib/pacemaker/pengine
>>> >>     /var/lib/pacemaker/blackbox
>>> >>     /var/lib/pacemaker/cib
>>> >>
>>> >>
>>> >> 2. Created a file called corosync.conf under the /etc/corosync folder
>>> >> with the following contents:
>>> >>
>>> >> totem {
>>> >>
>>> >>         version: 2
>>> >>         token:          5000
>>> >>         token_retransmits_before_loss_const: 20
>>> >>         join:           1000
>>> >>         consensus:      7500
>>> >>         vsftype:        none
>>> >>         max_messages:   20
>>> >>         secauth:        off
>>> >>         cluster_name:   mycluster
>>> >>         transport:      udpu
>>> >>         threads:        0
>>> >>         clear_node_high_bit: yes
>>> >>
>>> >>         interface {
>>> >>                 ringnumber: 0
>>> >>                 # The following three values need to be set based on your environment
>>> >>                 bindnetaddr: 10.x.x.x
>>> >>                 mcastaddr: 226.94.1.1
>>> >>                 mcastport: 5405
>>> >>         }
>>> >>  }
>>> >>
>>> >>  logging {
>>> >>         fileline: off
>>> >>         to_syslog: yes
>>> >>         to_stderr: no
>>> >>         to_syslog: yes
>>> >>         logfile: /var/log/corosync.log
>>> >>         syslog_facility: daemon
>>> >>         debug: on
>>> >>         timestamp: on
>>> >>  }
>>> >>
>>> >>  amf {
>>> >>         mode: disabled
>>> >>  }
>>> >>
>>> >>  quorum {
>>> >>         provider: corosync_votequorum
>>> >>  }
>>> >>
>>> >> nodelist {
>>> >>   node {
>>> >>         ring0_addr: node_cu
>>> >>         nodeid: 1
>>> >>        }
>>> >> }
>>> >>
>>> >> 3.  Created authkey under /etc/corosync
>>> >>
>>> >> 4.  Created a file called pcmk under /etc/corosync/service.d with the
>>> >> contents below:
>>> >>       cat pcmk
>>> >>       service {
>>> >>          # Load the Pacemaker Cluster Resource Manager
>>> >>          name: pacemaker
>>> >>          ver:  1
>>> >>       }
>>> >>
>>> >> 5. Added the node name "node_cu" in /etc/hosts with 10.X.X.X ip
>>> >>
>>> >> 6. ./corosync -f -p & --> this step started corosync
>>> >>
>>> >> [root at node_cu pacemaker]# netstat -alpn | grep -i coros
>>> >> udp        0      0 10.X.X.X:61841     0.0.0.0:*                          9133/corosync
>>> >> udp        0      0 10.X.X.X:5405      0.0.0.0:*                          9133/corosync
>>> >> unix  2      [ ACC ]     STREAM     LISTENING     148888 9133/corosync @quorum
>>> >> unix  2      [ ACC ]     STREAM     LISTENING     148884 9133/corosync @cmap
>>> >> unix  2      [ ACC ]     STREAM     LISTENING     148887 9133/corosync @votequorum
>>> >> unix  2      [ ACC ]     STREAM     LISTENING     148885 9133/corosync @cfg
>>> >> unix  2      [ ACC ]     STREAM     LISTENING     148886 9133/corosync @cpg
>>> >> unix  2      [ ]         DGRAM                    148840 9133/corosync
>>> >>
>>> >> 7. ./pacemakerd -f & gives the following error and exits.
>>> >> [root at node_cu pacemaker]# pacemakerd -f
>>> >> cmap connection setup failed: CS_ERR_TRY_AGAIN.  Retrying in 1s
>>> >> cmap connection setup failed: CS_ERR_TRY_AGAIN.  Retrying in 2s
>>> >> cmap connection setup failed: CS_ERR_TRY_AGAIN.  Retrying in 3s
>>> >> cmap connection setup failed: CS_ERR_TRY_AGAIN.  Retrying in 4s
>>> >> cmap connection setup failed: CS_ERR_TRY_AGAIN.  Retrying in 5s
>>> >> Could not connect to Cluster Configuration Database API, error 6
>>> >>
>>> >> Can you please point out what is missing in these steps?
>>> >>
>>> >> Before trying these steps, I tried running "pcs cluster start", but
>>> >> that command fails with "service" script not found, as the root
>>> >> filesystem contains neither /etc/init.d/ nor /sbin/service.
>>> >>
>>> >> So the plan is to bring up corosync and pacemaker manually, and later
>>> >> do the cluster configuration using "pcs" commands.
>>> >>
>>> >> Regards,
>>> >> Sriram
>>> >>
>>> >> _______________________________________________
>>> >> Users mailing list: Users at clusterlabs.org
>>> >> http://clusterlabs.org/mailman/listinfo/users
>>> >>
>>> >> Project Home: http://www.clusterlabs.org
>>> >> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> >> Bugs: http://bugs.clusterlabs.org
>>> >>
>>> >
>>> >
>>> >
>>>
>>>
>>>
>>
>>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ppc_notworking.log
Type: application/octet-stream
Size: 23194 bytes
Desc: not available
URL: <https://lists.clusterlabs.org/pipermail/users/attachments/20160429/cac79bc6/attachment-0009.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: x86_working.log
Type: application/octet-stream
Size: 6030 bytes
Desc: not available
URL: <https://lists.clusterlabs.org/pipermail/users/attachments/20160429/cac79bc6/attachment-0010.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: corosync.conf
Type: application/octet-stream
Size: 2760 bytes
Desc: not available
URL: <https://lists.clusterlabs.org/pipermail/users/attachments/20160429/cac79bc6/attachment-0011.obj>