[ClusterLabs] [ClusterLab] : Corosync not initializing successfully

Sun May 1 10:51:10 UTC 2016

Hi,

Looking for some guidance here as we are completely blocked otherwise :(.

-Regards
Nikhil

On Fri, Apr 29, 2016 at 6:11 PM, Sriram <sriram.ec at gmail.com> wrote:

> Corrected the subject.
>
> We went ahead and captured corosync debug logs for our ppc board.
> After analysing them and comparing them with the successful logs (from the
> x86 machine), we didn't find "[ MAIN  ] Completed service synchronization,
> ready to provide service." in the ppc logs.
> So it looks like corosync is not in a position to accept connections from
> Pacemaker.
> I even tried with the new corosync.conf, with no success.
>
> Any hints on this issue would be really helpful.
>
> Attaching ppc_notworking.log, x86_working.log, corosync.conf.
>
> Regards,
> Sriram
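
The log comparison described above can be sketched as a small Python check
that looks for the readiness line in each log. This is only a minimal sketch:
the function names are hypothetical, and the sample input reuses lines quoted
in this thread rather than the attached log files.

```python
import re

# Readiness line as quoted in this thread; spacing inside the brackets may vary.
READY_RE = re.compile(
    r"\[\s*MAIN\s*\]\s+Completed service synchronization, ready to provide service\."
)


def find_ready_line(lines):
    """Return the 1-based number of the first line reporting readiness, or None."""
    for number, line in enumerate(lines, start=1):
        if READY_RE.search(line):
            return number
    return None


def compare_logs(named_logs):
    """Map each log name to the line number of its readiness marker (or None)."""
    return {name: find_ready_line(lines) for name, lines in named_logs.items()}


# Illustrative input only: both lines are copied from messages in this thread,
# not from the attached ppc_notworking.log / x86_working.log files.
sample = {
    "working": ["[ MAIN  ] Completed service synchronization, ready to provide service."],
    "not_working": ["[QB    ] Denied connection, is not ready (12857-15863-14)"],
}
print(compare_logs(sample))
```

For real logs, the lists could be built with something like
open(path, errors="replace").read().splitlines() for each file.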
>
>
>
> On Fri, Apr 29, 2016 at 2:44 PM, Sriram <sriram.ec at gmail.com> wrote:
>
>> Hi,
>>
>> I went ahead and made some changes to the file system (I brought in
>> /etc/init.d/corosync, /etc/init.d/pacemaker and /etc/sysconfig). After
>> that I was able to run "pcs cluster start",
>> but it failed with the following error:
>>  # pcs cluster start
>> Starting Cluster...
>> Starting Pacemaker Cluster Manager[FAILED]
>> Error: unable to start pacemaker
>>
>> And in /var/log/pacemaker.log, I saw these errors:
>> pacemakerd:     info: mcp_read_config:  cmap connection setup failed:
>> CS_ERR_TRY_AGAIN.  Retrying in 4s
>> Apr 29 08:53:47 [15863] node_cu pacemakerd:     info: mcp_read_config:
>> cmap connection setup failed: CS_ERR_TRY_AGAIN.  Retrying in 5s
>> Apr 29 08:53:52 [15863] node_cu pacemakerd:  warning: mcp_read_config:
>> Could not connect to Cluster Configuration Database API, error 6
>> Apr 29 08:53:52 [15863] node_cu pacemakerd:   notice: main:     Could not
>> obtain corosync config data, exiting
>> Apr 29 08:53:52 [15863] node_cu pacemakerd:     info: crm_xml_cleanup:
>> Cleaning up memory from libxml2
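
The retry pattern visible in this log (waits of 1s up to 5s, then giving up
with "error 6") can be modelled roughly as below. This is a sketch that
mirrors the log output only, not Pacemaker's actual code: the function name
and the try_connect callback are hypothetical, while the numeric values
CS_OK = 1 and CS_ERR_TRY_AGAIN = 6 are the ones corosync uses for its
cs_error_t codes.

```python
import time

CS_OK = 1
CS_ERR_TRY_AGAIN = 6  # matches the "error 6" reported in the log above
ERROR_NAMES = {CS_ERR_TRY_AGAIN: "CS_ERR_TRY_AGAIN"}


def connect_with_retries(try_connect, max_retries=5, sleep=time.sleep):
    """Call try_connect() up to max_retries times, waiting 1s, 2s, ... between tries.

    Returns (last_return_code, list_of_log_messages).
    try_connect is a hypothetical stand-in for the real cmap connection call.
    """
    messages = []
    rc = None
    for attempt in range(1, max_retries + 1):
        rc = try_connect()
        if rc == CS_OK:
            return rc, messages
        name = ERROR_NAMES.get(rc, str(rc))
        messages.append(
            f"cmap connection setup failed: {name}.  Retrying in {attempt}s"
        )
        sleep(attempt)
    messages.append(
        f"Could not connect to Cluster Configuration Database API, error {rc}"
    )
    return rc, messages


# A server that never becomes ready reproduces the sequence seen in the log.
rc, log = connect_with_retries(lambda: CS_ERR_TRY_AGAIN, sleep=lambda s: None)
print("\n".join(log))
```

The point of the model is that the client side is behaving as designed:
it keeps being told "try again" until it runs out of retries, so the cause
has to be looked for on the corosync side.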
>>
>>
>> And in /var/log/Debuglog, I saw these errors coming from corosync:
>> 20160429 085347.487050 airv_cu daemon.warn corosync[12857]:   [QB    ]
>> Denied connection, is not ready (12857-15863-14)
>> 20160429 085347.487067 airv_cu daemon.info corosync[12857]:   [QB    ]
>> Denied connection, is not ready (12857-15863-14)
>>
>>
>> I browsed the libqb code and found that it is failing in
>>
>> https://github.com/ClusterLabs/libqb/blob/master/lib/ipc_setup.c
>>
>> Line 600 :
>> handle_new_connection function
>>
>> Line 637:
>> if (auth_result == 0 && c->service->serv_fns.connection_accept) {
>>         res = c->service->serv_fns.connection_accept(c,
>>                                  c->euid, c->egid);
>>     }
>>     if (res != 0) {
>>         goto send_response;
>>     }
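
To make the quoted branch easier to reason about, here is a toy Python model
of it. It is not libqb's code: the function signature is hypothetical, and
two things are assumptions on my part, namely that res starts out equal to
auth_result and that a -EAGAIN result from the service's accept callback is
what produces the "Denied connection, is not ready" message.

```python
import errno


def handle_new_connection(auth_result, connection_accept=None, euid=0, egid=0):
    """Toy model of the quoted branch; returns (res, log_message_or_None)."""
    res = auth_result  # assumption: result starts as the authentication result
    if auth_result == 0 and connection_accept is not None:
        # The service (corosync, in this thread) decides whether to accept.
        res = connection_accept(euid, egid)
    if res == 0:
        return res, None
    if res == -errno.EAGAIN:
        # Assumption: this is the case behind the message in the Debuglog.
        return res, "Denied connection, is not ready"
    return res, f"Connection rejected (res={res})"


def service_not_ready(euid, egid):
    """Hypothetical accept callback for a service that has not finished starting."""
    return -errno.EAGAIN


print(handle_new_connection(0, service_not_ready))
```

Read this way, the denial is a decision taken by the service's own accept
callback, which fits with the missing "ready to provide service" line.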
>>
>> Any hints on this issue would be really helpful for me to go ahead.
>> Please let me know if any logs are required.
>>
>> Regards,
>> Sriram
>>
>> On Thu, Apr 28, 2016 at 2:42 PM, Sriram <sriram.ec at gmail.com> wrote:
>>
>>> Thanks Ken and Emmanuel.
>>> It's a big-endian machine. I will try running "pcs cluster setup"
>>> and "pcs cluster start".
>>> Inside cluster.py, "service pacemaker start" and "service corosync
>>> start" are executed to bring up pacemaker and corosync.
>>> Those service scripts, and the infrastructure needed to bring up the
>>> processes in that manner, don't exist on my board.
>>> As it is an embedded board with limited memory, a full-fledged Linux is
>>> not installed.
>>> Just curious to know: what could be the reason pacemaker throws this
>>> error?
>>>
>>> "cmap connection setup failed: CS_ERR_TRY_AGAIN.  Retrying in 1s"
>>>
>>> Thanks for the response.
>>>
>>> Regards,
>>> Sriram.
>>>
>>> On Thu, Apr 28, 2016 at 8:55 AM, Ken Gaillot <kgaillot at redhat.com>
>>> wrote:
>>>
>>>> On 04/27/2016 11:25 AM, emmanuel segura wrote:
>>>> > You need to use pcs to do everything: pcs cluster setup and pcs
>>>> > cluster start. Try the Red Hat docs for more information.
>>>>
>>>> Agreed -- pcs cluster setup will create a proper corosync.conf for you.
>>>> Your corosync.conf below uses corosync 1 syntax, and there were
>>>> significant changes in corosync 2. In particular, you don't need the
>>>> file created in step 4, because pacemaker is no longer launched via a
>>>> corosync plugin.
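
As a rough aid for checking a hand-written corosync.conf such as the one
quoted further down, the following Python sketch flags two things: a
"service" section (the plugin-style launch that is no longer needed) and
keys repeated within one section. The parser is deliberately simplistic and
the function name is hypothetical; it is not a substitute for generating the
file with pcs.

```python
def lint_corosync_conf(text):
    """Return a list of warning strings for a corosync.conf-style text."""
    warnings = []
    stack = []        # one (section_name, seen_keys) entry per open section
    top_keys = set()  # keys seen outside any section
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if line.endswith("{"):
            name = line[:-1].strip()
            if name == "service":
                warnings.append(
                    f"line {lineno}: 'service' section (plugin-style launch)"
                )
            stack.append((name, set()))
        elif line == "}":
            if stack:
                stack.pop()
            else:
                warnings.append(f"line {lineno}: unmatched closing brace")
        elif ":" in line:
            key = line.split(":", 1)[0].strip()
            section, seen = stack[-1] if stack else ("<top level>", top_keys)
            if key in seen:
                warnings.append(
                    f"line {lineno}: duplicate key '{key}' in section '{section}'"
                )
            seen.add(key)
    return warnings


# Fragments taken from the configuration files quoted in this thread.
SAMPLE = """\
logging {
        to_syslog: yes
        to_stderr: no
        to_syslog: yes
}
service {
        name: pacemaker
        ver:  1
}
"""
for warning in lint_corosync_conf(SAMPLE):
    print(warning)
```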
>>>>
>>>> > 2016-04-27 17:28 GMT+02:00 Sriram <sriram.ec at gmail.com>:
>>>> >> Dear All,
>>>> >>
>>>> >> I'm trying to use pacemaker and corosync for a clustering requirement
>>>> >> that came up recently.
>>>> >> We have cross-compiled corosync, pacemaker and pcs (Python) for a ppc
>>>> >> environment (the target board where pacemaker and corosync are supposed
>>>> >> to run).
>>>> >> I'm having trouble bringing up pacemaker in that environment, though I
>>>> >> could successfully bring up corosync.
>>>> >> Any help is welcome.
>>>> >>
>>>> >> I'm using these versions of pacemaker and corosync:
>>>> >> [root at node_cu pacemaker]# corosync -v
>>>> >> Corosync Cluster Engine, version '2.3.5'
>>>> >> Copyright (c) 2006-2009 Red Hat, Inc.
>>>> >> [root at node_cu pacemaker]# pacemakerd -$
>>>> >> Pacemaker 1.1.14
>>>> >> Written by Andrew Beekhof
>>>> >>
>>>> >> For running corosync, I did the following.
>>>> >> 1. Created the following directories,
>>>> >>     /var/lib/pacemaker
>>>> >>     /var/lib/corosync
>>>> >>     /var/lib/pacemaker/cores
>>>> >>     /var/lib/pacemaker/pengine
>>>> >>     /var/lib/pacemaker/blackbox
>>>> >>     /var/lib/pacemaker/cib
>>>> >>
>>>> >>
>>>> >> 2. Created a file called corosync.conf under /etc/corosync folder
>>>> >> with the following contents
>>>> >>
>>>> >> totem {
>>>> >>
>>>> >>         version: 2
>>>> >>         token:          5000
>>>> >>         token_retransmits_before_loss_const: 20
>>>> >>         join:           1000
>>>> >>         consensus:      7500
>>>> >>         vsftype:        none
>>>> >>         max_messages:   20
>>>> >>         secauth:        off
>>>> >>         cluster_name:   mycluster
>>>> >>         transport:      udpu
>>>> >>         threads:        0
>>>> >>         clear_node_high_bit: yes
>>>> >>
>>>> >>         interface {
>>>> >>                 ringnumber: 0
>>>> >>                 # The following three values need to be set based on your environment
>>>> >>                 bindnetaddr: 10.x.x.x
>>>> >>                 mcastaddr: 226.94.1.1
>>>> >>                 mcastport: 5405
>>>> >>         }
>>>> >>  }
>>>> >>
>>>> >>  logging {
>>>> >>         fileline: off
>>>> >>         to_syslog: yes
>>>> >>         to_stderr: no
>>>> >>         to_syslog: yes
>>>> >>         logfile: /var/log/corosync.log
>>>> >>         syslog_facility: daemon
>>>> >>         debug: on
>>>> >>         timestamp: on
>>>> >>  }
>>>> >>
>>>> >>  amf {
>>>> >>         mode: disabled
>>>> >>  }
>>>> >>
>>>> >>  quorum {
>>>> >>         provider: corosync_votequorum
>>>> >>  }
>>>> >>
>>>> >> nodelist {
>>>> >>   node {
>>>> >>         ring0_addr: node_cu
>>>> >>         nodeid: 1
>>>> >>        }
>>>> >> }
>>>> >>
>>>> >> 3.  Created authkey under /etc/corosync
>>>> >>
>>>> >> 4.  Created a file called pcmk under /etc/corosync/service.d and
>>>> >> contents as below,
>>>> >>       cat pcmk
>>>> >>       service {
>>>> >>          # Load the Pacemaker Cluster Resource Manager
>>>> >>          name: pacemaker
>>>> >>          ver:  1
>>>> >>       }
>>>> >>
>>>> >> 5. Added the node name "node_cu" in /etc/hosts with 10.X.X.X ip
>>>> >>
>>>> >> 6. ./corosync -f -p & --> this step started corosync
>>>> >>
>>>> >> [root at node_cu pacemaker]# netstat -alpn | grep -i coros
>>>> >> udp        0      0 10.X.X.X:61841     0.0.0.0:*                          9133/corosync
>>>> >> udp        0      0 10.X.X.X:5405      0.0.0.0:*                          9133/corosync
>>>> >> unix  2      [ ACC ]     STREAM     LISTENING     148888   9133/corosync  @quorum
>>>> >> unix  2      [ ACC ]     STREAM     LISTENING     148884   9133/corosync  @cmap
>>>> >> unix  2      [ ACC ]     STREAM     LISTENING     148887   9133/corosync  @votequorum
>>>> >> unix  2      [ ACC ]     STREAM     LISTENING     148885   9133/corosync  @cfg
>>>> >> unix  2      [ ACC ]     STREAM     LISTENING     148886   9133/corosync  @cpg
>>>> >> unix  2      [ ]         DGRAM                    148840   9133/corosync
>>>> >>
>>>> >> 7. ./pacemakerd -f & gives the following error and exits.
>>>> >> [root at node_cu pacemaker]# pacemakerd -f
>>>> >> cmap connection setup failed: CS_ERR_TRY_AGAIN.  Retrying in 1s
>>>> >> cmap connection setup failed: CS_ERR_TRY_AGAIN.  Retrying in 2s
>>>> >> cmap connection setup failed: CS_ERR_TRY_AGAIN.  Retrying in 3s
>>>> >> cmap connection setup failed: CS_ERR_TRY_AGAIN.  Retrying in 4s
>>>> >> cmap connection setup failed: CS_ERR_TRY_AGAIN.  Retrying in 5s
>>>> >> Could not connect to Cluster Configuration Database API, error 6
>>>> >>
>>>> >> Can you please point out what is missing in these steps?
>>>> >>
>>>> >> Before trying these steps, I tried running "pcs cluster start", but
>>>> >> that command fails with "service" script not found, as the root
>>>> >> filesystem doesn't contain either /etc/init.d/ or /sbin/service.
>>>> >>
>>>> >> So the plan is to bring up corosync and pacemaker manually, and later
>>>> >> do the cluster configuration using "pcs" commands.
>>>> >>
>>>> >> Regards,
>>>> >> Sriram
>>>> >>
>>>> >> _______________________________________________
>>>> >> Users mailing list: Users at clusterlabs.org
>>>> >> http://clusterlabs.org/mailman/listinfo/users
>>>> >>
>>>> >> Project Home: http://www.clusterlabs.org
>>>> >> Getting started:
>>>> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>> >> Bugs: http://bugs.clusterlabs.org
>>>> >>
>>>> >
>>>> >
>>>> >
>>>>
>>>>
>>>>
>>>
>>>
>>
>