[ClusterLabs] Corosync not initializing successfully
Nikhil Utane
nikhil.subscribed at gmail.com
Sun May 1 10:51:10 UTC 2016
Hi,
Looking for some guidance here as we are completely blocked otherwise :(.
-Regards
Nikhil
On Fri, Apr 29, 2016 at 6:11 PM, Sriram <sriram.ec at gmail.com> wrote:
> Corrected the subject.
>
> We went ahead and captured corosync debug logs for our ppc board.
> After analyzing them and comparing them with the successful logs (from the
> x86 machine), we didn't find *"[ MAIN ] Completed service synchronization,
> ready to provide service."* in the ppc logs.
> So it looks like corosync is not in a position to accept connections from
> Pacemaker.
> I also tried the new corosync.conf, without success.
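The log comparison described above comes down to searching each log for corosync's readiness message. A minimal C sketch, assuming the whole log has already been read into a NUL-terminated buffer. `log_reports_sync_complete` is a hypothetical helper, and the marker text is copied from this message rather than from corosync's source; only the message text is matched because the spacing inside the `[MAIN ]` subsystem tag can vary.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Readiness message quoted in this thread (seen in the x86 log only). */
static const char SYNC_DONE_MARKER[] =
    "Completed service synchronization, ready to provide service.";

/* Hypothetical helper: true if the NUL-terminated log text contains the
 * readiness message anywhere. */
static bool log_reports_sync_complete(const char *log_text)
{
    assert(log_text != NULL);
    return strstr(log_text, SYNC_DONE_MARKER) != NULL;
}
```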
>
> Any hints on this issue would be really helpful.
>
> Attaching ppc_notworking.log, x86_working.log, corosync.conf.
>
> Regards,
> Sriram
>
>
>
> On Fri, Apr 29, 2016 at 2:44 PM, Sriram <sriram.ec at gmail.com> wrote:
>
>> Hi,
>>
>> I went ahead and made some changes to the filesystem (for example, I brought
>> in /etc/init.d/corosync, /etc/init.d/pacemaker and /etc/sysconfig). After
>> that I was able to run "pcs cluster start",
>> but it failed with the following error:
>> # pcs cluster start
>> Starting Cluster...
>> Starting Pacemaker Cluster Manager[FAILED]
>> Error: unable to start pacemaker
>>
>> And in /var/log/pacemaker.log, I saw these errors:
>> pacemakerd: info: mcp_read_config: cmap connection setup failed: CS_ERR_TRY_AGAIN. Retrying in 4s
>> Apr 29 08:53:47 [15863] node_cu pacemakerd: info: mcp_read_config: cmap connection setup failed: CS_ERR_TRY_AGAIN. Retrying in 5s
>> Apr 29 08:53:52 [15863] node_cu pacemakerd: warning: mcp_read_config: Could not connect to Cluster Configuration Database API, error 6
>> Apr 29 08:53:52 [15863] node_cu pacemakerd: notice: main: Could not obtain corosync config data, exiting
>> Apr 29 08:53:52 [15863] node_cu pacemakerd: info: crm_xml_cleanup: Cleaning up memory from libxml2
>>
>>
>> And in /var/log/Debuglog, I saw these messages from corosync:
>> 20160429 085347.487050 airv_cu daemon.warn corosync[12857]: [QB ] Denied connection, is not ready (12857-15863-14)
>> 20160429 085347.487067 airv_cu daemon.info corosync[12857]: [QB ] Denied connection, is not ready (12857-15863-14)
>>
>>
>> Browsing the libqb source, I found that the connection is rejected in
>>
>> https://github.com/ClusterLabs/libqb/blob/master/lib/ipc_setup.c
>>
>> Line 600:
>> handle_new_connection function
>>
>> Line 637:
>>     if (auth_result == 0 && c->service->serv_fns.connection_accept) {
>>         res = c->service->serv_fns.connection_accept(c,
>>                                                      c->euid, c->egid);
>>     }
>>     if (res != 0) {
>>         goto send_response;
>>     }
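The quoted fragment can be read as a small gate. The sketch below is a model, not libqb or corosync source: `model_connection_accept` and the `service_ready` flag are hypothetical stand-ins for corosync's `connection_accept` callback and whatever readiness state it consults. While the flag is false the callback returns `-EAGAIN`, which in this model corresponds to the "Denied connection, is not ready" warning and to the `CS_ERR_TRY_AGAIN` seen by pacemakerd. Tying the flag to the first completed service synchronization is an assumption that is consistent with the logs in this thread.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical readiness flag; in this model it becomes true only after the
 * daemon has completed its first service synchronization. */
static bool service_ready = false;

/* Model of a connection_accept callback: 0 accepts, a negative errno rejects. */
static int model_connection_accept(void)
{
    return service_ready ? 0 : -EAGAIN;   /* -EAGAIN: "is not ready" */
}

/* Model of the quoted handle_new_connection() logic: the accept callback is
 * consulted only when authentication succeeded, and any nonzero result is
 * sent back to the client instead of completing the connection. */
static int model_handle_new_connection(int auth_result)
{
    int res = auth_result;

    assert(auth_result <= 0);             /* 0 or a negative errno */
    if (auth_result == 0) {
        res = model_connection_accept();
    }
    if (res != 0) {
        return res;                       /* "goto send_response" with an error */
    }
    return 0;                             /* connection accepted */
}
```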
>>
>> Any hints on this issue would help me move forward.
>> Please let me know if any other logs are required.
>>
>> Regards,
>> Sriram
>>
>> On Thu, Apr 28, 2016 at 2:42 PM, Sriram <sriram.ec at gmail.com> wrote:
>>
>>> Thanks Ken and Emmanuel.
>>> It's a big-endian machine. I will try running "pcs cluster setup"
>>> and "pcs cluster start".
>>> Inside cluster.py, "service pacemaker start" and "service corosync
>>> start" are executed to bring up pacemaker and corosync.
>>> The service scripts and the infrastructure needed to bring up the
>>> processes that way don't exist on my board.
>>> As it is an embedded board with limited memory, a full-fledged Linux
>>> distribution is not installed.
>>> I'm curious what could be the reason pacemaker throws this
>>> error:
>>>
>>>
>>>
>>> *"cmap connection setup failed: CS_ERR_TRY_AGAIN. Retrying in 1s"*
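Since the board is big-endian while the working x86 host is little-endian, one cheap sanity check is to confirm the byte order that the cross-compiled binaries actually observe on the target. A self-contained C sketch; `host_is_big_endian` is an illustrative name, not part of corosync or pacemaker.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Returns true if the host stores the most significant byte of a multi-byte
 * integer at the lowest address (big-endian, as reported for this ppc board). */
static bool host_is_big_endian(void)
{
    const uint32_t probe = 0x01020304u;
    unsigned char first;

    memcpy(&first, &probe, 1);                /* byte at the lowest address */
    assert(first == 0x01 || first == 0x04);   /* rule out mixed-endian layouts */
    return first == 0x01;
}
```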
>>> Thanks for the response.
>>>
>>> Regards,
>>> Sriram.
>>>
>>> On Thu, Apr 28, 2016 at 8:55 AM, Ken Gaillot <kgaillot at redhat.com>
>>> wrote:
>>>
>>>> On 04/27/2016 11:25 AM, emmanuel segura wrote:
>>>> > You need to use pcs to do everything (pcs cluster setup and pcs
>>>> > cluster start); see the Red Hat docs for more information.
>>>>
>>>> Agreed -- pcs cluster setup will create a proper corosync.conf for you.
>>>> Your corosync.conf below uses corosync 1 syntax, and there were
>>>> significant changes in corosync 2. In particular, you don't need the
>>>> file created in step 4, because pacemaker is no longer launched via a
>>>> corosync plugin.
>>>>
>>>> > 2016-04-27 17:28 GMT+02:00 Sriram <sriram.ec at gmail.com>:
>>>> >> Dear All,
>>>> >>
>>>> >> I'm trying to use pacemaker and corosync for a clustering requirement
>>>> >> that came up recently.
>>>> >> We have cross-compiled corosync, pacemaker and pcs (python) for the ppc
>>>> >> environment (the target board where pacemaker and corosync are supposed
>>>> >> to run).
>>>> >> I'm having trouble bringing up pacemaker in that environment, though I
>>>> >> could successfully bring up corosync.
>>>> >> Any help is welcome.
>>>> >>
>>>> >> I'm using these versions of pacemaker and corosync:
>>>> >> [root at node_cu pacemaker]# corosync -v
>>>> >> Corosync Cluster Engine, version '2.3.5'
>>>> >> Copyright (c) 2006-2009 Red Hat, Inc.
>>>> >> [root at node_cu pacemaker]# pacemakerd -$
>>>> >> Pacemaker 1.1.14
>>>> >> Written by Andrew Beekhof
>>>> >>
>>>> >> To run corosync, I did the following.
>>>> >> 1. Created the following directories:
>>>> >> /var/lib/pacemaker
>>>> >> /var/lib/corosync
>>>> >> /var/lib/pacemaker/cores
>>>> >> /var/lib/pacemaker/pengine
>>>> >> /var/lib/pacemaker/blackbox
>>>> >> /var/lib/pacemaker/cib
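On a board without packaging, step 1 can be scripted. A minimal C sketch, assuming a `root` prefix ("" on the board itself, or a scratch directory for a dry run); `create_state_dirs` is a hypothetical helper and the 0750 mode is an assumption. Ownership is not handled here; packaged pacemaker builds usually expect some of these directories to belong to the hacluster user, which is worth checking separately.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Directories from step 1; /var and /var/lib come first only because
 * mkdir(2) does not create missing parent directories. */
static const char *const STATE_DIRS[] = {
    "/var",
    "/var/lib",
    "/var/lib/corosync",
    "/var/lib/pacemaker",
    "/var/lib/pacemaker/cores",
    "/var/lib/pacemaker/pengine",
    "/var/lib/pacemaker/blackbox",
    "/var/lib/pacemaker/cib",
};

/* Creates each directory under the prefix `root` ("" for the real root
 * filesystem). Existing directories are accepted. Returns 0 or -1. */
static int create_state_dirs(const char *root)
{
    char path[4096];
    size_t i;

    assert(root != NULL);
    for (i = 0; i < sizeof STATE_DIRS / sizeof STATE_DIRS[0]; i++) {
        int n = snprintf(path, sizeof path, "%s%s", root, STATE_DIRS[i]);

        if (n < 0 || (size_t)n >= sizeof path) {
            return -1;                    /* path would be truncated */
        }
        if (mkdir(path, 0750) != 0 && errno != EEXIST) {
            perror(path);
            return -1;
        }
    }
    return 0;
}
```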
>>>> >>
>>>> >>
>>>> >> 2. Created a file called corosync.conf under the /etc/corosync folder with
>>>> >> the following contents:
>>>> >>
>>>> >> totem {
>>>> >>
>>>> >>     version: 2
>>>> >>     token: 5000
>>>> >>     token_retransmits_before_loss_const: 20
>>>> >>     join: 1000
>>>> >>     consensus: 7500
>>>> >>     vsftype: none
>>>> >>     max_messages: 20
>>>> >>     secauth: off
>>>> >>     cluster_name: mycluster
>>>> >>     transport: udpu
>>>> >>     threads: 0
>>>> >>     clear_node_high_bit: yes
>>>> >>
>>>> >>     interface {
>>>> >>         ringnumber: 0
>>>> >>         # The following three values need to be set based on your environment
>>>> >>         bindnetaddr: 10.x.x.x
>>>> >>         mcastaddr: 226.94.1.1
>>>> >>         mcastport: 5405
>>>> >>     }
>>>> >> }
>>>> >>
>>>> >> logging {
>>>> >>     fileline: off
>>>> >>     to_syslog: yes
>>>> >>     to_stderr: no
>>>> >>     to_syslog: yes
>>>> >>     logfile: /var/log/corosync.log
>>>> >>     syslog_facility: daemon
>>>> >>     debug: on
>>>> >>     timestamp: on
>>>> >> }
>>>> >>
>>>> >> amf {
>>>> >>     mode: disabled
>>>> >> }
>>>> >>
>>>> >> quorum {
>>>> >>     provider: corosync_votequorum
>>>> >> }
>>>> >>
>>>> >> nodelist {
>>>> >>     node {
>>>> >>         ring0_addr: node_cu
>>>> >>         nodeid: 1
>>>> >>     }
>>>> >> }
>>>> >>
>>>> >> 3. Created authkey under /etc/corosync
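corosync normally provides `corosync-keygen` for step 3. If that tool is missing from the cross-compiled image, the same idea (random bytes written to a file readable only by its owner) can be sketched in C. The 128-byte length is an assumption about what corosync-keygen produces in the 2.x series, and `write_authkey` is a hypothetical name. With `secauth: off`, as in the configuration above, corosync is not expected to read the key at all, so this step is unlikely to be related to the startup problem.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Assumed key size, matching what corosync-keygen writes in corosync 2.x. */
#define AUTHKEY_LEN 128

/* Hypothetical helper: fills a new file at `path` with AUTHKEY_LEN bytes
 * read from /dev/urandom, created read-only for its owner (mode 0400).
 * Refuses to overwrite an existing file. Returns 0 on success, -1 on error. */
static int write_authkey(const char *path)
{
    unsigned char key[AUTHKEY_LEN];
    size_t got = 0;
    int fd;

    assert(path != NULL);
    fd = open("/dev/urandom", O_RDONLY);
    if (fd < 0) {
        return -1;
    }
    while (got < sizeof key) {            /* read() may return short counts */
        ssize_t n = read(fd, key + got, sizeof key - got);
        if (n <= 0) {
            close(fd);
            return -1;
        }
        got += (size_t)n;
    }
    close(fd);

    fd = open(path, O_WRONLY | O_CREAT | O_EXCL, S_IRUSR);
    if (fd < 0) {
        return -1;                        /* includes "file already exists" */
    }
    if (write(fd, key, sizeof key) != (ssize_t)sizeof key) {
        close(fd);
        return -1;
    }
    return close(fd) == 0 ? 0 : -1;
}
```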
>>>> >>
>>>> >> 4. Created a file called pcmk under /etc/corosync/service.d with the
>>>> >> following contents:
>>>> >> cat pcmk
>>>> >> service {
>>>> >>     # Load the Pacemaker Cluster Resource Manager
>>>> >>     name: pacemaker
>>>> >>     ver: 1
>>>> >> }
>>>> >>
>>>> >> 5. Added the node name "node_cu" to /etc/hosts with its 10.X.X.X IP address.
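Step 5 matters because `ring0_addr` in the nodelist is a host name, so corosync presumably has to resolve `node_cu` to an address on this board before it can identify the local node. A hedged C sketch for checking that the name resolves to the expected IPv4 address, using only the standard getaddrinfo(3) and inet_ntop(3) calls; `resolves_to_ipv4` is a hypothetical helper.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <assert.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: true if `host` resolves (IPv4 only) to `expected_ip`,
 * given in dotted-quad form. */
static bool resolves_to_ipv4(const char *host, const char *expected_ip)
{
    struct addrinfo hints;
    struct addrinfo *res = NULL;
    const struct addrinfo *ai;
    bool found = false;

    assert(host != NULL && expected_ip != NULL);
    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_INET;        /* the configuration above uses IPv4 */
    hints.ai_socktype = SOCK_DGRAM;   /* udpu is a UDP transport */
    if (getaddrinfo(host, NULL, &hints, &res) != 0) {
        return false;
    }
    for (ai = res; ai != NULL && !found; ai = ai->ai_next) {
        const struct sockaddr_in *sin = (const struct sockaddr_in *)ai->ai_addr;
        char text[INET_ADDRSTRLEN];

        if (inet_ntop(AF_INET, &sin->sin_addr, text, sizeof text) != NULL &&
            strcmp(text, expected_ip) == 0) {
            found = true;
        }
    }
    freeaddrinfo(res);
    return found;
}
```

On the board this would be called as `resolves_to_ipv4("node_cu", "10.X.X.X")` with the real address substituted.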
>>>> >>
>>>> >> 6. Ran "./corosync -f -p &", which started corosync:
>>>> >>
>>>> >> [root at node_cu pacemaker]# netstat -alpn | grep -i coros
>>>> >> udp 0 0 10.X.X.X:61841 0.0.0.0:* 9133/corosync
>>>> >> udp 0 0 10.X.X.X:5405 0.0.0.0:* 9133/corosync
>>>> >> unix 2 [ ACC ] STREAM LISTENING 148888 9133/corosync @quorum
>>>> >> unix 2 [ ACC ] STREAM LISTENING 148884 9133/corosync @cmap
>>>> >> unix 2 [ ACC ] STREAM LISTENING 148887 9133/corosync @votequorum
>>>> >> unix 2 [ ACC ] STREAM LISTENING 148885 9133/corosync @cfg
>>>> >> unix 2 [ ACC ] STREAM LISTENING 148886 9133/corosync @cpg
>>>> >> unix 2 [ ] DGRAM 148840 9133/corosync
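On Linux, netstat reads these entries from /proc/net/unix, where abstract socket names are shown with a leading `@`. The sketch below checks a snapshot of that file's text for a given name; `unix_socket_listed` is hypothetical. A listed `@cmap` socket only shows that corosync created its IPC endpoint: as the "Denied connection, is not ready" messages elsewhere in this thread show, connections on it can still be refused.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true if some line of a /proc/net/unix snapshot ends
 * with " <path>", e.g. "@cmap" for corosync's abstract cmap socket. */
static bool unix_socket_listed(const char *proc_net_unix, const char *path)
{
    const char *line = proc_net_unix;
    size_t plen;

    assert(proc_net_unix != NULL && path != NULL);
    plen = strlen(path);
    assert(plen > 0);
    while (*line != '\0') {
        const char *eol = strchr(line, '\n');
        size_t len = eol != NULL ? (size_t)(eol - line) : strlen(line);

        /* The Path column is last, so compare the tail of the line. */
        if (len > plen && line[len - plen - 1] == ' ' &&
            memcmp(line + len - plen, path, plen) == 0) {
            return true;
        }
        if (eol == NULL) {
            break;
        }
        line = eol + 1;
    }
    return false;
}
```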
>>>> >>
>>>> >> 7. Running "./pacemakerd -f &" gives the following error and exits:
>>>> >> [root at node_cu pacemaker]# pacemakerd -f
>>>> >> cmap connection setup failed: CS_ERR_TRY_AGAIN. Retrying in 1s
>>>> >> cmap connection setup failed: CS_ERR_TRY_AGAIN. Retrying in 2s
>>>> >> cmap connection setup failed: CS_ERR_TRY_AGAIN. Retrying in 3s
>>>> >> cmap connection setup failed: CS_ERR_TRY_AGAIN. Retrying in 4s
>>>> >> cmap connection setup failed: CS_ERR_TRY_AGAIN. Retrying in 5s
>>>> >> Could not connect to Cluster Configuration Database API, error 6
>>>> >>
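The output above follows the pattern of a bounded retry loop with a linearly growing delay: five "Retrying" messages with waits from 1 s to 5 s (1 + 2 + 3 + 4 + 5 = 15 s in total) before pacemakerd reports error 6, which is the numeric value of `CS_ERR_TRY_AGAIN` in corosync's `cs_error_t`. A C model of that pattern; `connect_with_retry` and the stubs are hypothetical stand-ins for pacemakerd's call to `cmap_initialize()`, not Pacemaker code. Whether Pacemaker makes one more attempt after the last wait is not visible from the output; this model does.

```c
#include <assert.h>
#include <stdio.h>

/* Two cs_error_t values as defined by corosync (corosync/corotypes.h). */
enum { CS_OK = 1, CS_ERR_TRY_AGAIN = 6 };

/* Hypothetical stand-in for a client call such as cmap_initialize(). */
typedef int (*connect_fn)(void *ctx);

/* Calls connect_once(); while it does not return CS_OK, waits 1 s, 2 s, ...
 * (at most max_retries waits) and tries again. Returns the last result. */
static int connect_with_retry(connect_fn connect_once, void *ctx,
                              int max_retries,
                              unsigned int (*sleep_s)(unsigned int))
{
    int rc;
    int retry;

    assert(connect_once != NULL && sleep_s != NULL && max_retries >= 0);
    rc = connect_once(ctx);
    for (retry = 1; rc != CS_OK && retry <= max_retries; retry++) {
        fprintf(stderr, "connection setup failed (error %d). Retrying in %ds\n",
                rc, retry);
        (void)sleep_s((unsigned int)retry);
        rc = connect_once(ctx);
    }
    return rc;
}

/* Example stub: reports CS_ERR_TRY_AGAIN until polled *ctx more times. */
static int fake_connect(void *ctx)
{
    int *remaining = ctx;

    if (*remaining > 0) {
        (*remaining)--;
        return CS_ERR_TRY_AGAIN;
    }
    return CS_OK;
}

/* Example stub: skips the real wait so the model runs instantly. */
static unsigned int no_sleep(unsigned int seconds)
{
    (void)seconds;
    return 0;
}
```

In a real build, `sleep` from <unistd.h> matches the `sleep_s` signature and can be passed directly. The model also shows the consequence for this thread: if corosync never becomes ready, restarting pacemakerd only repeats the same 15-second window.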
>>>> >> Can you please point out what is missing in these steps?
>>>> >>
>>>> >> Before trying these steps, I tried running "pcs cluster start", but that
>>>> >> command fails because the "service" script is not found: the root
>>>> >> filesystem contains neither /etc/init.d/ nor /sbin/service.
>>>> >>
>>>> >> So the plan is to bring up corosync and pacemaker manually, and later do
>>>> >> the cluster configuration using "pcs" commands.
>>>> >>
>>>> >> Regards,
>>>> >> Sriram
>>>> >>
>>>> >> _______________________________________________
>>>> >> Users mailing list: Users at clusterlabs.org
>>>> >> http://clusterlabs.org/mailman/listinfo/users
>>>> >>
>>>> >> Project Home: http://www.clusterlabs.org
>>>> >> Getting started:
>>>> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>> >> Bugs: http://bugs.clusterlabs.org
>>>> >>
>>>> >
>>>> >
>>>> >
>>>>
>>>>
>>>>
>>>
>>>
>>
>
>
>