[ClusterLabs] [ClusterLab] : Corosync not initializing successfully

Mon May 2 04:44:00 UTC 2016

Re-sending as I don't see my post on the thread.

On Sun, May 1, 2016 at 4:21 PM, Nikhil Utane <nikhil.subscribed at gmail.com>
wrote:

> Hi,
>
> Looking for some guidance here as we are completely blocked otherwise :(.
>
> -Regards
> Nikhil
>
> On Fri, Apr 29, 2016 at 6:11 PM, Sriram <sriram.ec at gmail.com> wrote:
>
>> Corrected the subject.
>>
>> We went ahead and captured corosync debug logs for our ppc board.
>> After analyzing the logs and comparing them with the successful logs (from
>> the x86 machine), we did not find
>> "[ MAIN  ] Completed service synchronization, ready to provide service."
>> in the ppc logs.
>> So it looks like corosync is not in a position to accept connections from
>> Pacemaker.
>> I also tried a new corosync.conf, with no success.
>>
>> Any hints on this issue would be really helpful.
>>
>> Attaching ppc_notworking.log, x86_working.log, corosync.conf.
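
For anyone repeating this comparison, the check described above can be scripted rather than done by eye. This is a minimal sketch in Python; the function name `reached_ready` is made up for illustration, and only the marker string is taken from the working x86 log.

```python
from typing import Iterable

# Message that the working (x86) log contains and the ppc log lacks.
READY_MARKER = "Completed service synchronization, ready to provide service."


def reached_ready(log_lines: Iterable[str], marker: str = READY_MARKER) -> bool:
    """Return True if any line of the log contains the readiness marker."""
    return any(marker in line for line in log_lines)


if __name__ == "__main__":
    import sys

    # Usage: python check_ready.py ppc_notworking.log x86_working.log
    for path in sys.argv[1:]:
        with open(path, errors="replace") as handle:
            state = "ready" if reached_ready(handle) else "never became ready"
        print(f"{path}: {state}")
```

The file is read line by line, so it should also cope with large debug logs.
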
>>
>> Regards,
>> Sriram
>>
>>
>>
>> On Fri, Apr 29, 2016 at 2:44 PM, Sriram <sriram.ec at gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I went ahead and made some changes to the file system (I brought in
>>> /etc/init.d/corosync, /etc/init.d/pacemaker, and /etc/sysconfig). After
>>> that I was able to run "pcs cluster start".
>>> But it failed with the following error:
>>>  # pcs cluster start
>>> Starting Cluster...
>>> Starting Pacemaker Cluster Manager[FAILED]
>>> Error: unable to start pacemaker
>>>
>>> And in the /var/log/pacemaker.log, I saw these errors
>>> pacemakerd:     info: mcp_read_config:  cmap connection setup failed: CS_ERR_TRY_AGAIN.  Retrying in 4s
>>> Apr 29 08:53:47 [15863] node_cu pacemakerd:     info: mcp_read_config:  cmap connection setup failed: CS_ERR_TRY_AGAIN.  Retrying in 5s
>>> Apr 29 08:53:52 [15863] node_cu pacemakerd:  warning: mcp_read_config:  Could not connect to Cluster Configuration Database API, error 6
>>> Apr 29 08:53:52 [15863] node_cu pacemakerd:   notice: main:     Could not obtain corosync config data, exiting
>>> Apr 29 08:53:52 [15863] node_cu pacemakerd:     info: crm_xml_cleanup:  Cleaning up memory from libxml2
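
The final "error 6" in this excerpt looks like the same condition as the retries before it: in corosync's cs_error_t enumeration, 6 corresponds to CS_ERR_TRY_AGAIN. Below is a small lookup sketch in Python. Only the value 6 is confirmed by the log above; the other entries are a partial list assumed to match corotypes.h on the installed corosync, and `cs_error_name` is a made-up helper name.

```python
# Partial map of corosync cs_error_t values.
# ASSUMPTION: apart from 6 (confirmed by the pacemaker.log excerpt), these are
# assumed to match corotypes.h in the corosync version actually installed.
CS_ERRORS = {
    1: "CS_OK",
    2: "CS_ERR_LIBRARY",
    5: "CS_ERR_TIMEOUT",
    6: "CS_ERR_TRY_AGAIN",
    7: "CS_ERR_INVALID_PARAM",
    11: "CS_ERR_ACCESS",
    12: "CS_ERR_NOT_EXIST",
}


def cs_error_name(code: int) -> str:
    """Translate a numeric cs_error_t value into its symbolic name."""
    return CS_ERRORS.get(code, f"unknown ({code})")


print(cs_error_name(6))
```

So pacemakerd is not reporting a new failure at the end; it gives up after the last retry and reports the same "try again" answer as a number.
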
>>>
>>>
>>> And in the /var/log/Debuglog, I saw these errors coming from corosync
>>> 20160429 085347.487050 airv_cu daemon.warn corosync[12857]:   [QB    ] Denied connection, is not ready (12857-15863-14)
>>> 20160429 085347.487067 airv_cu daemon.info corosync[12857]:   [QB    ] Denied connection, is not ready (12857-15863-14)
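
The triple in parentheses can be read against the other logs: 12857 matches the corosync PID in this log, and 15863 matches the pacemakerd PID in pacemaker.log, so corosync is refusing pacemakerd specifically. A minimal Python sketch for pulling those fields out of such lines follows. The names are made up for illustration, and the meaning of the third number (probably the socket descriptor) is an assumption, not something the logs confirm.

```python
import re
from typing import NamedTuple, Optional


class DeniedConnection(NamedTuple):
    server_pid: int   # matches corosync[12857] in the log above
    client_pid: int   # matches pacemakerd [15863] in pacemaker.log
    third_field: int  # ASSUMPTION: probably the socket file descriptor


_DENIED_RE = re.compile(r"Denied connection, is not ready \((\d+)-(\d+)-(\d+)\)")


def parse_denied(line: str) -> Optional[DeniedConnection]:
    """Extract the numeric triple from a 'Denied connection' log line."""
    match = _DENIED_RE.search(line)
    if match is None:
        return None
    return DeniedConnection(*(int(group) for group in match.groups()))


sample = "[QB    ] Denied connection, is not ready (12857-15863-14)"
print(parse_denied(sample))
```
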
>>>
>>>
>>> I browsed the code of libqb to find that it is failing in
>>>
>>> https://github.com/ClusterLabs/libqb/blob/master/lib/ipc_setup.c
>>>
>>> Line 600 :
>>> handle_new_connection function
>>>
>>> Line 637:
>>>     if (auth_result == 0 && c->service->serv_fns.connection_accept) {
>>>         res = c->service->serv_fns.connection_accept(c,
>>>                                  c->euid, c->egid);
>>>     }
>>>     if (res != 0) {
>>>         goto send_response;
>>>     }
>>>
>>> Any hints on this issue would be really helpful for me to go ahead.
>>> Please let me know if any logs are required.
>>>
>>> Regards,
>>> Sriram
>>>
>>> On Thu, Apr 28, 2016 at 2:42 PM, Sriram <sriram.ec at gmail.com> wrote:
>>>
>>>> Thanks Ken and Emmanuel.
>>>> It's a big-endian machine. I will try running "pcs cluster setup"
>>>> and "pcs cluster start".
>>>> Inside cluster.py, "service pacemaker start" and "service corosync
>>>> start" are executed to bring up pacemaker and corosync.
>>>> Those service scripts, and the infrastructure needed to bring up the
>>>> processes in that manner, don't exist on my board.
>>>> As it is an embedded board with limited memory, a full-fledged Linux
>>>> is not installed.
>>>> Just curious to know what could be the reason pacemaker throws this
>>>> error:
>>>>
>>>> "cmap connection setup failed: CS_ERR_TRY_AGAIN.  Retrying in 1s"
>>>>
>>>> Thanks for the response.
>>>>
>>>> Regards,
>>>> Sriram.
>>>>
>>>> On Thu, Apr 28, 2016 at 8:55 AM, Ken Gaillot <kgaillot at redhat.com>
>>>> wrote:
>>>>
>>>>> On 04/27/2016 11:25 AM, emmanuel segura wrote:
>>>>> > you need to use pcs to do everything, pcs cluster setup and pcs
>>>>> > cluster start, try to use the redhat docs for more information.
>>>>>
>>>>> Agreed -- pcs cluster setup will create a proper corosync.conf for you.
>>>>> Your corosync.conf below uses corosync 1 syntax, and there were
>>>>> significant changes in corosync 2. In particular, you don't need the
>>>>> file created in step 4, because pacemaker is no longer launched via a
>>>>> corosync plugin.
>>>>>
>>>>> > 2016-04-27 17:28 GMT+02:00 Sriram <sriram.ec at gmail.com>:
>>>>> >> Dear All,
>>>>> >>
>>>>> >> I'm trying to use pacemaker and corosync for a clustering requirement
>>>>> >> that came up recently.
>>>>> >> We have cross-compiled corosync, pacemaker and pcs (python) for a ppc
>>>>> >> environment (the target board where pacemaker and corosync are
>>>>> >> supposed to run).
>>>>> >> I'm having trouble bringing up pacemaker in that environment, though
>>>>> >> I could successfully bring up corosync.
>>>>> >> Any help is welcome.
>>>>> >>
>>>>> >> I'm using these versions of pacemaker and corosync:
>>>>> >> [root at node_cu pacemaker]# corosync -v
>>>>> >> Corosync Cluster Engine, version '2.3.5'
>>>>> >> Copyright (c) 2006-2009 Red Hat, Inc.
>>>>> >> [root at node_cu pacemaker]# pacemakerd -$
>>>>> >> Pacemaker 1.1.14
>>>>> >> Written by Andrew Beekhof
>>>>> >>
>>>>> >> For running corosync, I did the following.
>>>>> >> 1. Created the following directories,
>>>>> >>     /var/lib/pacemaker
>>>>> >>     /var/lib/corosync
>>>>> >>     /var/lib/pacemaker/cores
>>>>> >>     /var/lib/pacemaker/pengine
>>>>> >>     /var/lib/pacemaker/blackbox
>>>>> >>     /var/lib/pacemaker/cib
>>>>> >>
>>>>> >>
>>>>> >> 2. Created a file called corosync.conf under the /etc/corosync folder
>>>>> >> with the following contents:
>>>>> >>
>>>>> >> totem {
>>>>> >>
>>>>> >>         version: 2
>>>>> >>         token:          5000
>>>>> >>         token_retransmits_before_loss_const: 20
>>>>> >>         join:           1000
>>>>> >>         consensus:      7500
>>>>> >>         vsftype:        none
>>>>> >>         max_messages:   20
>>>>> >>         secauth:        off
>>>>> >>         cluster_name:   mycluster
>>>>> >>         transport:      udpu
>>>>> >>         threads:        0
>>>>> >>         clear_node_high_bit: yes
>>>>> >>
>>>>> >>         interface {
>>>>> >>                 ringnumber: 0
>>>>> >>                 # The following three values need to be set based on your environment
>>>>> >>                 bindnetaddr: 10.x.x.x
>>>>> >>                 mcastaddr: 226.94.1.1
>>>>> >>                 mcastport: 5405
>>>>> >>         }
>>>>> >>  }
>>>>> >>
>>>>> >>  logging {
>>>>> >>         fileline: off
>>>>> >>         to_syslog: yes
>>>>> >>         to_stderr: no
>>>>> >>         to_syslog: yes
>>>>> >>         logfile: /var/log/corosync.log
>>>>> >>         syslog_facility: daemon
>>>>> >>         debug: on
>>>>> >>         timestamp: on
>>>>> >>  }
>>>>> >>
>>>>> >>  amf {
>>>>> >>         mode: disabled
>>>>> >>  }
>>>>> >>
>>>>> >>  quorum {
>>>>> >>         provider: corosync_votequorum
>>>>> >>  }
>>>>> >>
>>>>> >> nodelist {
>>>>> >>   node {
>>>>> >>         ring0_addr: node_cu
>>>>> >>         nodeid: 1
>>>>> >>        }
>>>>> >> }
>>>>> >>
>>>>> >> 3.  Created authkey under /etc/corosync
>>>>> >>
>>>>> >> 4.  Created a file called pcmk under /etc/corosync/service.d with the
>>>>> >> contents below:
>>>>> >>       cat pcmk
>>>>> >>       service {
>>>>> >>          # Load the Pacemaker Cluster Resource Manager
>>>>> >>          name: pacemaker
>>>>> >>          ver:  1
>>>>> >>       }
>>>>> >>
>>>>> >> 5. Added the node name "node_cu" in /etc/hosts with 10.X.X.X ip
>>>>> >>
>>>>> >> 6. ./corosync -f -p & --> this step started corosync
>>>>> >>
>>>>> >> [root at node_cu pacemaker]# netstat -alpn | grep -i coros
>>>>> >> udp        0      0 10.X.X.X:61841     0.0.0.0:*                           9133/corosync
>>>>> >> udp        0      0 10.X.X.X:5405      0.0.0.0:*                           9133/corosync
>>>>> >> unix  2      [ ACC ]     STREAM     LISTENING     148888   9133/corosync   @quorum
>>>>> >> unix  2      [ ACC ]     STREAM     LISTENING     148884   9133/corosync   @cmap
>>>>> >> unix  2      [ ACC ]     STREAM     LISTENING     148887   9133/corosync   @votequorum
>>>>> >> unix  2      [ ACC ]     STREAM     LISTENING     148885   9133/corosync   @cfg
>>>>> >> unix  2      [ ACC ]     STREAM     LISTENING     148886   9133/corosync   @cpg
>>>>> >> unix  2      [ ]         DGRAM                    148840   9133/corosync
>>>>> >>
>>>>> >> 7. ./pacemakerd -f & gives the following error and exits.
>>>>> >> [root at node_cu pacemaker]# pacemakerd -f
>>>>> >> cmap connection setup failed: CS_ERR_TRY_AGAIN.  Retrying in 1s
>>>>> >> cmap connection setup failed: CS_ERR_TRY_AGAIN.  Retrying in 2s
>>>>> >> cmap connection setup failed: CS_ERR_TRY_AGAIN.  Retrying in 3s
>>>>> >> cmap connection setup failed: CS_ERR_TRY_AGAIN.  Retrying in 4s
>>>>> >> cmap connection setup failed: CS_ERR_TRY_AGAIN.  Retrying in 5s
>>>>> >> Could not connect to Cluster Configuration Database API, error 6
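
The output above shows a bounded retry: five attempts with waits of 1s through 5s, then a final failure. The pattern can be sketched in Python as below. This is an illustration of the behaviour visible in the output, not pacemaker's actual code; `connect_with_retries` and its parameters are made-up names, and `connect` stands in for whatever call opens the cmap connection.

```python
import time
from typing import Callable


def connect_with_retries(
    connect: Callable[[], bool],
    max_retries: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call connect() up to max_retries times, waiting 1s, 2s, ... between tries."""
    for attempt in range(1, max_retries + 1):
        if connect():
            return True
        print(f"connection setup failed.  Retrying in {attempt}s")
        sleep(attempt)
    # All attempts failed; the caller reports the error and exits.
    return False


if __name__ == "__main__":
    # A connect function that always fails, with sleeping disabled for the demo.
    ok = connect_with_retries(lambda: False, sleep=lambda seconds: None)
    print("connected" if ok else "giving up")
```

With these delays the client waits about 15 seconds in total, so if corosync never reaches its ready state in that window, pacemakerd exits as shown.
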
>>>>> >>
>>>>> >> Can you please point out what is missing in these steps?
>>>>> >>
>>>>> >> Before trying these steps, I tried running "pcs cluster start", but
>>>>> >> that command fails with "service" script not found, as the root
>>>>> >> filesystem contains neither /etc/init.d/ nor /sbin/service.
>>>>> >>
>>>>> >> So the plan is to bring up corosync and pacemaker manually, and later
>>>>> >> do the cluster configuration using "pcs" commands.
>>>>> >>
>>>>> >> Regards,
>>>>> >> Sriram
>>>>> >>
>>>>> >> _______________________________________________
>>>>> >> Users mailing list: Users at clusterlabs.org
>>>>> >> http://clusterlabs.org/mailman/listinfo/users
>>>>> >>
>>>>> >> Project Home: http://www.clusterlabs.org
>>>>> >> Getting started:
>>>>> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>>> >> Bugs: http://bugs.clusterlabs.org
>>>>> >>