[ClusterLabs] [ClusterLab] : Corosync not initializing successfully
Nikhil Utane
nikhil.subscribed at gmail.com
Mon May 2 07:30:25 UTC 2016
So, if I understand you correctly, you are saying that if the HW is bi-endian,
we should enable LE on PPC. Is that right?
Need to check on that.
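
As a first step towards telling an endianness problem apart from anything else, it may help to confirm which byte order each node actually runs in and how the same integer is laid out under each. The following is a minimal sketch, not part of corosync: the function name `describe_byte_order` and the sample value are made up for illustration, and it assumes a Python 3 interpreter is available on the board.

```python
import struct
import sys


def describe_byte_order(value=0x0A0B0C0D):
    """Return the host byte order and both encodings of a 32-bit value."""
    big = struct.pack(">I", value)     # big-endian (network order) layout
    little = struct.pack("<I", value)  # little-endian layout
    native = struct.pack("=I", value)  # layout used by this host
    return {
        "host": sys.byteorder,
        "big": big.hex(),
        "little": little.hex(),
        "native_matches": "big" if native == big else "little",
    }


if __name__ == "__main__":
    print(describe_byte_order())
```

Running this on every node shows whether the cluster is mixed BE/LE. It only reports the platform's byte order; it does not by itself prove that the corosync failure is endian-related.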
On Mon, May 2, 2016 at 12:49 PM, Nikhil Utane <nikhil.subscribed at gmail.com>
wrote:
> Sorry about my ignorance, but could you please elaborate on what you mean by
> "try to ppcle"?
>
> Our target platform is ppc so it is BE. We have to get it running only on
> that.
> How do we know this is an LE/BE issue and nothing else?
>
> -Thanks
> Nikhil
>
>
> On Mon, May 2, 2016 at 12:24 PM, Jan Friesse <jfriesse at redhat.com> wrote:
>
>>> As your hardware is probably capable of running ppcle, and if you have
>>> an environment at hand without too much effort, it might pay off to
>>> try that.
>>> There are of course distributions out there that support corosync on
>>> big-endian architectures, but I don't know if there is an automated
>>> regression test for corosync on big-endian that would catch
>>> big-endian issues right away with something as current as your 2.3.5.
>>>
>>
>> No, we are not testing big-endian.
>>
>> So I totally agree with Klaus. Give a try to ppcle. Also make sure all
>> nodes are little-endian. Corosync should work in a mixed BE/LE environment,
>> but because that's not tested, it may not work (and that would be a bug, so
>> if ppcle works I will try to fix BE).
>>
>> Regards,
>> Honza
>>
>>
>>
>>> Regards,
>>> Klaus
>>>
>>> On 05/02/2016 06:44 AM, Nikhil Utane wrote:
>>>
>>>> Re-sending as I don't see my post on the thread.
>>>>
>>>> On Sun, May 1, 2016 at 4:21 PM, Nikhil Utane
>>>> <nikhil.subscribed at gmail.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> Looking for some guidance here as we are completely blocked
>>>> otherwise :(.
>>>>
>>>> -Regards
>>>> Nikhil
>>>>
>>>> On Fri, Apr 29, 2016 at 6:11 PM, Sriram <sriram.ec at gmail.com> wrote:
>>>>
>>>> Corrected the subject.
>>>>
>>>> We went ahead and captured corosync debug logs for our ppc
>>>> board.
>>>> After analyzing the logs and comparing them with the successful
>>>> logs (from the x86 machine), we didn't find
>>>> "[ MAIN ] Completed service synchronization, ready to provide service."
>>>> in the ppc logs.
>>>> So it looks like corosync is not in a position to accept
>>>> connections from Pacemaker.
>>>> I even tried a new corosync.conf, with no success.
>>>>
>>>> Any hints on this issue would be really helpful.
>>>>
>>>> Attaching ppc_notworking.log, x86_working.log, corosync.conf.
>>>>
>>>> Regards,
>>>> Sriram
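
The log comparison described in this message can be scripted. A minimal sketch, assuming only that the marker string quoted above appears verbatim in a working log; the function names and the script name `check_ready.py` are hypothetical:

```python
import sys

# Marker text as quoted in the message above.
READY_MARKER = "Completed service synchronization, ready to provide service."


def reached_ready(log_lines, marker=READY_MARKER):
    """Return True if any log line contains the ready marker."""
    return any(marker in line for line in log_lines)


def check_log_file(path):
    """Read a log file and report whether the marker is present."""
    with open(path, errors="replace") as handle:
        return reached_ready(handle)


if __name__ == "__main__":
    for path in sys.argv[1:]:
        status = "ready" if check_log_file(path) else "never reached ready state"
        print(f"{path}: {status}")
```

It could be run as `python3 check_ready.py ppc_notworking.log x86_working.log`. A missing marker only says that corosync never finished service synchronization, not why.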
>>>>
>>>>
>>>>
>>>> On Fri, Apr 29, 2016 at 2:44 PM, Sriram <sriram.ec at gmail.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> I went ahead and made some changes to the file system (I
>>>> brought in /etc/init.d/corosync, /etc/init.d/pacemaker and
>>>> /etc/sysconfig). After that I was able to run "pcs
>>>> cluster start".
>>>> But it failed with the following error:
>>>> # pcs cluster start
>>>> Starting Cluster...
>>>> Starting Pacemaker Cluster Manager[FAILED]
>>>> Error: unable to start pacemaker
>>>>
>>>> And in the /var/log/pacemaker.log, I saw these errors
>>>> pacemakerd: info: mcp_read_config: cmap connection setup failed: CS_ERR_TRY_AGAIN. Retrying in 4s
>>>> Apr 29 08:53:47 [15863] node_cu pacemakerd: info: mcp_read_config: cmap connection setup failed: CS_ERR_TRY_AGAIN. Retrying in 5s
>>>> Apr 29 08:53:52 [15863] node_cu pacemakerd: warning: mcp_read_config: Could not connect to Cluster Configuration Database API, error 6
>>>> Apr 29 08:53:52 [15863] node_cu pacemakerd: notice: main: Could not obtain corosync config data, exiting
>>>> Apr 29 08:53:52 [15863] node_cu pacemakerd: info: crm_xml_cleanup: Cleaning up memory from libxml2
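
The log above shows a retry pattern: the delay grows by one second per attempt, and after the 5s wait pacemakerd gives up with error 6. The sketch below illustrates that pattern only and is not pacemaker's actual code. `connect` is a hypothetical stand-in for the cmap connection attempt, the value 6 for CS_ERR_TRY_AGAIN is taken from the "error 6" in the log, and the value 1 for CS_OK is an assumption.

```python
import time

CS_OK = 1             # assumed value of the success code
CS_ERR_TRY_AGAIN = 6  # matches "error 6" in the log above


def connect_with_retry(connect, max_retries=5, sleep=time.sleep):
    """Call connect() until it stops returning CS_ERR_TRY_AGAIN.

    Waits 1s, 2s, ... between attempts and gives up after max_retries
    waits. Returns the last result code.
    """
    result = connect()
    retries = 0
    while result == CS_ERR_TRY_AGAIN and retries < max_retries:
        retries += 1
        sleep(retries)  # linearly growing delay, as in the log
        result = connect()
    return result
```

If corosync never reaches its ready state, every attempt returns CS_ERR_TRY_AGAIN and the caller exits once the retries are used up, which is consistent with what the log shows.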
>>>>
>>>>
>>>> And in the /var/log/Debuglog, I saw these errors coming
>>>> from corosync
>>>> 20160429 085347.487050 airv_cu daemon.warn corosync[12857]: [QB ] Denied connection, is not ready (12857-15863-14)
>>>> 20160429 085347.487067 airv_cu daemon.info corosync[12857]: [QB ] Denied connection, is not ready (12857-15863-14)
>>>>
>>>>
>>>> I browsed the libqb code and found that it is failing in
>>>>
>>>> https://github.com/ClusterLabs/libqb/blob/master/lib/ipc_setup.c
>>>>
>>>> Line 600: handle_new_connection function
>>>>
>>>> Line 637:
>>>>     if (auth_result == 0 &&
>>>>         c->service->serv_fns.connection_accept) {
>>>>         res = c->service->serv_fns.connection_accept(c,
>>>>                                                       c->euid, c->egid);
>>>>     }
>>>>     if (res != 0) {
>>>>         goto send_response;
>>>>     }
>>>>
>>>> Any hints on this issue would be really helpful for me to
>>>> go ahead.
>>>> Please let me know if any logs are required.
>>>>
>>>> Regards,
>>>> Sriram
>>>>
>>>> On Thu, Apr 28, 2016 at 2:42 PM, Sriram <sriram.ec at gmail.com> wrote:
>>>>
>>>> Thanks Ken and Emmanuel.
>>>> It's a big-endian machine. I will try running "pcs
>>>> cluster setup" and "pcs cluster start".
>>>> Inside cluster.py, "service pacemaker start" and
>>>> "service corosync start" are executed to bring up
>>>> pacemaker and corosync.
>>>> Those service scripts, and the infrastructure needed to
>>>> bring up the processes in that manner,
>>>> don't exist on my board.
>>>> As it is an embedded board with limited memory,
>>>> a full-fledged Linux is not installed.
>>>> Just curious to know: what could be the reason
>>>> pacemaker throws this error?
>>>>
>>>> "cmap connection setup failed: CS_ERR_TRY_AGAIN.
>>>> Retrying in 1s"
>>>>
>>>> Thanks for response.
>>>>
>>>> Regards,
>>>> Sriram.
>>>>
>>>> On Thu, Apr 28, 2016 at 8:55 AM, Ken Gaillot
>>>> <kgaillot at redhat.com> wrote:
>>>>
>>>> On 04/27/2016 11:25 AM, emmanuel segura wrote:
>>>> > You need to use pcs to do everything: pcs cluster setup and pcs
>>>> > cluster start. Try the Red Hat docs for more information.
>>>>
>>>> Agreed -- pcs cluster setup will create a proper corosync.conf for you.
>>>> Your corosync.conf below uses corosync 1 syntax, and there were
>>>> significant changes in corosync 2. In particular, you don't need the
>>>> file created in step 4, because pacemaker is no longer launched via a
>>>> corosync plugin.
>>>>
>>>> > 2016-04-27 17:28 GMT+02:00 Sriram <sriram.ec at gmail.com>:
>>>> >> Dear All,
>>>> >>
>>>> >> I'm trying to use pacemaker and corosync for a clustering requirement
>>>> >> that came up recently.
>>>> >> We have cross-compiled corosync, pacemaker and pcs (python) for a ppc
>>>> >> environment (the target board where pacemaker and corosync are supposed
>>>> >> to run).
>>>> >> I'm having trouble bringing up pacemaker in that environment, though I
>>>> >> could successfully bring up corosync.
>>>> >> Any help is welcome.
>>>> >>
>>>> >> I'm using these versions of pacemaker and corosync:
>>>> >> [root at node_cu pacemaker]# corosync -v
>>>> >> Corosync Cluster Engine, version '2.3.5'
>>>> >> Copyright (c) 2006-2009 Red Hat, Inc.
>>>> >> [root at node_cu pacemaker]# pacemakerd -$
>>>> >> Pacemaker 1.1.14
>>>> >> Written by Andrew Beekhof
>>>> >>
>>>> >> For running corosync, I did the following.
>>>> >> 1. Created the following directories,
>>>> >> /var/lib/pacemaker
>>>> >> /var/lib/corosync
>>>> >> /var/lib/pacemaker
>>>> >> /var/lib/pacemaker/cores
>>>> >> /var/lib/pacemaker/pengine
>>>> >> /var/lib/pacemaker/blackbox
>>>> >> /var/lib/pacemaker/cib
>>>> >>
>>>> >>
>>>> >> 2. Created a file called corosync.conf under the /etc/corosync folder
>>>> >> with the following contents
>>>> >>
>>>> >> totem {
>>>> >>
>>>> >> version: 2
>>>> >> token: 5000
>>>> >> token_retransmits_before_loss_const: 20
>>>> >> join: 1000
>>>> >> consensus: 7500
>>>> >> vsftype: none
>>>> >> max_messages: 20
>>>> >> secauth: off
>>>> >> cluster_name: mycluster
>>>> >> transport: udpu
>>>> >> threads: 0
>>>> >> clear_node_high_bit: yes
>>>> >>
>>>> >> interface {
>>>> >> ringnumber: 0
>>>> >> # The following three values need to be set based on your environment
>>>> >> bindnetaddr: 10.x.x.x
>>>> >> mcastaddr: 226.94.1.1
>>>> >> mcastport: 5405
>>>> >> }
>>>> >> }
>>>> >>
>>>> >> logging {
>>>> >> fileline: off
>>>> >> to_syslog: yes
>>>> >> to_stderr: no
>>>> >> to_syslog: yes
>>>> >> logfile: /var/log/corosync.log
>>>> >> syslog_facility: daemon
>>>> >> debug: on
>>>> >> timestamp: on
>>>> >> }
>>>> >>
>>>> >> amf {
>>>> >> mode: disabled
>>>> >> }
>>>> >>
>>>> >> quorum {
>>>> >> provider: corosync_votequorum
>>>> >> }
>>>> >>
>>>> >> nodelist {
>>>> >> node {
>>>> >> ring0_addr: node_cu
>>>> >> nodeid: 1
>>>> >> }
>>>> >> }
>>>> >>
>>>> >> 3. Created authkey under /etc/corosync
>>>> >>
>>>> >> 4. Created a file called pcmk under /etc/corosync/service.d with the
>>>> >> contents below,
>>>> >> cat pcmk
>>>> >> service {
>>>> >> # Load the Pacemaker Cluster Resource Manager
>>>> >> name: pacemaker
>>>> >> ver: 1
>>>> >> }
>>>> >>
>>>> >> 5. Added the node name "node_cu" in /etc/hosts with 10.X.X.X ip
>>>> >>
>>>> >> 6. ./corosync -f -p & --> this step started corosync
>>>> >>
>>>> >> [root at node_cu pacemaker]# netstat -alpn | grep -i coros
>>>> >> udp   0  0  10.X.X.X:61841  0.0.0.0:*                  9133/corosync
>>>> >> udp   0  0  10.X.X.X:5405   0.0.0.0:*                  9133/corosync
>>>> >> unix  2  [ ACC ]  STREAM  LISTENING  148888  9133/corosync  @quorum
>>>> >> unix  2  [ ACC ]  STREAM  LISTENING  148884  9133/corosync  @cmap
>>>> >> unix  2  [ ACC ]  STREAM  LISTENING  148887  9133/corosync  @votequorum
>>>> >> unix  2  [ ACC ]  STREAM  LISTENING  148885  9133/corosync  @cfg
>>>> >> unix  2  [ ACC ]  STREAM  LISTENING  148886  9133/corosync  @cpg
>>>> >> unix  2  [ ]      DGRAM              148840  9133/corosync
>>>> >>
>>>> >> 7. ./pacemakerd -f & gives the following error and exits.
>>>> >> [root at node_cu pacemaker]# pacemakerd -f
>>>> >> cmap connection setup failed: CS_ERR_TRY_AGAIN. Retrying in 1s
>>>> >> cmap connection setup failed: CS_ERR_TRY_AGAIN. Retrying in 2s
>>>> >> cmap connection setup failed: CS_ERR_TRY_AGAIN. Retrying in 3s
>>>> >> cmap connection setup failed: CS_ERR_TRY_AGAIN. Retrying in 4s
>>>> >> cmap connection setup failed: CS_ERR_TRY_AGAIN. Retrying in 5s
>>>> >> Could not connect to Cluster Configuration Database API, error 6
>>>> >>
>>>> >> Can you please point out what is missing in these steps?
>>>> >>
>>>> >> Before trying these steps, I tried running "pcs cluster start", but that
>>>> >> command fails with "service" script not found, as the root filesystem
>>>> >> contains neither /etc/init.d/ nor /sbin/service.
>>>> >>
>>>> >> So the plan is to bring up corosync and pacemaker manually, and later do
>>>> >> the cluster configuration using "pcs" commands.
>>>> >>
>>>> >> Regards,
>>>> >> Sriram
>>>> >>
>>>> >> _______________________________________________
>>>> >> Users mailing list: Users at clusterlabs.org
>>>> >> http://clusterlabs.org/mailman/listinfo/users
>>>> >>
>>>> >> Project Home: http://www.clusterlabs.org
>>>> >> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>> >> Bugs: http://bugs.clusterlabs.org
>>>> >>
>>>> >
>>>> >
>>>> >