[ClusterLabs] [ClusterLab] : Corosync not initializing successfully
Nikhil Utane
nikhil.subscribed at gmail.com
Mon May 2 07:19:27 UTC 2016
Sorry for my ignorance, but could you please elaborate on what you mean by
"try ppcle"?
Our target platform is ppc, so it is big-endian (BE), and that is the only
platform we have to get it running on.
How do we know this is an LE/BE issue and nothing else?
-Thanks
Nikhil
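The practical way to answer "is it byte order?" is the comparison Klaus and Honza suggest below: run the same corosync build on ppcle on the same hardware. If it works there and fails on ppc, byte order becomes the prime suspect. The kind of defect they have in mind looks like the minimal C sketch below. The function names are hypothetical and this is not corosync code: a multi-byte field copied straight into a host integer decodes differently on big-endian and little-endian hosts, while a decoder with a fixed wire byte order gives the same answer everywhere.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* 1 if the host stores the most significant byte first (big-endian, e.g.
 * ppc), 0 if it stores the least significant byte first (ppcle, x86). */
static int host_is_big_endian(void)
{
    const uint32_t probe = 0x01020304u;
    unsigned char lowest;
    memcpy(&lowest, &probe, 1);          /* byte at the lowest address */
    return lowest == 0x01;
}

/* Reverse the byte order of a 32-bit value. */
static uint32_t swap32(uint32_t v)
{
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) << 8)  |
           ((v & 0x00FF0000u) >> 8)  |
           ((v & 0xFF000000u) >> 24);
}

/* Portable decode of a 32-bit field sent in little-endian order:
 * the result is the same on every host. */
static uint32_t read_le32(const unsigned char *p)
{
    assert(p != NULL);
    return (uint32_t)p[0] |
           ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) |
           ((uint32_t)p[3] << 24);
}

/* Naive decode: reinterpret the raw bytes in host order. Correct only when
 * sender and receiver share a byte order, which is the class of bug that
 * would show up on ppc but not on x86 or ppcle. */
static uint32_t read_host32(const unsigned char *p)
{
    uint32_t v;
    assert(p != NULL);
    memcpy(&v, p, sizeof v);
    return v;
}
```

On an x86 or ppcle host read_host32() and read_le32() agree; on ppc they differ by exactly one swap32(), which is why such a bug can stay invisible when only little-endian machines are tested.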
On Mon, May 2, 2016 at 12:24 PM, Jan Friesse <jfriesse at redhat.com> wrote:
>> As your hardware is probably capable of running ppcle, if you have such an
>> environment at hand without too much effort, it might pay off to try that.
>> There are of course distributions out there that support corosync on
>> big-endian architectures, but I don't know if there is an automated
>> regression test for corosync on big-endian that would catch big-endian
>> issues right away with something as current as your 2.3.5.
>>
>
> No, we are not testing big-endian.
>
> So I totally agree with Klaus. Give ppcle a try. Also make sure all nodes
> are little-endian. Corosync should work in a mixed BE/LE environment, but
> because that is not tested, it may not work (and that would be a bug, so if
> ppcle works, I will try to fix BE).
>
> Regards,
> Honza
>
>
>
>> Regards,
>> Klaus
>>
>> On 05/02/2016 06:44 AM, Nikhil Utane wrote:
>>
>>> Re-sending as I don't see my post on the thread.
>>>
>>> On Sun, May 1, 2016 at 4:21 PM, Nikhil Utane <nikhil.subscribed at gmail.com> wrote:
>>>
>>> Hi,
>>>
>>> Looking for some guidance here as we are completely blocked
>>> otherwise :(.
>>>
>>> -Regards
>>> Nikhil
>>>
>>> On Fri, Apr 29, 2016 at 6:11 PM, Sriram <sriram.ec at gmail.com> wrote:
>>>
>>> Corrected the subject.
>>>
>>> We went ahead and captured corosync debug logs for our ppc board.
>>> After analyzing the logs and comparing them with the successful logs
>>> from the x86 machine, we didn't find "[MAIN ] Completed service
>>> synchronization, ready to provide service." in the ppc logs.
>>> So it looks like corosync is not in a position to accept connections
>>> from Pacemaker.
>>> I also tried the new corosync.conf, with no success.
>>>
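A quick way to repeat this check on each board is to scan the corosync log for the readiness message. A minimal C sketch, assuming the log is a plain text file such as the /var/log/corosync.log configured further down in this thread; the function name log_has_ready_marker is hypothetical:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Message corosync logs once it is ready to serve clients (text taken from
 * the x86 log discussed in this thread). */
static const char READY_MARKER[] =
    "Completed service synchronization, ready to provide service";

/* Returns 1 if any line read from fp contains READY_MARKER, 0 otherwise.
 * Lines longer than the buffer are read in chunks, so a marker split across
 * a chunk boundary would be missed; acceptable for a quick check. */
static int log_has_ready_marker(FILE *fp)
{
    char line[1024];
    assert(fp != NULL);
    while (fgets(line, sizeof line, fp) != NULL) {
        if (strstr(line, READY_MARKER) != NULL)
            return 1;
    }
    return 0;
}
```

On the board you would open the log with fopen("/var/log/corosync.log", "r"), pass the stream in, and fclose() it afterwards. A result of 0 on ppc and 1 on x86 reproduces the observation above.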
>>> Any hints on this issue would be really helpful.
>>>
>>> Attaching ppc_notworking.log, x86_working.log, corosync.conf.
>>>
>>> Regards,
>>> Sriram
>>>
>>>
>>>
>>> On Fri, Apr 29, 2016 at 2:44 PM, Sriram <sriram.ec at gmail.com> wrote:
>>>
>>> Hi,
>>>
>>> I went ahead and made some changes to the file system (I brought in
>>> /etc/init.d/corosync, /etc/init.d/pacemaker and /etc/sysconfig). After
>>> that I was able to run "pcs cluster start", but it failed with the
>>> following error:
>>> # pcs cluster start
>>> Starting Cluster...
>>> Starting Pacemaker Cluster Manager[FAILED]
>>> Error: unable to start pacemaker
>>>
>>> In /var/log/pacemaker.log, I saw these errors:
>>> pacemakerd: info: mcp_read_config: cmap connection setup failed: CS_ERR_TRY_AGAIN. Retrying in 4s
>>> Apr 29 08:53:47 [15863] node_cu pacemakerd: info: mcp_read_config: cmap connection setup failed: CS_ERR_TRY_AGAIN. Retrying in 5s
>>> Apr 29 08:53:52 [15863] node_cu pacemakerd: warning: mcp_read_config: Could not connect to Cluster Configuration Database API, error 6
>>> Apr 29 08:53:52 [15863] node_cu pacemakerd: notice: main: Could not obtain corosync config data, exiting
>>> Apr 29 08:53:52 [15863] node_cu pacemakerd: info: crm_xml_cleanup: Cleaning up memory from libxml2
>>>
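The final "error 6" is the numeric value of the same CS_ERR_TRY_AGAIN that pacemakerd was retrying on, so every attempt was refused as "try again" rather than failing in some other way. A small lookup sketch in C; the names mirror corosync's cs_error_t (declared in corosync/corotypes.h), but only the first six codes are listed, and the values should be checked against the header installed on the target:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* First six cs_error_t values (verify against corosync/corotypes.h).
 * The array index is the numeric code printed in log messages. */
static const char *const cs_error_names[] = {
    NULL,                /* 0: not a valid cs_error_t */
    "CS_OK",             /* 1 */
    "CS_ERR_LIBRARY",    /* 2 */
    "CS_ERR_VERSION",    /* 3 */
    "CS_ERR_INIT",       /* 4 */
    "CS_ERR_TIMEOUT",    /* 5 */
    "CS_ERR_TRY_AGAIN",  /* 6: the "error 6" in pacemaker.log */
};

/* Map a numeric code from a log message back to its symbolic name. */
static const char *cs_error_name(int code)
{
    const size_t count = sizeof cs_error_names / sizeof cs_error_names[0];
    if (code < 1 || (size_t)code >= count)
        return "unknown (not in this table)";
    return cs_error_names[code];
}
```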
>>>
>>> In /var/log/Debuglog, I saw these errors coming from corosync:
>>> 20160429 085347.487050 airv_cu daemon.warn corosync[12857]: [QB ] Denied connection, is not ready (12857-15863-14)
>>> 20160429 085347.487067 airv_cu daemon.info corosync[12857]: [QB ] Denied connection, is not ready (12857-15863-14)
>>>
>>>
>>> I browsed the libqb code and found that it is failing in
>>>
>>>
>>> https://github.com/ClusterLabs/libqb/blob/master/lib/ipc_setup.c
>>>
>>> Line 600: handle_new_connection function
>>>
>>> Line 637:
>>>     if (auth_result == 0 && c->service->serv_fns.connection_accept) {
>>>         res = c->service->serv_fns.connection_accept(c, c->euid, c->egid);
>>>     }
>>>     if (res != 0) {
>>>         goto send_response;
>>>     }
>>>
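The two logs line up: the corosync entry "Denied connection, is not ready (12857-15863-14)" names process 15863, which is the pacemakerd PID in pacemaker.log, at the same second (08:53:47). So the server side is vetoing the connection in the accept callback quoted above, and the client sees that veto as CS_ERR_TRY_AGAIN. The C sketch below is a hypothetical, heavily simplified model of that path, not libqb or corosync code; the -EAGAIN return value and the single readiness flag are assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for whatever state the daemon consults before
 * admitting an IPC client. The hypothesis in this thread is that it tracks
 * "Completed service synchronization, ready to provide service". */
static bool service_ready = false;

/* Model of serv_fns.connection_accept: veto new clients until ready.
 * Returning -EAGAIN here is an assumption of this sketch. */
static int model_connection_accept(void)
{
    return service_ready ? 0 : -EAGAIN;
}

/* Model of the quoted logic: only an authenticated client reaches the
 * accept callback, and any non-zero result is sent back to the client
 * instead of completing the connection. */
static int model_handle_new_connection(int auth_result)
{
    int res = auth_result;
    if (auth_result == 0)
        res = model_connection_accept();
    return res;
}

/* How the client would see each outcome (descriptive labels only). */
static const char *client_view(int res)
{
    if (res == 0)
        return "connected";
    if (res == -EAGAIN)
        return "CS_ERR_TRY_AGAIN";  /* pairs with "Denied connection, is not ready" */
    return "other error";
}
```

The point of the model is that retrying from the client cannot help until whatever sets the readiness flag on the server side has happened.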
>>> Any hints on this issue would really help me move forward.
>>> Please let me know if any logs are required.
>>>
>>> Regards,
>>> Sriram
>>>
>>> On Thu, Apr 28, 2016 at 2:42 PM, Sriram <sriram.ec at gmail.com> wrote:
>>>
>>> Thanks, Ken and Emmanuel.
>>> It's a big-endian machine. I will try running "pcs cluster setup" and
>>> "pcs cluster start".
>>> Inside cluster.py, "service pacemaker start" and "service corosync
>>> start" are executed to bring up pacemaker and corosync. The service
>>> scripts and the infrastructure needed to bring up the processes that way
>>> don't exist on my board. As it is an embedded board with limited memory,
>>> a full-fledged Linux distribution is not installed.
>>> Just curious to know: what could be the reason pacemaker throws this
>>> error?
>>>
>>> "cmap connection setup failed: CS_ERR_TRY_AGAIN. Retrying in 1s"
>>>
>>> Thanks for the response.
>>>
>>> Regards,
>>> Sriram.
>>>
>>> On Thu, Apr 28, 2016 at 8:55 AM, Ken Gaillot <kgaillot at redhat.com> wrote:
>>>
>>> On 04/27/2016 11:25 AM, emmanuel segura wrote:
>>> > You need to use pcs to do everything: pcs cluster setup and pcs
>>> > cluster start. Try the Red Hat docs for more information.
>>>
>>> Agreed -- pcs cluster setup will create a proper
>>> corosync.conf for you.
>>> Your corosync.conf below uses corosync 1 syntax,
>>> and there were
>>> significant changes in corosync 2. In particular,
>>> you don't need the
>>> file created in step 4, because pacemaker is no
>>> longer launched via a
>>> corosync plugin.
>>>
>>> > 2016-04-27 17:28 GMT+02:00 Sriram <sriram.ec at gmail.com>:
>>> >> Dear All,
>>> >>
>>> >> I'm trying to use pacemaker and corosync for a clustering requirement
>>> >> that came up recently.
>>> >> We have cross-compiled corosync, pacemaker and pcs (python) for a ppc
>>> >> environment (the target board where pacemaker and corosync are supposed
>>> >> to run).
>>> >> I'm having trouble bringing up pacemaker in that environment, though I
>>> >> could successfully bring up corosync.
>>> >> Any help is welcome.
>>> >>
>>> >> I'm using these versions of pacemaker and corosync:
>>> >> [root at node_cu pacemaker]# corosync -v
>>> >> Corosync Cluster Engine, version '2.3.5'
>>> >> Copyright (c) 2006-2009 Red Hat, Inc.
>>> >> [root at node_cu pacemaker]# pacemakerd -$
>>> >> Pacemaker 1.1.14
>>> >> Written by Andrew Beekhof
>>> >>
>>> >> To run corosync, I did the following:
>>> >> 1. Created the following directories:
>>> >> /var/lib/pacemaker
>>> >> /var/lib/corosync
>>> >> /var/lib/pacemaker
>>> >> /var/lib/pacemaker/cores
>>> >> /var/lib/pacemaker/pengine
>>> >> /var/lib/pacemaker/blackbox
>>> >> /var/lib/pacemaker/cib
>>> >>
>>> >>
>>> >> 2. Created a file called corosync.conf in the /etc/corosync folder with
>>> >> the following contents:
>>> >>
>>> >> totem {
>>> >>
>>> >>     version: 2
>>> >>     token: 5000
>>> >>     token_retransmits_before_loss_const: 20
>>> >>     join: 1000
>>> >>     consensus: 7500
>>> >>     vsftype: none
>>> >>     max_messages: 20
>>> >>     secauth: off
>>> >>     cluster_name: mycluster
>>> >>     transport: udpu
>>> >>     threads: 0
>>> >>     clear_node_high_bit: yes
>>> >>
>>> >>     interface {
>>> >>         ringnumber: 0
>>> >>         # The following three values need to be set based on your environment
>>> >>         bindnetaddr: 10.x.x.x
>>> >>         mcastaddr: 226.94.1.1
>>> >>         mcastport: 5405
>>> >>     }
>>> >> }
>>> >>
>>> >> logging {
>>> >>     fileline: off
>>> >>     to_syslog: yes
>>> >>     to_stderr: no
>>> >>     to_syslog: yes
>>> >>     logfile: /var/log/corosync.log
>>> >>     syslog_facility: daemon
>>> >>     debug: on
>>> >>     timestamp: on
>>> >> }
>>> >>
>>> >> amf {
>>> >>     mode: disabled
>>> >> }
>>> >>
>>> >> quorum {
>>> >>     provider: corosync_votequorum
>>> >> }
>>> >>
>>> >> nodelist {
>>> >>     node {
>>> >>         ring0_addr: node_cu
>>> >>         nodeid: 1
>>> >>     }
>>> >> }
>>> >>
>>> >> 3. Created authkey under /etc/corosync
>>> >>
>>> >> 4. Created a file called pcmk under /etc/corosync/service.d with the
>>> >> following contents:
>>> >> cat pcmk
>>> >> service {
>>> >>     # Load the Pacemaker Cluster Resource Manager
>>> >>     name: pacemaker
>>> >>     ver: 1
>>> >> }
>>> >>
>>> >> 5. Added the node name "node_cu" to /etc/hosts with the 10.X.X.X IP address.
>>> >>
>>> >> 6. ./corosync -f -p &   --> this step started corosync
>>> >>
>>> >> [root at node_cu pacemaker]# netstat -alpn | grep -i coros
>>> >> udp        0      0 10.X.X.X:61841          0.0.0.0:*                     9133/corosync
>>> >> udp        0      0 10.X.X.X:5405           0.0.0.0:*                     9133/corosync
>>> >> unix  2      [ ACC ]     STREAM     LISTENING     148888   9133/corosync   @quorum
>>> >> unix  2      [ ACC ]     STREAM     LISTENING     148884   9133/corosync   @cmap
>>> >> unix  2      [ ACC ]     STREAM     LISTENING     148887   9133/corosync   @votequorum
>>> >> unix  2      [ ACC ]     STREAM     LISTENING     148885   9133/corosync   @cfg
>>> >> unix  2      [ ACC ]     STREAM     LISTENING     148886   9133/corosync   @cpg
>>> >> unix  2      [ ]         DGRAM                    148840   9133/corosync
>>> >>
>>> >> 7. ./pacemakerd -f & gives the following error and exits:
>>> >> [root at node_cu pacemaker]# pacemakerd -f
>>> >> cmap connection setup failed: CS_ERR_TRY_AGAIN. Retrying in 1s
>>> >> cmap connection setup failed: CS_ERR_TRY_AGAIN. Retrying in 2s
>>> >> cmap connection setup failed: CS_ERR_TRY_AGAIN. Retrying in 3s
>>> >> cmap connection setup failed: CS_ERR_TRY_AGAIN. Retrying in 4s
>>> >> cmap connection setup failed: CS_ERR_TRY_AGAIN. Retrying in 5s
>>> >> Could not connect to Cluster Configuration Database API, error 6
>>> >>
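The retry pattern in step 7 (waits of 1s through 5s, then giving up with the last error code) is easy to reproduce in isolation. The C sketch below mirrors what the output shows, not Pacemaker's actual source. It also makes one point explicit: backing off only helps if corosync becomes ready within about 15 seconds in total (1+2+3+4+5); if it never reaches that state, every attempt returns CS_ERR_TRY_AGAIN and the loop ends with error 6, exactly as above. fake_server, fake_connect and fake_sleep are hypothetical test doubles so the logic can be exercised without corosync.

```c
#include <assert.h>
#include <stdio.h>

#define CS_OK_CODE        1   /* numeric value of cs_error_t CS_OK */
#define CS_TRY_AGAIN_CODE 6   /* numeric value of CS_ERR_TRY_AGAIN ("error 6") */

typedef int (*connect_fn)(void *ctx);       /* returns a cs_error_t-style code */
typedef void (*sleep_fn)(unsigned seconds);

/* Try to connect; while the result is CS_ERR_TRY_AGAIN, wait 1s, 2s, ...
 * up to max_retries seconds and try again. Returns the last result code. */
static int connect_with_backoff(connect_fn try_connect, void *ctx,
                                sleep_fn pause_for, unsigned max_retries)
{
    int rc = try_connect(ctx);
    for (unsigned attempt = 1;
         rc == CS_TRY_AGAIN_CODE && attempt <= max_retries; attempt++) {
        fprintf(stderr, "cmap connection setup failed: CS_ERR_TRY_AGAIN. "
                        "Retrying in %us\n", attempt);
        pause_for(attempt);
        rc = try_connect(ctx);
    }
    if (rc != CS_OK_CODE)
        fprintf(stderr, "Could not connect to Cluster Configuration "
                        "Database API, error %d\n", rc);
    return rc;
}

/* Hypothetical test double for the server: answers TRY_AGAIN until it has
 * been asked ready_after times. */
struct fake_server {
    unsigned calls;
    unsigned ready_after;
};

static int fake_connect(void *ctx)
{
    struct fake_server *s = ctx;
    s->calls++;
    return s->calls > s->ready_after ? CS_OK_CODE : CS_TRY_AGAIN_CODE;
}

static unsigned total_slept;   /* seconds fake_sleep() was asked to wait */

static void fake_sleep(unsigned seconds)
{
    total_slept += seconds;
}
```

On the board, try_connect would wrap corosync's cmap_initialize() and return its cs_error_t result; that wrapper is left out here because it needs the corosync headers and library.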
>>> >> Can you please point out what is missing in these steps?
>>> >>
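Step 1 can be made repeatable, which helps when the root filesystem is rebuilt. A minimal C sketch, assuming the directory list from step 1 (with the duplicated /var/lib/pacemaker entry listed once) and a mode of 0750, which is an assumption of this sketch; it does not set ownership, which packaged builds usually give to the cluster user. The function name make_state_dirs is hypothetical, and its root argument lets you try it in a scratch directory first.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>

/* State directories from step 1, parents listed before children so each
 * mkdir() finds its parent already in place. */
static const char *const state_dirs[] = {
    "/var", "/var/lib",
    "/var/lib/corosync",
    "/var/lib/pacemaker",
    "/var/lib/pacemaker/cores",
    "/var/lib/pacemaker/pengine",
    "/var/lib/pacemaker/blackbox",
    "/var/lib/pacemaker/cib",
};

/* Create every directory under root ("" means the real filesystem).
 * Existing directories are accepted, so the function can be rerun.
 * Returns 0 on success, -1 with errno set on the first failure. */
static int make_state_dirs(const char *root)
{
    char path[4096];
    for (size_t i = 0; i < sizeof state_dirs / sizeof state_dirs[0]; i++) {
        int n = snprintf(path, sizeof path, "%s%s", root, state_dirs[i]);
        if (n < 0 || (size_t)n >= sizeof path) {
            errno = ENAMETOOLONG;
            return -1;
        }
        if (mkdir(path, 0750) == 0)
            continue;
        if (errno != EEXIST)
            return -1;
        struct stat st;                  /* already there: must be a directory */
        if (stat(path, &st) != 0)
            return -1;
        if (!S_ISDIR(st.st_mode)) {
            errno = ENOTDIR;
            return -1;
        }
    }
    return 0;
}
```

Calling make_state_dirs("") as root creates the directories in the real filesystem; calling it again is harmless.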
>>> >> Before trying these steps, I tried running "pcs cluster start", but that
>>> >> command fails because the "service" script is not found: the root
>>> >> filesystem contains neither /etc/init.d/ nor /sbin/service.
>>> >>
>>> >> So the plan is to bring up corosync and pacemaker manually and do the
>>> >> cluster configuration later using "pcs" commands.
>>> >>
>>> >> Regards,
>>> >> Sriram
>>> >>
>>> >> _______________________________________________
>>> >> Users mailing list: Users at clusterlabs.org
>>> >> http://clusterlabs.org/mailman/listinfo/users
>>> >>
>>> >> Project Home: http://www.clusterlabs.org
>>> >> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> >> Bugs: http://bugs.clusterlabs.org