[ClusterLabs] [ClusterLab] : Corosync not initializing successfully
Jan Friesse
jfriesse at redhat.com
Mon May 2 06:54:09 UTC 2016
> As your hardware is probably capable of running ppcle and if you have an
> environment at hand without too much effort it might pay off to try that.
> There are of course distributions out there that support corosync on
> big-endian architectures, but I don't know if there is an automated
> regression test for corosync on big-endian that would catch big-endian
> issues right away with something as current as your 2.3.5.
No, we are not testing big-endian.
So I totally agree with Klaus. Give ppcle a try. Also make sure all
nodes are little-endian. Corosync should work in a mixed BE/LE environment,
but because that is not tested, it may not work (and that is a bug, so if
ppcle works I will try to fix BE).
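
Since a mixed BE/LE cluster is untested, it may help to confirm which byte order each node actually reports before comparing results. Below is a minimal Python sketch, assuming Python is available on the nodes (pcs, mentioned in this thread, is Python). The function names `host_byte_order`, `native_layout_of_one` and `all_same_byte_order` are illustrative only and are not part of corosync, pacemaker or pcs.

```python
import struct
import sys


def host_byte_order() -> str:
    """Return 'little' or 'big' for the machine running this script."""
    return sys.byteorder


def native_layout_of_one() -> bytes:
    """Pack the 32-bit integer 1 in the machine's native byte order."""
    # Little-endian hosts put the 0x01 byte first, big-endian hosts last.
    return struct.pack("=I", 1)


def all_same_byte_order(reports: dict) -> bool:
    """True if every node in a {node_name: byte_order} mapping agrees.

    The mapping is assumed to be collected by hand, one entry per node.
    """
    return len(set(reports.values())) <= 1


if __name__ == "__main__":
    # Print the byte order and the raw layout of the integer 1 as hex.
    print(host_byte_order(), native_layout_of_one().hex())
```

Running this on every node and comparing the first word of the output shows whether the cluster is mixed. It only reports the byte order; it does not test corosync itself.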
Regards,
Honza
>
> Regards,
> Klaus
>
> On 05/02/2016 06:44 AM, Nikhil Utane wrote:
>> Re-sending as I don't see my post on the thread.
>>
>> On Sun, May 1, 2016 at 4:21 PM, Nikhil Utane
>> <nikhil.subscribed at gmail.com> wrote:
>>
>> Hi,
>>
>> Looking for some guidance here as we are completely blocked
>> otherwise :(.
>>
>> -Regards
>> Nikhil
>>
>> On Fri, Apr 29, 2016 at 6:11 PM, Sriram <sriram.ec at gmail.com> wrote:
>>
>> Corrected the subject.
>>
>> We went ahead and captured corosync debug logs for our ppc board.
>> After analyzing the logs and comparing them with the successful logs
>> (from the x86 machine), we didn't find *"[ MAIN ] Completed service
>> synchronization, ready to provide service."* in the ppc logs.
>> So it looks like corosync is not in a position to accept
>> connections from Pacemaker.
>> I also tried a new corosync.conf, with no success.
>>
>> Any hints on this issue would be really helpful.
>>
>> Attaching ppc_notworking.log, x86_working.log, corosync.conf.
>>
>> Regards,
>> Sriram
>>
>>
>>
>> On Fri, Apr 29, 2016 at 2:44 PM, Sriram <sriram.ec at gmail.com> wrote:
>>
>> Hi,
>>
>> I went ahead and made some changes to the file system (I
>> brought in /etc/init.d/corosync, /etc/init.d/pacemaker and
>> /etc/sysconfig). After that I was able to run "pcs
>> cluster start".
>> But it failed with the following error:
>> # pcs cluster start
>> Starting Cluster...
>> Starting Pacemaker Cluster Manager[FAILED]
>> Error: unable to start pacemaker
>>
>> And in the /var/log/pacemaker.log, I saw these errors
>> pacemakerd: info: mcp_read_config: cmap connection setup failed: CS_ERR_TRY_AGAIN. Retrying in 4s
>> Apr 29 08:53:47 [15863] node_cu pacemakerd: info: mcp_read_config: cmap connection setup failed: CS_ERR_TRY_AGAIN. Retrying in 5s
>> Apr 29 08:53:52 [15863] node_cu pacemakerd: warning: mcp_read_config: Could not connect to Cluster Configuration Database API, error 6
>> Apr 29 08:53:52 [15863] node_cu pacemakerd: notice: main: Could not obtain corosync config data, exiting
>> Apr 29 08:53:52 [15863] node_cu pacemakerd: info: crm_xml_cleanup: Cleaning up memory from libxml2
>>
>>
>> And in the /var/log/Debuglog, I saw these errors coming
>> from corosync
>> 20160429 085347.487050 airv_cu daemon.warn corosync[12857]: [QB ] Denied connection, is not ready (12857-15863-14)
>> 20160429 085347.487067 airv_cu daemon.info corosync[12857]: [QB ] Denied connection, is not ready (12857-15863-14)
>>
>>
>> I browsed the code of libqb to find that it is failing in
>>
>> https://github.com/ClusterLabs/libqb/blob/master/lib/ipc_setup.c
>>
>> Line 600 :
>> handle_new_connection function
>>
>> Line 637:
>> if (auth_result == 0 &&
>> c->service->serv_fns.connection_accept) {
>> res = c->service->serv_fns.connection_accept(c,
>> c->euid, c->egid);
>> }
>> if (res != 0) {
>> goto send_response;
>> }
>>
>> Any hints on this issue would be really helpful for me to
>> go ahead.
>> Please let me know if any logs are required,
>>
>> Regards,
>> Sriram
>>
>> On Thu, Apr 28, 2016 at 2:42 PM, Sriram <sriram.ec at gmail.com> wrote:
>>
>> Thanks Ken and Emmanuel.
>> It's a big-endian machine. I will try running "pcs
>> cluster setup" and "pcs cluster start".
>> Inside cluster.py, "service pacemaker start" and
>> "service corosync start" are executed to bring up
>> pacemaker and corosync.
>> Those service scripts, and the infrastructure needed to
>> bring up the processes in that manner,
>> don't exist on my board.
>> As it is an embedded board with limited memory,
>> a full-fledged Linux is not installed.
>> Just curious to know what could be the reason
>> pacemaker throws that error:
>>
>> "cmap connection setup failed: CS_ERR_TRY_AGAIN. Retrying in 1s"
>>
>> Thanks for the response.
>>
>> Regards,
>> Sriram.
>>
>> On Thu, Apr 28, 2016 at 8:55 AM, Ken Gaillot <kgaillot at redhat.com> wrote:
>>
>> On 04/27/2016 11:25 AM, emmanuel segura wrote:
>> > you need to use pcs to do everything, pcs cluster setup and pcs
>> > cluster start, try to use the redhat docs for more information.
>>
>> Agreed -- pcs cluster setup will create a proper corosync.conf for you.
>> Your corosync.conf below uses corosync 1 syntax, and there were
>> significant changes in corosync 2. In particular, you don't need the
>> file created in step 4, because pacemaker is no longer launched via a
>> corosync plugin.
>>
>> > 2016-04-27 17:28 GMT+02:00 Sriram <sriram.ec at gmail.com>:
>> >> Dear All,
>> >>
>> >> I'm trying to use pacemaker and corosync for the clustering
>> >> requirement that came up recently.
>> >> We have cross-compiled corosync, pacemaker and pcs (python) for a ppc
>> >> environment (the target board where pacemaker and corosync are
>> >> supposed to run).
>> >> I'm having trouble bringing up pacemaker in that environment, though
>> >> I could successfully bring up corosync.
>> >> Any help is welcome.
>> >>
>> >> I'm using these versions of pacemaker and corosync:
>> >> [root at node_cu pacemaker]# corosync -v
>> >> Corosync Cluster Engine, version '2.3.5'
>> >> Copyright (c) 2006-2009 Red Hat, Inc.
>> >> [root at node_cu pacemaker]# pacemakerd -$
>> >> Pacemaker 1.1.14
>> >> Written by Andrew Beekhof
>> >>
>> >> For running corosync, I did the following.
>> >> 1. Created the following directories,
>> >> /var/lib/pacemaker
>> >> /var/lib/corosync
>> >> /var/lib/pacemaker/cores
>> >> /var/lib/pacemaker/pengine
>> >> /var/lib/pacemaker/blackbox
>> >> /var/lib/pacemaker/cib
>> >>
>> >>
>> >> 2. Created a file called corosync.conf under the /etc/corosync folder
>> >> with the following contents
>> >>
>> >> totem {
>> >>
>> >> version: 2
>> >> token: 5000
>> >> token_retransmits_before_loss_const: 20
>> >> join: 1000
>> >> consensus: 7500
>> >> vsftype: none
>> >> max_messages: 20
>> >> secauth: off
>> >> cluster_name: mycluster
>> >> transport: udpu
>> >> threads: 0
>> >> clear_node_high_bit: yes
>> >>
>> >> interface {
>> >> ringnumber: 0
>> >> # The following three values need to be set based on your environment
>> >> bindnetaddr: 10.x.x.x
>> >> mcastaddr: 226.94.1.1
>> >> mcastport: 5405
>> >> }
>> >> }
>> >>
>> >> logging {
>> >> fileline: off
>> >> to_syslog: yes
>> >> to_stderr: no
>> >> to_syslog: yes
>> >> logfile: /var/log/corosync.log
>> >> syslog_facility: daemon
>> >> debug: on
>> >> timestamp: on
>> >> }
>> >>
>> >> amf {
>> >> mode: disabled
>> >> }
>> >>
>> >> quorum {
>> >> provider: corosync_votequorum
>> >> }
>> >>
>> >> nodelist {
>> >> node {
>> >> ring0_addr: node_cu
>> >> nodeid: 1
>> >> }
>> >> }
>> >>
>> >> 3. Created authkey under /etc/corosync
>> >>
>> >> 4. Created a file called pcmk under /etc/corosync/service.d with the
>> >> contents below:
>> >> cat pcmk
>> >> service {
>> >> # Load the Pacemaker Cluster Resource Manager
>> >> name: pacemaker
>> >> ver: 1
>> >> }
>> >>
>> >> 5. Added the node name "node_cu" in /etc/hosts with 10.X.X.X ip
>> >>
>> >> 6. ./corosync -f -p & --> this step started corosync
>> >>
>> >> [root at node_cu pacemaker]# netstat -alpn | grep -i coros
>> >> udp 0 0 10.X.X.X:61841 0.0.0.0:* 9133/corosync
>> >> udp 0 0 10.X.X.X:5405 0.0.0.0:* 9133/corosync
>> >> unix 2 [ ACC ] STREAM LISTENING 148888 9133/corosync @quorum
>> >> unix 2 [ ACC ] STREAM LISTENING 148884 9133/corosync @cmap
>> >> unix 2 [ ACC ] STREAM LISTENING 148887 9133/corosync @votequorum
>> >> unix 2 [ ACC ] STREAM LISTENING 148885 9133/corosync @cfg
>> >> unix 2 [ ACC ] STREAM LISTENING 148886 9133/corosync @cpg
>> >> unix 2 [ ] DGRAM 148840 9133/corosync
>> >>
>> >> 7. ./pacemakerd -f & gives the following error and exits.
>> >> [root at node_cu pacemaker]# pacemakerd -f
>> >> cmap connection setup failed: CS_ERR_TRY_AGAIN. Retrying in 1s
>> >> cmap connection setup failed: CS_ERR_TRY_AGAIN. Retrying in 2s
>> >> cmap connection setup failed: CS_ERR_TRY_AGAIN. Retrying in 3s
>> >> cmap connection setup failed: CS_ERR_TRY_AGAIN. Retrying in 4s
>> >> cmap connection setup failed: CS_ERR_TRY_AGAIN. Retrying in 5s
>> >> Could not connect to Cluster Configuration Database API, error 6
>> >>
>> >> Can you please point out what is missing in these steps?
>> >>
>> >> Before trying these steps, I tried running "pcs cluster start", but that
>> >> command fails with "service" script not found, as the root filesystem
>> >> contains neither /etc/init.d/ nor /sbin/service.
>> >>
>> >> So the plan is to bring up corosync and pacemaker manually, and later do
>> >> the cluster configuration using "pcs" commands.
>> >>
>> >> Regards,
>> >> Sriram
>> >>
>> >> _______________________________________________
>> >> Users mailing list: Users at clusterlabs.org
>> >> http://clusterlabs.org/mailman/listinfo/users
>> >>
>> >> Project Home: http://www.clusterlabs.org
>> >> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> >> Bugs: http://bugs.clusterlabs.org
>> >>
>> >
>> >
>> >
>>
>>
>
>