[ClusterLabs] [ClusterLab] : Corosync not initializing successfully
Dejan Muhamedagic
dejanmm at fastmail.fm
Tue May 3 11:56:06 UTC 2016
Hi,
On Mon, May 02, 2016 at 08:54:09AM +0200, Jan Friesse wrote:
> >As your hardware is probably capable of running ppcle, and if you have an
> >environment at hand without too much effort, it might pay off to try that.
> >There are of course distributions out there that support corosync on
> >big-endian architectures, but I don't know if there is an automated
> >regression test for corosync on big-endian that would catch big-endian
> >issues right away with something as current as your 2.3.5.
>
> No, we are not testing big-endian.
>
> So I totally agree with Klaus. Give ppcle a try. Also make sure all
> nodes are little-endian. Corosync should work in a mixed BE/LE
> environment, but because it's not tested, it may not work (and that's a
> bug, so if ppcle works I will try to fix BE).
I tested a cluster consisting of big-endian and little-endian nodes
(s390 and x86-64), but that was a while ago. IIRC, all relevant
bugs in corosync got fixed at that time. I don't know what the
situation is with the latest version.
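
For anyone wondering what kind of bug a mixed BE/LE cluster tends to
expose, here is a minimal sketch in C. It is not corosync code and makes no
claim about how corosync encodes its messages; the helper names
(host_is_little_endian, swab32, put_le32, get_le32, put_raw32) are made up
for illustration. It only shows the general difference between copying a
host-order integer into a buffer and writing it in a fixed byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 on a little-endian host, 0 on a big-endian one. */
int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    uint8_t first_byte;

    memcpy(&first_byte, &probe, 1);
    return first_byte == 1;
}

/* Reverse the byte order of a 32-bit value. */
uint32_t swab32(uint32_t v)
{
    return ((v & 0x000000FFu) << 24) | ((v & 0x0000FF00u) << 8) |
           ((v & 0x00FF0000u) >> 8)  | ((v & 0xFF000000u) >> 24);
}

/* Hypothetical wire helper: always store the value little-endian. */
void put_le32(uint8_t *buf, uint32_t v)
{
    assert(buf != NULL);
    buf[0] = (uint8_t)(v & 0xFFu);
    buf[1] = (uint8_t)((v >> 8) & 0xFFu);
    buf[2] = (uint8_t)((v >> 16) & 0xFFu);
    buf[3] = (uint8_t)((v >> 24) & 0xFFu);
}

/* Hypothetical wire helper: read a little-endian value on any host. */
uint32_t get_le32(const uint8_t *buf)
{
    assert(buf != NULL);
    return (uint32_t)buf[0] | ((uint32_t)buf[1] << 8) |
           ((uint32_t)buf[2] << 16) | ((uint32_t)buf[3] << 24);
}

/* The naive variant: copy the host representation as it is.
 * The receiver only decodes this correctly if it has the same
 * byte order as the sender. */
void put_raw32(uint8_t *buf, uint32_t v)
{
    assert(buf != NULL);
    memcpy(buf, &v, sizeof v);
}
```

A value written with put_le32() reads back the same with get_le32() on
both s390 and x86-64, whereas a value written with put_raw32() on a
big-endian node comes out byte-swapped when decoded as little-endian.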
Thanks,
Dejan
> Regards,
> Honza
>
> >
> >Regards,
> >Klaus
> >
> >On 05/02/2016 06:44 AM, Nikhil Utane wrote:
> >>Re-sending as I don't see my post on the thread.
> >>
> >>On Sun, May 1, 2016 at 4:21 PM, Nikhil Utane
> >><nikhil.subscribed at gmail.com> wrote:
> >>
> >> Hi,
> >>
> >> Looking for some guidance here as we are completely blocked
> >> otherwise :(.
> >>
> >> -Regards
> >> Nikhil
> >>
> >> On Fri, Apr 29, 2016 at 6:11 PM, Sriram <sriram.ec at gmail.com> wrote:
> >>
> >> Corrected the subject.
> >>
> >> We went ahead and captured corosync debug logs for our ppc board.
> >> After analyzing the logs and comparing them with the successful logs
> >> (from the x86 machine), we didn't find "[ MAIN ] Completed service
> >> synchronization, ready to provide service." in the ppc logs.
> >> So it looks like corosync is not in a position to accept
> >> connections from Pacemaker.
> >> I even tried a new corosync.conf, with no success.
> >>
> >> Any hints on this issue would be really helpful.
> >>
> >> Attaching ppc_notworking.log, x86_working.log, corosync.conf.
> >>
> >> Regards,
> >> Sriram
> >>
> >>
> >>
> >> On Fri, Apr 29, 2016 at 2:44 PM, Sriram <sriram.ec at gmail.com> wrote:
> >>
> >> Hi,
> >>
> >> I went ahead and made some changes to the file system (I
> >> brought in /etc/init.d/corosync, /etc/init.d/pacemaker and
> >> /etc/sysconfig). After that I was able to run "pcs
> >> cluster start".
> >> But it failed with the following error:
> >> # pcs cluster start
> >> Starting Cluster...
> >> Starting Pacemaker Cluster Manager[FAILED]
> >> Error: unable to start pacemaker
> >>
> >> And in the /var/log/pacemaker.log, I saw these errors
> >> pacemakerd: info: mcp_read_config: cmap connection
> >> setup failed: CS_ERR_TRY_AGAIN. Retrying in 4s
> >> Apr 29 08:53:47 [15863] node_cu pacemakerd: info:
> >> mcp_read_config: cmap connection setup failed:
> >> CS_ERR_TRY_AGAIN. Retrying in 5s
> >> Apr 29 08:53:52 [15863] node_cu pacemakerd: warning:
> >> mcp_read_config: Could not connect to Cluster
> >> Configuration Database API, error 6
> >> Apr 29 08:53:52 [15863] node_cu pacemakerd: notice:
> >> main: Could not obtain corosync config data, exiting
> >> Apr 29 08:53:52 [15863] node_cu pacemakerd: info:
> >> crm_xml_cleanup: Cleaning up memory from libxml2
> >>
> >>
> >> And in the /var/log/Debuglog, I saw these errors coming
> >> from corosync
> >> 20160429 085347.487050 airv_cu daemon.warn corosync[12857]: [QB ] Denied connection, is not ready (12857-15863-14)
> >> 20160429 085347.487067 airv_cu daemon.info corosync[12857]: [QB ] Denied connection, is not ready (12857-15863-14)
> >>
> >>
> >> I browsed the code of libqb to find that it is failing in
> >>
> >> https://github.com/ClusterLabs/libqb/blob/master/lib/ipc_setup.c
> >>
> >> Line 600:
> >> handle_new_connection function
> >>
> >> Line 637:
> >>     if (auth_result == 0 &&
> >>         c->service->serv_fns.connection_accept) {
> >>         res = c->service->serv_fns.connection_accept(c,
> >>                   c->euid, c->egid);
> >>     }
> >>     if (res != 0) {
> >>         goto send_response;
> >>     }
> >>
> >> Any hints on this issue would be really helpful for me to
> >> go ahead.
> >> Please let me know if any logs are required,
> >>
> >> Regards,
> >> Sriram
> >>
> >> On Thu, Apr 28, 2016 at 2:42 PM, Sriram
> >> <sriram.ec at gmail.com> wrote:
> >>
> >> Thanks Ken and Emmanuel.
> >> It's a big-endian machine. I will try running "pcs
> >> cluster setup" and "pcs cluster start".
> >> Inside cluster.py, "service pacemaker start" and
> >> "service corosync start" are executed to bring up
> >> pacemaker and corosync.
> >> Those service scripts, and the infrastructure needed to
> >> bring up the processes in that manner,
> >> don't exist on my board.
> >> As it is an embedded board with limited memory,
> >> a full-fledged Linux is not installed.
> >> Just curious to know what could be the reason
> >> pacemaker throws this error:
> >>
> >> "cmap connection setup failed: CS_ERR_TRY_AGAIN.
> >> Retrying in 1s"
> >>
> >> Thanks for the response.
> >>
> >> Regards,
> >> Sriram.
> >>
> >> On Thu, Apr 28, 2016 at 8:55 AM, Ken Gaillot
> >> <kgaillot at redhat.com> wrote:
> >>
> >> On 04/27/2016 11:25 AM, emmanuel segura wrote:
> >> > you need to use pcs to do everything, pcs cluster setup and pcs
> >> > cluster start, try to use the redhat docs for more information.
> >>
> >> Agreed -- pcs cluster setup will create a proper corosync.conf for you.
> >> Your corosync.conf below uses corosync 1 syntax, and there were
> >> significant changes in corosync 2. In particular, you don't need the
> >> file created in step 4, because pacemaker is no longer launched via a
> >> corosync plugin.
> >>
> >> > 2016-04-27 17:28 GMT+02:00 Sriram <sriram.ec at gmail.com>:
> >> >> Dear All,
> >> >>
> >> >> I'm trying to use pacemaker and corosync for a clustering requirement that
> >> >> came up recently.
> >> >> We have cross-compiled corosync, pacemaker and pcs (python) for a ppc
> >> >> environment (the target board where pacemaker and corosync are supposed to run).
> >> >> I'm having trouble bringing up pacemaker in that environment, though I could
> >> >> successfully bring up corosync.
> >> >> Any help is welcome.
> >> >>
> >> >> I'm using these versions of pacemaker and corosync:
> >> >> [root at node_cu pacemaker]# corosync -v
> >> >> Corosync Cluster Engine, version '2.3.5'
> >> >> Copyright (c) 2006-2009 Red Hat, Inc.
> >> >> [root at node_cu pacemaker]# pacemakerd -$
> >> >> Pacemaker 1.1.14
> >> >> Written by Andrew Beekhof
> >> >>
> >> >> For running corosync, I did the following.
> >> >> 1. Created the following directories:
> >> >> /var/lib/pacemaker
> >> >> /var/lib/corosync
> >> >> /var/lib/pacemaker/cores
> >> >> /var/lib/pacemaker/pengine
> >> >> /var/lib/pacemaker/blackbox
> >> >> /var/lib/pacemaker/cib
> >> >>
> >> >>
> >> >> 2. Created a file called corosync.conf under the /etc/corosync folder with the
> >> >> following contents:
> >> >>
> >> >> totem {
> >> >>
> >> >> version: 2
> >> >> token: 5000
> >> >> token_retransmits_before_loss_const: 20
> >> >> join: 1000
> >> >> consensus: 7500
> >> >> vsftype: none
> >> >> max_messages: 20
> >> >> secauth: off
> >> >> cluster_name: mycluster
> >> >> transport: udpu
> >> >> threads: 0
> >> >> clear_node_high_bit: yes
> >> >>
> >> >> interface {
> >> >> ringnumber: 0
> >> >> # The following three values need to be set based on your environment
> >> >> bindnetaddr: 10.x.x.x
> >> >> mcastaddr: 226.94.1.1
> >> >> mcastport: 5405
> >> >> }
> >> >> }
> >> >>
> >> >> logging {
> >> >> fileline: off
> >> >> to_syslog: yes
> >> >> to_stderr: no
> >> >> to_syslog: yes
> >> >> logfile: /var/log/corosync.log
> >> >> syslog_facility: daemon
> >> >> debug: on
> >> >> timestamp: on
> >> >> }
> >> >>
> >> >> amf {
> >> >> mode: disabled
> >> >> }
> >> >>
> >> >> quorum {
> >> >> provider: corosync_votequorum
> >> >> }
> >> >>
> >> >> nodelist {
> >> >> node {
> >> >> ring0_addr: node_cu
> >> >> nodeid: 1
> >> >> }
> >> >> }
> >> >>
> >> >> 3. Created authkey under /etc/corosync
> >> >>
> >> >> 4. Created a file called pcmk under /etc/corosync/service.d with the
> >> >> contents below:
> >> >> cat pcmk
> >> >> service {
> >> >> # Load the Pacemaker Cluster Resource Manager
> >> >> name: pacemaker
> >> >> ver: 1
> >> >> }
> >> >>
> >> >> 5. Added the node name "node_cu" in /etc/hosts with the 10.X.X.X IP address.
> >> >>
> >> >> 6. ./corosync -f -p & --> this step started corosync
> >> >>
> >> >> [root at node_cu pacemaker]# netstat -alpn | grep -i coros
> >> >> udp   0  0  10.X.X.X:61841  0.0.0.0:*  9133/corosync
> >> >> udp   0  0  10.X.X.X:5405   0.0.0.0:*  9133/corosync
> >> >> unix  2  [ ACC ]  STREAM  LISTENING  148888  9133/corosync  @quorum
> >> >> unix  2  [ ACC ]  STREAM  LISTENING  148884  9133/corosync  @cmap
> >> >> unix  2  [ ACC ]  STREAM  LISTENING  148887  9133/corosync  @votequorum
> >> >> unix  2  [ ACC ]  STREAM  LISTENING  148885  9133/corosync  @cfg
> >> >> unix  2  [ ACC ]  STREAM  LISTENING  148886  9133/corosync  @cpg
> >> >> unix  2  [ ]      DGRAM              148840  9133/corosync
> >> >>
> >> >> 7. ./pacemakerd -f & gives the following error and exits.
> >> >> [root at node_cu pacemaker]# pacemakerd -f
> >> >> cmap connection setup failed: CS_ERR_TRY_AGAIN. Retrying in 1s
> >> >> cmap connection setup failed: CS_ERR_TRY_AGAIN. Retrying in 2s
> >> >> cmap connection setup failed: CS_ERR_TRY_AGAIN. Retrying in 3s
> >> >> cmap connection setup failed: CS_ERR_TRY_AGAIN. Retrying in 4s
> >> >> cmap connection setup failed: CS_ERR_TRY_AGAIN. Retrying in 5s
> >> >> Could not connect to Cluster Configuration Database API, error 6
> >> >>
> >> >> Can you please point out what is missing in these steps?
> >> >>
> >> >> Before trying these steps, I tried running "pcs cluster start", but that
> >> >> command fails with "service" script not found, as the root filesystem
> >> >> contains neither /etc/init.d/ nor /sbin/service.
> >> >>
> >> >> So the plan is to bring up corosync and pacemaker manually, and later do the
> >> >> cluster configuration using "pcs" commands.
> >> >>
> >> >> Regards,
> >> >> Sriram
> >> >>
> >> >> _______________________________________________
> >> >> Users mailing list: Users at clusterlabs.org
> >> >> http://clusterlabs.org/mailman/listinfo/users
> >> >>
> >> >> Project Home: http://www.clusterlabs.org
> >> >> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> >> >> Bugs: http://bugs.clusterlabs.org
> >> >>