[ClusterLabs] Unable to run Pacemaker: pcmk_child_exit
Nikhil Utane
nikhil.subscribed at gmail.com
Fri May 6 11:29:02 UTC 2016
The command failed.
[root at airv_cu pacemaker]# cibadmin --upgrade --force
Call cib_upgrade failed (-62): Timer expired
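
Note on the return code: the -62 above looks like a negated Linux errno value (ETIME, whose standard message is "Timer expired"). The sketch below is a hypothetical helper, not part of Pacemaker, that looks up such codes using only the Python standard library. Codes outside the platform's errno table, such as the -203 quoted further down, are reported as unknown, on the assumption that they are application-specific.

```python
import errno
import os


def describe_rc(rc):
    """Return (symbol, message) for a negative errno-style return code.

    Hypothetical helper for illustration only; it knows nothing about
    Pacemaker's own error codes.
    """
    code = abs(rc)
    symbol = errno.errorcode.get(code)
    if symbol is None:
        # Not a standard errno on this platform, so probably an
        # application-specific code.
        return ("UNKNOWN", "not a standard errno value on this platform")
    return (symbol, os.strerror(code))


if __name__ == "__main__":
    # -62 and -203 are the two codes that appear in this thread.
    for rc in (-62, -203):
        symbol, message = describe_rc(rc)
        print(f"{rc}: {symbol} ({message})")
```

The numeric value of ETIME differs between operating systems, so the lookup is only meaningful when run on the same kind of system that produced the code.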
I did not use any tooling to create the CIB. (I am not even aware how to.)
As I mentioned, I am cross-compiling and copying the relevant files onto the
target platform.
In one of the earlier runs, pacemaker complained that it could not find
/usr/share/pacemaker/pacemaker-1.0.rng.
I found this file under the xml folder in the build folder, so I copied all
the files under the xml folder onto the target.
Did that screw it up?
This is the content of the folder:
[root at airv_cu pacemaker]# ls /usr/share/pacemaker/
Makefile             constraints-2.1.rng   nodes-1.0.rng      pacemaker-2.1.rng   rule.rng
Makefile.am          constraints-2.2.rng   nodes-1.2.rng      pacemaker-2.2.rng   score.rng
Makefile.in          constraints-2.3.rng   nodes-1.3.rng      pacemaker-2.3.rng   status-1.0.rng
Readme.md            constraints-next.rng  nvset-1.3.rng      pacemaker-2.4.rng   tags-1.3.rng
acls-1.2.rng         context-of.xsl        nvset.rng          pacemaker-next.rng  upgrade-1.3.xsl
acls-2.0.rng         crm-transitional.dtd  ocf-meta2man.xsl   pacemaker.rng       upgrade06.xsl
best-match.sh        crm.dtd               options-1.0.rng    regression.core.sh  versions.rng
cib-1.0.rng          crm.xsl               pacemaker-1.0.rng  regression.sh
cib-1.2.rng          crm_mon.rng           pacemaker-1.2.rng  resources-1.0.rng
constraints-1.0.rng  fencing-1.2.rng       pacemaker-1.3.rng  resources-1.2.rng
constraints-1.2.rng  fencing-2.4.rng       pacemaker-2.0.rng  resources-1.3.rng
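
For comparing a listing like the one above against a CIB, here is a minimal sketch. It extracts the validate-with attribute from a CIB document and lists which pacemaker-X.Y.rng versions are present among a set of file names. The function names are hypothetical, the sample data is abbreviated from this thread, and the check assumes a schema name maps to a file called <name>.rng. Whether Pacemaker resolves legacy names such as pacemaker-0.7 that way is not established here.

```python
import re
import xml.etree.ElementTree as ET


def schema_versions(filenames):
    """Return sorted (major, minor) tuples for files named pacemaker-X.Y.rng."""
    pattern = re.compile(r"^pacemaker-(\d+)\.(\d+)\.rng$")
    versions = []
    for name in filenames:
        match = pattern.match(name)
        if match:
            versions.append((int(match.group(1)), int(match.group(2))))
    return sorted(versions)


def validate_with(cib_xml):
    """Return the validate-with attribute of the root <cib> element, or None."""
    return ET.fromstring(cib_xml).get("validate-with")


# Sample data abbreviated from this thread (not read from a live system).
FILES = [
    "pacemaker-1.0.rng", "pacemaker-1.2.rng", "pacemaker-1.3.rng",
    "pacemaker-2.0.rng", "pacemaker-2.1.rng", "pacemaker-2.2.rng",
    "pacemaker-2.3.rng", "pacemaker-2.4.rng", "pacemaker-next.rng",
    "pacemaker.rng", "versions.rng",
]
CIB = ('<cib crm_feature_set="3.0.10" validate-with="pacemaker-0.7" '
       'epoch="3" num_updates="0" admin_epoch="0"/>')

if __name__ == "__main__":
    name = validate_with(CIB)
    # Assumption: a schema named N would live in a file called N.rng.
    print("validate-with:", name)
    print("same-named .rng present:", f"{name}.rng" in FILES)
    print("highest numbered schema file:", schema_versions(FILES)[-1])
```

On a real system the file names would come from os.listdir("/usr/share/pacemaker") and the XML from the output of pcs cluster cib.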
-Regards
Nikhil
On Fri, May 6, 2016 at 4:41 PM, Klaus Wenninger <kwenning at redhat.com> wrote:
> On 05/06/2016 12:40 PM, Nikhil Utane wrote:
> > Hi,
> >
> > I used the blackbox feature, which showed the reason for the failure.
> > As I am cross-compiling pacemaker on a build machine and later moving
> > the binaries to the target, a few binaries were missing. After fixing
> > that and a bunch of other errors/warnings, I am able to get pacemaker
> > started, though it is not completely running fine.
> >
> > The node is not getting added:
> > airv_cu cib: error: xml_log:Element node failed to validate
> > attributes
> >
> > I suppose it is because of this error:
> > crmd: error: node_list_update_callback:Node update 4 failed: Update
> > does not conform to the configured schema (-203)
> >
> > I suspect this is caused by validate-with="pacemaker-0.7" in the cib.
> > In another installation this is set to "pacemaker-2.0".
> >
> > [root at airv_cu pacemaker]# pcs cluster cib
> > <cib crm_feature_set="3.0.10" validate-with="pacemaker-0.7" epoch="3"
> >      num_updates="0" admin_epoch="0"
> >      cib-last-written="Fri May 6 09:28:10 2016" have-quorum="1">
> >   <configuration>
> >     <crm_config>
> >       <cluster_property_set id="cib-bootstrap-options">
> >         <nvpair id="cib-bootstrap-options-have-watchdog"
> >                 name="have-watchdog" value="true"/>
> >         <nvpair id="cib-bootstrap-options-dc-version"
> >                 name="dc-version" value="1.1.14-5a6cdd1"/>
> >         <nvpair id="cib-bootstrap-options-cluster-infrastructure"
> >                 name="cluster-infrastructure" value="corosync"/>
> >       </cluster_property_set>
> >     </crm_config>
> >     <nodes/>
> >     <resources/>
> >     <constraints/>
> >   </configuration>
> >   <status/>
> > </cib>
> >
> > Any idea why/where this is being set to 0.7? I am using the latest
> > pacemaker from GitHub.
>
> What kind of tooling did you use to create the cib?
> Try 'cibadmin --upgrade'. That should set the cib-version to what your
> pacemaker-version supports.
>
> >
> > [root at airv_cu pacemaker]# pacemakerd --version
> > Pacemaker 1.1.14
> > Written by Andrew Beekhof
> >
> > Attaching the corosync.log and corosync.conf file.
> >
> > -Thanks
> > Nikhil
> >
> >
> > On Thu, May 5, 2016 at 10:21 PM, Ken Gaillot <kgaillot at redhat.com> wrote:
> >
> > On 05/05/2016 11:25 AM, Nikhil Utane wrote:
> > > Thanks Ken for your quick response as always.
> > >
> > > But what if I don't want to use quorum? I just want to bring up
> > > pacemaker + corosync on 1 node to check that it all comes up fine.
> > > I added corosync_votequorum as you suggested. Additionally I also
> > > added these 2 lines:
> > >
> > > expected_votes: 2
> > > two_node: 1
> >
> > There's actually nothing wrong with configuring a single-node cluster.
> > You can list just one node in corosync.conf and leave off the above.
> >
> > > However still pacemaker is not able to run.
> >
> > There must be other issues involved. Even if pacemaker doesn't have
> > quorum, it will still run, it just won't start resources.
> >
> > > [root at airv_cu root]# pcs cluster start
> > > Starting Cluster...
> > > Starting Pacemaker Cluster Manager[FAILED]
> > >
> > > Error: unable to start pacemaker
> > >
> > > Corosync.log:
> > > *May 05 16:15:20 [16294] airv_cu pacemakerd: info:
> > > pcmk_quorum_notification: Membership 240: quorum still lost (1)*
> > > May 05 16:15:20 [16259] airv_cu corosync debug [QB ] Free'ing
> > > ringbuffer: /dev/shm/qb-cmap-request-16259-16294-21-header
> > > May 05 16:15:20 [16294] airv_cu pacemakerd: notice:
> > > crm_update_peer_state_iter: pcmk_quorum_notification: Node
> > > airv_cu[181344357] - state is now member (was (null))
> > > May 05 16:15:20 [16294] airv_cu pacemakerd: info:
> > > pcmk_cpg_membership: Node 181344357 joined group pacemakerd
> > > (counter=0.0)
> > > May 05 16:15:20 [16294] airv_cu pacemakerd: info:
> > > pcmk_cpg_membership: Node 181344357 still member of group
> > > pacemakerd (peer=airv_cu, counter=0.0)
> > > May 05 16:15:20 [16294] airv_cu pacemakerd: warning:
> > > pcmk_child_exit: The cib process (16353) can no longer be respawned,
> > > shutting the cluster down.
> > > May 05 16:15:20 [16294] airv_cu pacemakerd: notice:
> > > pcmk_shutdown_worker: Shutting down Pacemaker
> > >
> > > The log and conf file is attached.
> > >
> > > -Regards
> > > Nikhil
> > >
> > > On Thu, May 5, 2016 at 8:04 PM, Ken Gaillot <kgaillot at redhat.com> wrote:
> > >
> > > On 05/05/2016 08:36 AM, Nikhil Utane wrote:
> > > > Hi,
> > > >
> > > > Continuing with my adventure to run Pacemaker & Corosync on our
> > > > big-endian system, I managed to get past the corosync issue for now.
> > > > But I am facing an issue in running Pacemaker.
> > > >
> > > > I am seeing the following messages in corosync.log:
> > > > pacemakerd: warning: pcmk_child_exit: The cib process (20000) can no
> > > > longer be respawned, shutting the cluster down.
> > > > pacemakerd: warning: pcmk_child_exit: The stonith-ng process (20001)
> > > > can no longer be respawned, shutting the cluster down.
> > > > pacemakerd: warning: pcmk_child_exit: The lrmd process (20002) can no
> > > > longer be respawned, shutting the cluster down.
> > > > pacemakerd: warning: pcmk_child_exit: The attrd process (20003) can
> > > > no longer be respawned, shutting the cluster down.
> > > > pacemakerd: warning: pcmk_child_exit: The pengine process (20004) can
> > > > no longer be respawned, shutting the cluster down.
> > > > pacemakerd: warning: pcmk_child_exit: The crmd process (20005) can no
> > > > longer be respawned, shutting the cluster down.
> > > >
> > > > I see the following error before these messages. Not sure if this is
> > > > the cause.
> > > > May 05 11:26:24 [19998] airv_cu pacemakerd: error:
> > > > cluster_connect_quorum: Corosync quorum is not configured
> > > >
> > > > I tried removing the quorum block (which is blank anyway) from the
> > > > conf file but still had the same error.
> > >
> > > Yes, that is the issue. Pacemaker can't do anything if it can't ask
> > > corosync about quorum. I don't know what the issue is at the corosync
> > > level, but your corosync.conf should have:
> > >
> > > quorum {
> > >     provider: corosync_votequorum
> > > }
> > >
> > >
> > > > Attaching the log and conf files. Please let me know if there is any
> > > > obvious mistake or how to investigate it further.
> > > >
> > > > I am using the pcs cluster start command to start the cluster.
> > > >
> > > > -Thanks
> > > > Nikhil
> >
> >
> >
> >
> > _______________________________________________
> > Users mailing list: Users at clusterlabs.org
> > http://clusterlabs.org/mailman/listinfo/users
> >
> > Project Home: http://www.clusterlabs.org
> > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> > Bugs: http://bugs.clusterlabs.org
>
>