[ClusterLabs] Unable to run Pacemaker: pcmk_child_exit

Nikhil Utane nikhil.subscribed at gmail.com
Fri May 6 07:40:21 EDT 2016


I suppose the failure is because I do not have a DC yet.

[root at airv_cu xml]# pcs cluster status
Cluster Status:
 Stack: corosync
 Current DC: NONE

Can I bring it up when I have just 1 node?
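
For reference, here is a minimal sketch (not from anyone in this thread) of how one might check whether the `validate-with` value in a CIB has a matching schema file among those copied to `/usr/share/pacemaker/`. The function names `cib_schema` and `has_schema_file` are hypothetical. The mapping from a schema name `pacemaker-X.Y` to a file `pacemaker-X.Y.rng` is an assumption inferred from the directory listing quoted below. The thread does not establish that a missing schema file is the actual cause of the failure.

```python
import xml.etree.ElementTree as ET


def cib_schema(cib_xml: str) -> str:
    """Return the validate-with attribute of the top-level <cib> element."""
    root = ET.fromstring(cib_xml)
    if root.tag != "cib":
        raise ValueError("expected a <cib> root element, got <%s>" % root.tag)
    return root.get("validate-with", "")


def has_schema_file(schema: str, filenames) -> bool:
    """Assumption: schema 'pacemaker-X.Y' is shipped as 'pacemaker-X.Y.rng'."""
    return (schema + ".rng") in set(filenames)


if __name__ == "__main__":
    # Trimmed-down CIB header taken from the quoted `pcs cluster cib` output.
    sample = '<cib crm_feature_set="3.0.10" validate-with="pacemaker-0.7" epoch="3"/>'
    # A few of the file names from the quoted `ls /usr/share/pacemaker/` output.
    installed = ["pacemaker-1.0.rng", "pacemaker-1.2.rng",
                 "pacemaker-2.0.rng", "pacemaker-2.4.rng"]
    name = cib_schema(sample)
    print(name, "found" if has_schema_file(name, installed) else "missing")
```

In practice the CIB text would come from `pcs cluster cib` and the file names from listing the schema directory on the target.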

On Fri, May 6, 2016 at 4:59 PM, Nikhil Utane <nikhil.subscribed at gmail.com>
wrote:

> The command failed.
> [root at airv_cu pacemaker]# cibadmin --upgrade --force
> Call cib_upgrade failed (-62): Timer expired
>
> I did not use any tooling. (I am not even aware how to.)
>
> As I mentioned, I am cross-compiling and copying the relevant files onto
> the target platform.
> In one of the earlier runs pacemaker complained about not finding
> /usr/share/pacemaker/pacemaker-1.0.rng.
>
> I found this file under the xml folder in the build folder, so I copied all
> the files under the xml folder onto the target.
> Did that screw it up?
>
> This is the content of the folder:
> [root at airv_cu pacemaker]# ls /usr/share/pacemaker/
> Makefile              constraints-2.1.rng   nodes-1.0.rng      pacemaker-2.1.rng   rule.rng
> Makefile.am           constraints-2.2.rng   nodes-1.2.rng      pacemaker-2.2.rng   score.rng
> Makefile.in           constraints-2.3.rng   nodes-1.3.rng      pacemaker-2.3.rng   status-1.0.rng
> Readme.md             constraints-next.rng  nvset-1.3.rng      pacemaker-2.4.rng   tags-1.3.rng
> acls-1.2.rng          context-of.xsl        nvset.rng          pacemaker-next.rng  upgrade-1.3.xsl
> acls-2.0.rng          crm-transitional.dtd  ocf-meta2man.xsl   pacemaker.rng       upgrade06.xsl
> best-match.sh         crm.dtd               options-1.0.rng    regression.core.sh  versions.rng
> cib-1.0.rng           crm.xsl               pacemaker-1.0.rng  regression.sh
> cib-1.2.rng           crm_mon.rng           pacemaker-1.2.rng  resources-1.0.rng
> constraints-1.0.rng   fencing-1.2.rng       pacemaker-1.3.rng  resources-1.2.rng
> constraints-1.2.rng   fencing-2.4.rng       pacemaker-2.0.rng  resources-1.3.rng
>
> -Regards
> Nikhil
>
> On Fri, May 6, 2016 at 4:41 PM, Klaus Wenninger <kwenning at redhat.com>
> wrote:
>
>> On 05/06/2016 12:40 PM, Nikhil Utane wrote:
>> > Hi,
>> >
>> > I used the blackbox feature which showed the reason for failure.
>> > As I am cross-compiling pacemaker on a build machine and later moving
>> > the binaries to the target, a few binaries were missing. After fixing
>> > that and a bunch of other errors/warnings, I am able to get pacemaker
>> > started, though it is not yet running completely fine.
>> >
>> > The node is not getting added:
>> > airv_cu        cib:    error: xml_log:Element node failed to validate
>> > attributes
>> >
>> > I suppose it is because of this error:
>> > crmd:    error: node_list_update_callback:Node update 4 failed: Update
>> > does not conform to the configured schema (-203)
>> >
>> > I suspect this is caused by validate-with="pacemaker-0.7" in the cib.
>> > In another installation this is set to "pacemaker-2.0".
>> >
>> > [root at airv_cu pacemaker]# pcs cluster cib
>> > <cib crm_feature_set="3.0.10" validate-with="pacemaker-0.7" epoch="3"
>> > num_updates="0" admin_epoch="0" cib-last-written="Fri May  6 09:28:10
>> > 2016" have-quorum="1">
>> >   <configuration>
>> >     <crm_config>
>> >       <cluster_property_set id="cib-bootstrap-options">
>> >         <nvpair id="cib-bootstrap-options-have-watchdog"
>> > name="have-watchdog" value="true"/>
>> >         <nvpair id="cib-bootstrap-options-dc-version"
>> > name="dc-version" value="1.1.14-5a6cdd1"/>
>> >         <nvpair id="cib-bootstrap-options-cluster-infrastructure"
>> > name="cluster-infrastructure" value="corosync"/>
>> >       </cluster_property_set>
>> >     </crm_config>
>> >     <nodes/>
>> >     <resources/>
>> >     <constraints/>
>> >   </configuration>
>> >   <status/>
>> > </cib>
>> >
>> > Any idea why/where this is being set to 0.7? I am using the latest
>> > pacemaker from GitHub.
>>
>> What kind of tooling did you use to create the cib?
>> Try 'cibadmin --upgrade'. That should set the cib-version to what your
>> pacemaker-version supports.
>>
>> >
>> > [root at airv_cu pacemaker]# pacemakerd --version
>> > Pacemaker 1.1.14
>> > Written by Andrew Beekhof
>> >
>> > Attaching the corosync.log and corosync.conf file.
>> >
>> > -Thanks
>> > Nikhil
>> >
>> >
>> > On Thu, May 5, 2016 at 10:21 PM, Ken Gaillot <kgaillot at redhat.com> wrote:
>> >
>> >     On 05/05/2016 11:25 AM, Nikhil Utane wrote:
>> >     > Thanks Ken for your quick response as always.
>> >     >
>> >     > But what if I don't want to use quorum? I just want to bring up
>> >     > pacemaker + corosync on 1 node to check that it all comes up fine.
>> >     > I added corosync_votequorum as you suggested. Additionally I also
>> >     > added these 2 lines:
>> >     >
>> >     > expected_votes: 2
>> >     > two_node: 1
>> >
>> >     There's actually nothing wrong with configuring a single-node cluster.
>> >     You can list just one node in corosync.conf and leave off the above.
>> >
>> >     > However still pacemaker is not able to run.
>> >
>> >     There must be other issues involved. Even if pacemaker doesn't have
>> >     quorum, it will still run, it just won't start resources.
>> >
>> >     > [root at airv_cu root]# pcs cluster start
>> >     > Starting Cluster...
>> >     > Starting Pacemaker Cluster Manager[FAILED]
>> >     >
>> >     > Error: unable to start pacemaker
>> >     >
>> >     > Corosync.log:
>> >     > *May 05 16:15:20 [16294] airv_cu pacemakerd:     info:
>> >     > pcmk_quorum_notification: Membership 240: quorum still lost (1)*
>> >     > May 05 16:15:20 [16259] airv_cu corosync debug   [QB    ] Free'ing
>> >     > ringbuffer: /dev/shm/qb-cmap-request-16259-16294-21-header
>> >     > May 05 16:15:20 [16294] airv_cu pacemakerd:   notice:
>> >     > crm_update_peer_state_iter:       pcmk_quorum_notification: Node
>> >     > airv_cu[181344357] - state is now member (was (null))
>> >     > May 05 16:15:20 [16294] airv_cu pacemakerd:     info:
>> >     > pcmk_cpg_membership:      Node 181344357 joined group pacemakerd
>> >     > (counter=0.0)
>> >     > May 05 16:15:20 [16294] airv_cu pacemakerd:     info:
>> >     > pcmk_cpg_membership:      Node 181344357 still member of group
>> >     > pacemakerd (peer=airv_cu, counter=0.0)
>> >     > May 05 16:15:20 [16294] airv_cu pacemakerd:  warning:
>> >     > pcmk_child_exit:  The cib process (16353) can no longer be respawned,
>> >     > shutting the cluster down.
>> >     > May 05 16:15:20 [16294] airv_cu pacemakerd:   notice:
>> >     > pcmk_shutdown_worker:     Shutting down Pacemaker
>> >     >
>> >     > The log and conf file is attached.
>> >     >
>> >     > -Regards
>> >     > Nikhil
>> >     >
>> >     > On Thu, May 5, 2016 at 8:04 PM, Ken Gaillot <kgaillot at redhat.com> wrote:
>> >     >
>> >     >     On 05/05/2016 08:36 AM, Nikhil Utane wrote:
>> >     >     > Hi,
>> >     >     >
>> >     >     > Continuing with my adventure to run Pacemaker & Corosync on our
>> >     >     > big-endian system, I managed to get past the corosync issue for
>> >     >     > now. But I am facing an issue in running Pacemaker.
>> >     >     >
>> >     >     > I am seeing the following messages in corosync.log:
>> >     >     >  pacemakerd:  warning: pcmk_child_exit:  The cib process (20000)
>> >     >     > can no longer be respawned, shutting the cluster down.
>> >     >     >  pacemakerd:  warning: pcmk_child_exit:  The stonith-ng process (20001)
>> >     >     > can no longer be respawned, shutting the cluster down.
>> >     >     >  pacemakerd:  warning: pcmk_child_exit:  The lrmd process (20002)
>> >     >     > can no longer be respawned, shutting the cluster down.
>> >     >     >  pacemakerd:  warning: pcmk_child_exit:  The attrd process (20003)
>> >     >     > can no longer be respawned, shutting the cluster down.
>> >     >     >  pacemakerd:  warning: pcmk_child_exit:  The pengine process (20004)
>> >     >     > can no longer be respawned, shutting the cluster down.
>> >     >     >  pacemakerd:  warning: pcmk_child_exit:  The crmd process (20005)
>> >     >     > can no longer be respawned, shutting the cluster down.
>> >     >     >
>> >     >     > I see the following error before these messages. Not sure if this
>> >     >     > is the cause.
>> >     >     > May 05 11:26:24 [19998] airv_cu pacemakerd:    error:
>> >     >     > cluster_connect_quorum:   Corosync quorum is not configured
>> >     >     >
>> >     >     > I tried removing the quorum block (which is blank anyway) from the
>> >     >     > conf file but still had the same error.
>> >     >
>> >     >     Yes, that is the issue. Pacemaker can't do anything if it can't ask
>> >     >     corosync about quorum. I don't know what the issue is at the
>> >     >     corosync level, but your corosync.conf should have:
>> >     >
>> >     >     quorum {
>> >     >         provider: corosync_votequorum
>> >     >     }
>> >     >
>> >     >
>> >     >     > Attaching the log and conf files. Please let me know if there is
>> >     >     > any obvious mistake or how to investigate it further.
>> >     >     >
>> >     >     > I am using the pcs cluster start command to start the cluster.
>> >     >     >
>> >     >     > -Thanks
>> >     >     > Nikhil
>> >
>> >
>> >
>> >
>> > _______________________________________________
>> > Users mailing list: Users at clusterlabs.org
>> > http://clusterlabs.org/mailman/listinfo/users
>> >
>> > Project Home: http://www.clusterlabs.org
>> > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> > Bugs: http://bugs.clusterlabs.org
>>
>>
>>
>
>
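
Regarding the quorum advice quoted above (a `quorum` block with `provider: corosync_votequorum`), here is a minimal sketch of how one might confirm that a corosync.conf actually carries that setting before starting the cluster. The function name `quorum_provider` is hypothetical, and the parsing is a simplification that assumes one `name {`, `key: value`, or `}` per line with `#` starting a comment. It is not corosync's own parser.

```python
def quorum_provider(conf_text: str):
    """Return the 'provider' value of the top-level quorum { } block, or None."""
    depth = 0          # current brace nesting level
    in_quorum = False  # True while inside a top-level quorum block
    for raw in conf_text.splitlines():
        # Simplification: treat everything after '#' as a comment.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        if line.endswith("{"):
            depth += 1
            if depth == 1 and line[:-1].strip() == "quorum":
                in_quorum = True
            continue
        if line == "}":
            if depth == 1:
                in_quorum = False
            depth -= 1
            continue
        if in_quorum and depth == 1 and ":" in line:
            key, value = line.split(":", 1)
            if key.strip() == "provider":
                return value.strip()
    return None


if __name__ == "__main__":
    sample = "quorum {\n    provider: corosync_votequorum\n}\n"
    print(quorum_provider(sample))
```

An empty `quorum { }` block, like the one described in the first message of the thread, makes this function return `None`.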