[ClusterLabs] Unable to run Pacemaker: pcmk_child_exit

Fri May 6 11:11:30 UTC 2016

On 05/06/2016 12:40 PM, Nikhil Utane wrote:
> Hi,
>
> I used the blackbox feature, which showed the reason for the failure.
> As I am cross-compiling pacemaker on a build machine and later moving
> the binaries to the target, a few binaries were missing. After fixing
> that and a bunch of other errors/warnings, I am able to get pacemaker
> started, though it is not yet running completely fine.
>
> The node is not getting added:
> airv_cu        cib:    error: xml_log: Element node failed to validate attributes
>
> I suppose it is because of this error:
> crmd:    error: node_list_update_callback: Node update 4 failed: Update does
> not conform to the configured schema (-203)
>
> I suspect this is caused by validate-with="pacemaker-0.7" in the CIB.
> In another installation, this is set to "pacemaker-2.0".
>
> [root@airv_cu pacemaker]# pcs cluster cib
> <cib crm_feature_set="3.0.10" validate-with="pacemaker-0.7" epoch="3"
>      num_updates="0" admin_epoch="0"
>      cib-last-written="Fri May  6 09:28:10 2016" have-quorum="1">
>   <configuration>
>     <crm_config>
>       <cluster_property_set id="cib-bootstrap-options">
>         <nvpair id="cib-bootstrap-options-have-watchdog"
>                 name="have-watchdog" value="true"/>
>         <nvpair id="cib-bootstrap-options-dc-version"
>                 name="dc-version" value="1.1.14-5a6cdd1"/>
>         <nvpair id="cib-bootstrap-options-cluster-infrastructure"
>                 name="cluster-infrastructure" value="corosync"/>
>       </cluster_property_set>
>     </crm_config>
>     <nodes/>
>     <resources/>
>     <constraints/>
>   </configuration>
>   <status/>
> </cib>
>
> Any idea why, or where, this is being set to 0.7? I am using the latest
> pacemaker from GitHub.

What tooling did you use to create the CIB?
Try 'cibadmin --upgrade'. That should raise the CIB's schema version
(the validate-with attribute) to the newest one your Pacemaker version
supports.
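To confirm which schema a CIB is pinned to, you can save the output of 'cibadmin --query' to a file and read its validate-with attribute. The Python sketch below does that with the standard library only. The function names are hypothetical, and the (2, 0) threshold is an assumption taken from the other installation mentioned above, not a rule defined by Pacemaker.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: schema names follow the "pacemaker-MAJOR.MINOR" pattern seen in
# this thread; other values (e.g. "pacemaker-next", "none") give version None.
SCHEMA_RE = re.compile(r"^pacemaker-(\d+)\.(\d+)$")


def cib_schema(cib_xml):
    """Return (validate-with value, (major, minor) or None) for a CIB document."""
    root = ET.fromstring(cib_xml)
    name = root.get("validate-with", "")
    m = SCHEMA_RE.match(name)
    version = (int(m.group(1)), int(m.group(2))) if m else None
    return name, version


def needs_upgrade(cib_xml, minimum=(2, 0)):
    """True if the CIB validates against a schema older than `minimum`.

    `minimum` is a hypothetical threshold, not something Pacemaker defines.
    """
    _, version = cib_schema(cib_xml)
    return version is not None and version < minimum


if __name__ == "__main__":
    sample = ('<cib crm_feature_set="3.0.10" validate-with="pacemaker-0.7" '
              'epoch="3"><configuration/><status/></cib>')
    print(cib_schema(sample))     # ('pacemaker-0.7', (0, 7))
    print(needs_upgrade(sample))  # True
```

If the check reports an old schema, 'cibadmin --upgrade' is still the actual fix; the script only tells you whether it is likely needed.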

>
> [root@airv_cu pacemaker]# pacemakerd --version
> Pacemaker 1.1.14
> Written by Andrew Beekhof
>
> Attaching the corosync.log and corosync.conf files.
>
> -Thanks
> Nikhil
>
>
> On Thu, May 5, 2016 at 10:21 PM, Ken Gaillot <kgaillot at redhat.com> wrote:
>
>     On 05/05/2016 11:25 AM, Nikhil Utane wrote:
>     > Thanks Ken for your quick response as always.
>     >
>     > But what if I don't want to use quorum? I just want to bring up
>     > pacemaker + corosync on 1 node to check that it all comes up fine.
>     > I added corosync_votequorum as you suggested. Additionally, I also
>     > added these 2 lines:
>     >
>     > expected_votes: 2
>     > two_node: 1
>
>     There's actually nothing wrong with configuring a single-node cluster.
>     You can list just one node in corosync.conf and leave off the above.
>
>     > However, pacemaker is still not able to run.
>
>     There must be other issues involved. Even if pacemaker doesn't have
>     quorum, it will still run, it just won't start resources.
>
>     > [root@airv_cu root]# pcs cluster start
>     > Starting Cluster...
>     > Starting Pacemaker Cluster Manager[FAILED]
>     >
>     > Error: unable to start pacemaker
>     >
>     > Corosync.log:
>     > *May 05 16:15:20 [16294] airv_cu pacemakerd:     info:
>     > pcmk_quorum_notification: Membership 240: quorum still lost (1)*
>     > May 05 16:15:20 [16259] airv_cu corosync debug   [QB    ] Free'ing
>     > ringbuffer: /dev/shm/qb-cmap-request-16259-16294-21-header
>     > May 05 16:15:20 [16294] airv_cu pacemakerd:   notice:
>     > crm_update_peer_state_iter:       pcmk_quorum_notification: Node
>     > airv_cu[181344357] - state is now member (was (null))
>     > May 05 16:15:20 [16294] airv_cu pacemakerd:     info:
>     > pcmk_cpg_membership:      Node 181344357 joined group pacemakerd
>     > (counter=0.0)
>     > May 05 16:15:20 [16294] airv_cu pacemakerd:     info:
>     > pcmk_cpg_membership:      Node 181344357 still member of group
>     > pacemakerd (peer=airv_cu, counter=0.0)
>     > May 05 16:15:20 [16294] airv_cu pacemakerd:  warning:
>     > pcmk_child_exit:  The cib process (16353) can no longer be respawned,
>     > shutting the cluster down.
>     > May 05 16:15:20 [16294] airv_cu pacemakerd:   notice:
>     > pcmk_shutdown_worker:     Shutting down Pacemaker
>     >
>     > The log and conf files are attached.
>     >
>     > -Regards
>     > Nikhil
>     >
>     > On Thu, May 5, 2016 at 8:04 PM, Ken Gaillot <kgaillot at redhat.com> wrote:
>     >
>     >     On 05/05/2016 08:36 AM, Nikhil Utane wrote:
>     >     > Hi,
>     >     >
>     >     > Continuing with my adventure to run Pacemaker & Corosync on our
>     >     > big-endian system, I managed to get past the corosync issue for
>     >     > now, but I am facing an issue running Pacemaker.
>     >     >
>     >     > I am seeing the following messages in corosync.log:
>     >     >  pacemakerd:  warning: pcmk_child_exit:  The cib process (20000)
>     >     >  can no longer be respawned, shutting the cluster down.
>     >     >  pacemakerd:  warning: pcmk_child_exit:  The stonith-ng process
>     >     >  (20001) can no longer be respawned, shutting the cluster down.
>     >     >  pacemakerd:  warning: pcmk_child_exit:  The lrmd process (20002)
>     >     >  can no longer be respawned, shutting the cluster down.
>     >     >  pacemakerd:  warning: pcmk_child_exit:  The attrd process (20003)
>     >     >  can no longer be respawned, shutting the cluster down.
>     >     >  pacemakerd:  warning: pcmk_child_exit:  The pengine process (20004)
>     >     >  can no longer be respawned, shutting the cluster down.
>     >     >  pacemakerd:  warning: pcmk_child_exit:  The crmd process (20005)
>     >     >  can no longer be respawned, shutting the cluster down.
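When several children fail like this, it helps to list them in the order pacemakerd gave up on them. The Python sketch below assumes each log entry sits on one line, as in the real corosync.log (the archive has wrapped them here). The regex is written against the message text quoted above, so other Pacemaker versions may word it differently, and the function name is hypothetical.

```python
import re

# Matches the pacemakerd warning quoted in this thread; assumes one entry per line.
CHILD_EXIT_RE = re.compile(
    r"pcmk_child_exit:\s+The\s+(?P<name>[\w-]+)\s+process\s+\((?P<pid>\d+)\)"
    r"\s+can\s+no\s+longer\s+be\s+respawned"
)


def failed_children(log_text):
    """Return (process name, pid) pairs in the order they appear in the log."""
    return [(m.group("name"), int(m.group("pid")))
            for m in CHILD_EXIT_RE.finditer(log_text)]


if __name__ == "__main__":
    import sys
    # Usage: python failed_children.py < corosync.log
    for name, pid in failed_children(sys.stdin.read()):
        print(f"{name}\t{pid}")
```

In this thread the underlying cause turned out to be binaries missing from the cross-compiled install, so checking that each reported daemon actually exists on the target is a reasonable first step.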
>     >     >
>     >     > I see the following error before these messages. Not sure if
>     >     > this is the cause.
>     >     > May 05 11:26:24 [19998] airv_cu pacemakerd:    error:
>     >     > cluster_connect_quorum:   Corosync quorum is not configured
>     >     >
>     >     > I tried removing the quorum block (which is blank anyway) from
>     >     > the conf file, but still got the same error.
>     >
>     >     Yes, that is the issue. Pacemaker can't do anything if it can't
>     >     ask corosync about quorum. I don't know what the issue is at the
>     >     corosync level, but your corosync.conf should have:
>     >
>     >     quorum {
>     >         provider: corosync_votequorum
>     >     }
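Ken's point can be turned into a quick pre-flight check before running 'pcs cluster start'. This Python sketch is a rough lint, not a corosync parser: it assumes the quorum block contains no nested braces and treats only lines starting with '#' as comments. The function name is hypothetical. A blank quorum block, like the one described above, fails the check.

```python
import re


def has_votequorum(conf_text):
    """Rough check that corosync.conf names corosync_votequorum as quorum provider."""
    # Drop whole-line '#' comments so a commented-out provider does not count.
    lines = [ln for ln in conf_text.splitlines() if not ln.lstrip().startswith("#")]
    text = "\n".join(lines)
    # \b keeps "votequorum {" or similar words from matching as the block name.
    block = re.search(r"\bquorum\s*\{([^{}]*)\}", text)
    if block is None:
        return False
    return re.search(r"^\s*provider\s*:\s*corosync_votequorum\s*$",
                     block.group(1), re.MULTILINE) is not None


if __name__ == "__main__":
    import sys
    # Usage: python has_votequorum.py < /etc/corosync/corosync.conf
    print("ok" if has_votequorum(sys.stdin.read()) else "quorum provider missing")
```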
>     >
>     >
>     >     > Attaching the log and conf files. Please let me know if there is
>     >     > any obvious mistake or how to investigate it further.
>     >     >
>     >     > I am using the 'pcs cluster start' command to start the cluster.
>     >     >
>     >     > -Thanks
>     >     > Nikhil
>
>
>
>
> _______________________________________________
> Users mailing list: Users at clusterlabs.org
> http://clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org