[ClusterLabs] Error observed while starting cluster

Wed Mar 21 14:17:29 UTC 2018

On Tue, 2018-03-20 at 05:59 +0000, Roshni Chatterjee wrote:
> Hi ,
>  
> Error observed in pacemaker and pcs status
> Error: cluster is not currently running on this node
>  
> I have built corosync (2.4.2) and pacemaker (1.1.16) from source and
> followed the steps below to build a 2-node cluster.
> 
> 1.       Download source code of corosync and pacemaker (versions as
> mentioned above) and compile.
> 2.       Install pcsd using “yum install pcs”
> 3.       Allow cluster services through firewall using #firewall-cmd
> --permanent --add-service=high-availability
> 4.       Start and enable pcsd #systemctl start pcsd and #systemctl
> enable pcsd
> 5.       Change password for user hacluster
> 6.       pcs cluster auth pcmk3 node2
> 7.       pcs cluster setup --name mycluster pcmk3 node2
> 8.       pcs cluster start --all
> 9.       pcs status
>  
>  
> No error is reported through step 8. At step 9, when pcs status is
> checked, an error is returned (shown below):
> [root@node2 ~]# pacemakerd --features
> Pacemaker 1.1.16 (Build: 94ff4df51a)
> Supporting v3.0.11:  agent-manpages libqb-logging libqb-ipc nagios 
> corosync-native atomic-attrd acls
> [root@node2 ~]# pcs cluster start --all
> pcmk3: Starting Cluster...
> node2: Starting Cluster...
> [root@node2 ~]# pcs status
> Error: cluster is not currently running on this node
> 
> On checking pacemaker status the following issue is found:
> [root@pcmk3 ~]# systemctl pacemaker status -l
> Unknown operation 'pacemaker'.
> [root@pcmk3 ~]# systemctl status pacemaker -l
> ● pacemaker.service - Pacemaker High Availability Cluster Manager
>    Loaded: loaded (/usr/lib/systemd/system/pacemaker.service;
> disabled; vendor preset: disabled)
>    Active: active (running) since Tue 2018-03-20 10:55:44 IST; 13min
> ago
>      Docs: man:pacemakerd
>            http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html/Pacemaker_Explained/index.html
> Main PID: 26932 (pacemakerd)
>    CGroup: /system.slice/pacemaker.service
>            ├─26932 /usr/sbin/pacemakerd -f
>            ├─26933 /usr/libexec/pacemaker/cib
>            ├─26934 /usr/libexec/pacemaker/stonithd
>            ├─26935 /usr/libexec/pacemaker/lrmd
>            ├─26936 /usr/libexec/pacemaker/attrd
>            └─26937 /usr/libexec/pacemaker/pengine
>  
> Mar 20 10:55:45 pcmk3 pacemakerd[26932]:   notice: Respawning failed
> child process: crmd
> Mar 20 10:55:45 pcmk3 pacemakerd[26932]:    error: The crmd process
> (27035) exited: Key has expired (127)
> Mar 20 10:55:45 pcmk3 pacemakerd[26932]:   notice: Respawning failed
> child process: crmd
> Mar 20 10:55:45 pcmk3 pacemakerd[26932]:    error: The crmd process
> (27036) exited: Key has expired (127)
> Mar 20 10:55:45 pcmk3 pacemakerd[26932]:   notice: Respawning failed
> child process: crmd
> Mar 20 10:55:45 pcmk3 pacemakerd[26932]:    error: The crmd process
> (27037) exited: Key has expired (127)
> Mar 20 10:55:45 pcmk3 pacemakerd[26932]:   notice: Respawning failed
> child process: crmd
> Mar 20 10:55:45 pcmk3 pacemakerd[26932]:    error: The crmd process
> (27038) exited: Key has expired (127)
> Mar 20 10:55:45 pcmk3 pacemakerd[26932]:    error: Child respawn
> count exceeded by crmd
> Mar 20 10:56:21 pcmk3 cib[26933]:    error: Operation ignored,
> cluster configuration is invalid. Please repair and restart: Update
> does not conform to the configured schema
> [root@pcmk3 ~]#
>  
> Corosync.log
> Mar 20 10:55:45 [26932] pcmk3 pacemakerd:     info:
> start_child:        Forked child 27035 for process crmd
> Mar 20 10:55:45 [26932] pcmk3 pacemakerd:     info:
> mcp_cpg_deliver:    Ignoring process list sent by peer for local node
> Mar 20 10:55:45 [26932] pcmk3 pacemakerd:     info:
> mcp_cpg_deliver:    Ignoring process list sent by peer for local node
> Mar 20 10:55:45 [26932] pcmk3 pacemakerd:    error:
> pcmk_child_exit:    The crmd process (27035) exited: Key has expired
> (127)

This is the start of something going wrong. Pacemaker's crmd process
can't start for some reason. (Unfortunately the error codes in this
version are unhelpful. They have been overhauled in the upcoming 2.0.0
version.)

Pacemaker then goes into a loop: crmd exits, pacemakerd respawns it,
and so on, until pacemakerd gives up ("Child respawn count exceeded by
crmd" in your systemctl output).
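
If you want to confirm how many times crmd died in a given log, a
minimal sketch like the following should do. count_crmd_exits is a
made-up helper name, not a Pacemaker tool, and the log path in the
example is simply the one from your report.

```shell
#!/bin/sh
# Count "crmd exited" messages in a Pacemaker detail log.
# count_crmd_exits is a hypothetical helper name, not part of Pacemaker.
count_crmd_exits() {
    # grep -c prints the number of matching lines. It exits non-zero when
    # nothing matches, so "|| true" keeps the function's status clean.
    grep -c 'The crmd process ([0-9]*) exited' "$1" || true
}

# Example (adjust the path to whatever log file your cluster writes to):
# count_crmd_exits /var/log/cluster/corosync.log
```

A count equal to the number of respawn attempts, all within the same
second, is consistent with crmd failing immediately at startup rather
than after doing any real work.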

I'd recommend setting PCMK_debug=crmd in /etc/sysconfig/pacemaker,
trying again, and looking for more detailed messages in the detail log
(by default /var/log/pacemaker.log, or whatever is set in
corosync.conf, often /var/log/cluster/corosync.log).
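
As a rough sketch of that change: set_pcmk_debug below is just a helper
name I made up for illustration, and it assumes GNU sed (for "-i"). It
replaces an existing uncommented PCMK_debug line or appends one if
there is none, leaving the rest of the file alone.

```shell
#!/bin/sh
# Set PCMK_debug=<value> in a sysconfig-style file.
# set_pcmk_debug is a hypothetical helper; "sed -i" assumes GNU sed.
set_pcmk_debug() {
    file=$1
    value=$2
    if grep -q '^PCMK_debug=' "$file" 2>/dev/null; then
        # An active setting already exists: rewrite it in place.
        sed -i "s|^PCMK_debug=.*|PCMK_debug=$value|" "$file"
    else
        # No active setting: append one (commented examples are untouched).
        printf 'PCMK_debug=%s\n' "$value" >> "$file"
    fi
}

# Example, run as root on each node, then restart and inspect the log:
# set_pcmk_debug /etc/sysconfig/pacemaker crmd
# pcs cluster stop --all
# pcs cluster start --all
# grep crmd /var/log/cluster/corosync.log | tail -n 50
```

Remember to remove the setting again once the problem is found, since
debug logging is quite verbose.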

When you compiled, did you run "make install", or "make rpm" and then
install the rpm? The rpm does steps that "make install" doesn't, like
creating the hacluster user and haclient group, which you would have
to do by hand after "make install" if they are not already present.
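
A quick way to check for those accounts is sketched below. The helper
names are hypothetical, and the groupadd/useradd lines in the comments
are only an assumption about what a by-hand setup could look like; the
rpm may use specific IDs or options, so compare with your
distribution's package before relying on them.

```shell
#!/bin/sh
# Return 0 if the named entry exists in the given getent database
# ("passwd" for users, "group" for groups). Hypothetical helper name.
account_exists() {
    getent "$1" "$2" >/dev/null 2>&1
}

# Print one line for each cluster account that is missing.
report_missing_cluster_accounts() {
    account_exists group haclient   || echo "missing group: haclient"
    account_exists passwd hacluster || echo "missing user: hacluster"
}

# Example, run on each node:
# report_missing_cluster_accounts
#
# If something is reported missing, it could be created by hand, e.g.
# (assumed options, run as root):
# groupadd -r haclient
# useradd -r -g haclient -s /sbin/nologin hacluster
```

No output from report_missing_cluster_accounts means both accounts are
already there, and the cause of the crmd failure lies elsewhere.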

> Mar 20 10:55:45 [26932] pcmk3 pacemakerd:   notice:
> pcmk_process_exit:  Respawning failed child process: crmd
> Mar 20 10:55:45 [26932] pcmk3 pacemakerd:     info:
> start_child:        Using uid=189 and group=189 for process crmd
> Mar 20 10:55:45 [26932] pcmk3 pacemakerd:     info:
> start_child:        Forked child 27036 for process crmd
> Mar 20 10:55:45 [26932] pcmk3 pacemakerd:     info:
> mcp_cpg_deliver:    Ignoring process list sent by peer for local node
> Mar 20 10:55:45 [26932] pcmk3 pacemakerd:     info:
> mcp_cpg_deliver:    Ignoring process list sent by peer for local node
> Mar 20 10:55:45 [26932] pcmk3 pacemakerd:    error:
> pcmk_child_exit:    The crmd process (27036) exited: Key has expired
> (127)
> Mar 20 10:55:45 [26932] pcmk3 pacemakerd:   notice:
> pcmk_process_exit:  Respawning failed child process: crmd
> Mar 20 10:55:45 [26932] pcmk3 pacemakerd:     info:
> start_child:        Using uid=189 and group=189 for process crmd
> Mar 20 10:55:45 [26932] pcmk3 pacemakerd:     info:
> start_child:        Forked child 27037 for process crmd
> Mar 20 10:55:45 [26932] pcmk3 pacemakerd:     info:
> mcp_cpg_deliver:    Ignoring process list sent by peer for local node
> Mar 20 10:55:45 [26932] pcmk3 pacemakerd:     info:
> mcp_cpg_deliver:    Ignoring process list sent by peer for local node
> Mar 20 10:55:45 [26932] pcmk3 pacemakerd:    error:
> pcmk_child_exit:    The crmd process (27037) exited: Key has expired
> (127)
> Mar 20 10:55:45 [26932] pcmk3 pacemakerd:   notice:
> pcmk_process_exit:  Respawning failed child process: crmd
>  
>  
> Regards,
> Roshni
-- 
Ken Gaillot <kgaillot at redhat.com>