[ClusterLabs] Understanding the behavior of pacemaker crash
Prasad Nagaraj
prasad.nagaraj76 at gmail.com
Thu Sep 27 04:15:15 EDT 2018
Hello - I was trying to understand the behavior of the cluster when pacemaker
crashes on one of the nodes, so I hard-killed pacemakerd and its related
processes.
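The manual steps of this test can be collapsed into a small helper. This is
only a rough sketch: pacemaker_pids is a made-up name (not a Pacemaker tool),
and it assumes the daemons show up either as pacemakerd or under
/usr/libexec/pacemaker/, as they do in the ps listing in this post.

```shell
#!/bin/sh
# Hypothetical helper, not part of Pacemaker: read "PID COMMAND [ARGS...]"
# lines on stdin and print the PIDs of pacemakerd and of any daemon that
# lives under /usr/libexec/pacemaker/ (path assumed from this post).
pacemaker_pids() {
    awk '$2 ~ /(^|\/)pacemakerd$/ || $2 ~ /^\/usr\/libexec\/pacemaker\// { print $1 }'
}

# Usage on a disposable test node only (this simulates a crash of the stack;
# -r is a GNU xargs option that skips kill when no PIDs are found):
#   ps -e -o pid= -o args= | pacemaker_pids | xargs -r kill -9
```

Matching on the command column rather than grepping for "pacemaker" keeps the
grep process itself, and unrelated commands that merely mention pacemaker, out
of the kill list.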
--------------------------------------------------------------------------------------------------------
[root@SG-mysqlold-907 azureuser]# ps -ef | grep pacemaker
root 74022 1 0 07:53 pts/0 00:00:00 pacemakerd
189 74028 74022 0 07:53 ? 00:00:00 /usr/libexec/pacemaker/cib
root 74029 74022 0 07:53 ? 00:00:00 /usr/libexec/pacemaker/stonithd
root 74030 74022 0 07:53 ? 00:00:00 /usr/libexec/pacemaker/lrmd
189 74031 74022 0 07:53 ? 00:00:00 /usr/libexec/pacemaker/attrd
189 74032 74022 0 07:53 ? 00:00:00 /usr/libexec/pacemaker/pengine
189 74033 74022 0 07:53 ? 00:00:00 /usr/libexec/pacemaker/crmd
root 75228 50092 0 07:54 pts/0 00:00:00 grep pacemaker
[root@SG-mysqlold-907 azureuser]# kill -9 74022
[root@SG-mysqlold-907 azureuser]# ps -ef | grep pacemaker
root 74030 1 0 07:53 ? 00:00:00 /usr/libexec/pacemaker/lrmd
189 74032 1 0 07:53 ? 00:00:00 /usr/libexec/pacemaker/pengine
root 75303 50092 0 07:55 pts/0 00:00:00 grep pacemaker
[root@SG-mysqlold-907 azureuser]# kill -9 74030
[root@SG-mysqlold-907 azureuser]# kill -9 74032
[root@SG-mysqlold-907 azureuser]# ps -ef | grep pacemaker
root 75332 50092 0 07:55 pts/0 00:00:00 grep pacemaker
[root@SG-mysqlold-907 azureuser]# crm status
ERROR: status: crm_mon (rc=107): Connection to cluster failed: Transport endpoint is not connected
-----------------------------------------------------------------------------------------------------------------------------
However, this does not seem to have any effect on the cluster status as
seen from the other nodes:
---------------------------------------------------------------------------------------------------------------------------
[root@SG-mysqlold-909 azureuser]# crm status
Last updated: Thu Sep 27 07:56:17 2018 Last change: Thu Sep 27 07:53:43 2018 by root via crm_attribute on SG-mysqlold-909
Stack: classic openais (with plugin)
Current DC: SG-mysqlold-908 (version 1.1.14-8.el6_8.1-70404b0) - partition with quorum
3 nodes and 3 resources configured, 3 expected votes
Online: [ SG-mysqlold-907 SG-mysqlold-908 SG-mysqlold-909 ]
Full list of resources:
Master/Slave Set: ms_mysql [p_mysql]
Masters: [ SG-mysqlold-909 ]
Slaves: [ SG-mysqlold-907 SG-mysqlold-908 ]

[root@SG-mysqlold-908 azureuser]# crm status
Last updated: Thu Sep 27 07:56:08 2018 Last change: Thu Sep 27 07:53:43 2018 by root via crm_attribute on SG-mysqlold-909
Stack: classic openais (with plugin)
Current DC: SG-mysqlold-908 (version 1.1.14-8.el6_8.1-70404b0) - partition with quorum
3 nodes and 3 resources configured, 3 expected votes
Online: [ SG-mysqlold-907 SG-mysqlold-908 SG-mysqlold-909 ]
Full list of resources:
Master/Slave Set: ms_mysql [p_mysql]
Masters: [ SG-mysqlold-909 ]
Slaves: [ SG-mysqlold-907 SG-mysqlold-908 ]
----------------------------------------------------------------------------------------------------------------------
I am a bit surprised that the other nodes are not able to detect that
pacemaker is down on one of the nodes (SG-mysqlold-907).
Even if I kill pacemaker on the node that is the DC, I observe the same
behavior: the rest of the nodes do not detect that the DC is down.
Could someone explain what the expected behavior is in these cases?
I am using corosync 1.4.7 and pacemaker 1.1.14
Thanks in advance
Prasad