[ClusterLabs] cluster does not detect kill on pacemaker process?

Thu Apr 6 00:16:06 CEST 2017

Hello All,

I noticed something on our pacemaker test cluster. The cluster is
configured to manage an underlying database using a master/slave primitive.

I killed the pacemaker process on one node, but all the other nodes kept
showing that node as online. I then killed the underlying database on the
same node, a failure that would have been detected had pacemaker on the
node been running. The cluster did not detect that the database on the
node had failed, and the failover never occurred.
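
When repeating this test, it may help to confirm which daemons are really
still running on the node after each kill, before looking at what the rest
of the cluster reports. The following is only a minimal sketch that scans
/proc on Linux. The helper name pids_by_name is hypothetical, and the
process names pacemakerd and corosync are assumptions about this setup, not
something verified on the test cluster.

```python
import os


def pids_by_name(name, proc_root="/proc"):
    """Return the sorted PIDs whose <proc_root>/<pid>/comm equals `name`.

    `proc_root` is a parameter only so the function can be pointed at a
    different directory; on a real node the default /proc is used.
    """
    pids = []
    for entry in os.listdir(proc_root):
        # Only numeric entries in /proc correspond to processes.
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                if f.read().strip() == name:
                    pids.append(int(entry))
        except OSError:
            # The process exited while we were scanning, or is unreadable.
            continue
    return sorted(pids)


# Example (assumed daemon names):
#   for daemon in ("pacemakerd", "corosync"):
#       print(daemon, pids_by_name(daemon) or "not running")
```

An empty list for pacemakerd together with a non-empty list for corosync
would match the state described above, where the other nodes still show the
node as online.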

I then killed corosync on the same node. The cluster now marked the node
as stopped and proceeded to elect a new master.

In a separate test, I killed the pacemaker process on the cluster DC, and
the cluster showed no change. I then tried to change the CIB from a
different node, and the CIB modify command timed out. After that, the node
didn't fail over even when I turned off corosync on the cluster DC. The
cluster didn't recover after this mishap.

Is this expected behavior? Is there a solution for when the OOM killer
decides to kill the pacemaker process?

I run pacemaker 1.1.14, with corosync 1.4. I have stonith disabled and
quorum enabled.

Thank you,

nwarriorch