[Pacemaker] killing corosync leaves crmd, stonithd, lrmd, cib and attrd to hog up the cpu

Florian Haas florian at hastexo.com
Mon Nov 14 12:52:24 UTC 2011


On 2011-11-14 13:18, Dan Frincu wrote:
> Hi,
> 
> On Mon, Nov 14, 2011 at 1:32 PM, ihjaz Mohamed <ihjazmohamed at yahoo.co.in> wrote:
>> Hi All,
>> As part of some robustness testing for my cluster, I tried killing the corosync
>> process using kill -9 <pid>. After this I see that the pacemakerd service is
>> stopped, but the processes crmd, stonithd, lrmd, cib and attrd are still
>> running and are hogging the CPU.
> 
> I have seen this kind of testing before and I have to say I don't
> consider it the recommended way of testing the cluster stack's
> "robustness". Pacemaker processes rely on corosync for proper
> functioning. You kill corosync and then want to "cleanup" the
> processes? You have to go through a lot more literature in order to
> understand how this cluster stack works.

Well I, for my part, don't consider this kind of testing unreasonable at
all. If Corosync dies, say due to a segfault, then the cluster had
better recover to a consistent state.

Thus, this (very valid) testing highlights that the cluster is evidently
misconfigured: it's either not using Pacemaker MCP (the pacemakerd master
control process) at all, or doesn't have STONITH configured, or both.
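
As a rough sketch of how one might check both points on a corosync 1.x
node: the helper below prints the "ver:" value from the pacemaker service
stanza, where 1 means the Pacemaker daemons are started by pacemakerd (MCP)
and 0 means the corosync plugin spawns them itself. The function name
pcmk_plugin_ver and the file path in the comments are assumptions for
illustration, not something taken from the original report.

```shell
# Hypothetical helper: print the "ver:" value of the service stanza whose
# name is "pacemaker" in a corosync 1.x style config file.
pcmk_plugin_ver() {
    awk '
        /^[ \t]*#/                { next }                       # skip comment lines
        /^[ \t]*service[ \t]*[{]/ { in_svc = 1; name = ""; ver = "" }
        in_svc && $1 == "name:"   { name = $2 }
        in_svc && $1 == "ver:"    { ver = $2 }
        in_svc && /[}]/           { if (name == "pacemaker") print ver
                                    in_svc = 0 }
    ' "$1"
}

# Example use (path is an assumption; the stanza may also live in
# corosync.conf itself):
#   pcmk_plugin_ver /etc/corosync/service.d/pcmk
#
# To see whether STONITH is enabled, query the cluster property:
#   crm_attribute -t crm_config -n stonith-enabled -G
```

Note that stonith-enabled=true alone is not sufficient; at least one working
fencing resource has to be configured as well, so that the surviving nodes
can actually fence the node whose corosync died.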

Florian

-- 
Need help with High Availability?
http://www.hastexo.com/now



