[Pacemaker] killing corosync leaves crmd, stonithd, lrmd, cib and attrd to hog up the cpu

Mon Nov 14 13:19:38 UTC 2011

Nope, I am not using stonith.

________________________________
From: Andreas Kurz <andreas at hastexo.com>
To: pacemaker at oss.clusterlabs.org
Sent: Monday, 14 November 2011 6:08 PM
Subject: Re: [Pacemaker] killing corosync leaves crmd, stonithd, lrmd, cib and attrd to hog up the cpu

On 11/14/2011 12:32 PM, ihjaz Mohamed wrote:
> Hi All,
> 
> As part of some robustness tests for my cluster, I tried killing the
> corosync process using kill -9 <pid>. After this, I see that the
> pacemakerd service is stopped, but the processes crmd, stonithd, lrmd,
> cib and attrd are still running and are hogging the CPU.

Then fix your stonith setup if you want a "robust" cluster setup. Of
course you are using stonith, aren't you?

Regards,
Andreas

-- 
Need help with Pacemaker?
http://www.hastexo.com/now

> 
> 
> top - 06:26:51 up  2:01,  4 users,  load average: 12.04, 12.01, 11.98
> Tasks: 330 total,  13 running, 317 sleeping,   0 stopped,   0 zombie
> Cpu(s):  7.1%us, 17.1%sy,  0.0%ni, 75.6%id,  0.1%wa,  0.0%hi,  0.0%si,  0.0%st
> Mem:   8015444k total,  4804412k used,  3211032k free,    54800k buffers
> Swap: 10256376k total,        0k used, 10256376k free,  1604464k cached
> 
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>  2053 hacluste  RT   0 90492 3324 2476 R 100.0  0.0 113:40.61 crmd
>  2047 root      RT   0 81480 2108 1712 R 99.8  0.0 113:40.43 stonithd
>  2048 hacluste  RT   0 83404 5260 2992 R 99.8  0.1 113:40.90 cib
>  2050 hacluste  RT   0 85896 2388 1952 R 99.8  0.0 113:40.43 attrd
>  5018 root      20   0 8787m 345m  56m S  2.0  4.4   0:56.95 java
> 19017 root      20   0 15068 1252  796 R  2.0  0.0   0:00.01 top
>     1 root      20   0 19232 1444 1156 S  0.0  0.0   0:01.71 init
>     2 root      20   0     0    0    0 S  0.0  0.0   0:00.00 kthreadd
>     3 root      RT   0     0    0    0 S  0.0  0.0   0:00.00 migration/0
>     4 root      20   0     0    0    0 S  0.0  0.0   0:00.00 ksoftirqd/0
> 
> 
> Is there a way to clean up these processes? Or do I need to kill them
> one by one before respawning corosync?
> 
> 
> 
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
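
For the cleanup question above, here is a minimal sketch rather than a recommended procedure. It assumes the leftover daemons run under exactly the process names listed in the original message (crmd, stonithd, lrmd, cib, attrd) and that pgrep and pkill from procps are installed. The names DAEMONS, list_leftovers and cleanup_leftovers are hypothetical helpers, not part of Pacemaker. Whether the spinning daemons still react to SIGTERM is not established in this thread, so the sketch falls back to SIGKILL after a grace period.

```shell
#!/bin/sh
# Process names taken from the original message; adjust for your install.
DAEMONS="crmd stonithd lrmd cib attrd"

# Print one "name pid" pair per line for every leftover daemon found.
list_leftovers() {
    for name in $DAEMONS; do
        # pgrep -x matches the exact process name only
        for pid in $(pgrep -x "$name"); do
            echo "$name $pid"
        done
    done
}

# Ask the daemons to exit with SIGTERM, wait, then SIGKILL any survivors.
# $1 is the grace period in seconds (assumed default: 5).
cleanup_leftovers() {
    grace=${1:-5}
    for name in $DAEMONS; do
        pkill -TERM -x "$name" || true   # no match is not an error here
    done
    sleep "$grace"
    for name in $DAEMONS; do
        pkill -KILL -x "$name" || true
    done
    return 0
}
```

To try it, source the file as root, run list_leftovers to see what is still running, then run cleanup_leftovers before starting corosync again. This only clears the stray processes; it does not replace fencing, which is what the reply in this thread points to for a robust setup.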
