[Pacemaker] killing corosync leaves crmd, stonithd, lrmd, cib and attrd to hog up the cpu
Dan Frincu
df.cluster at gmail.com
Mon Nov 14 12:18:43 UTC 2011
Hi,
On Mon, Nov 14, 2011 at 1:32 PM, ihjaz Mohamed <ihjazmohamed at yahoo.co.in> wrote:
> Hi All,
> As part of some robustness tests for my cluster, I tried killing the corosync
> process using kill -9 <pid>. After this I see that the pacemakerd service is
> stopped, but the processes crmd, stonithd, lrmd, cib and attrd are still
> running and are hogging the CPU.
I have seen this kind of testing before, and I have to say I don't
consider it the recommended way of testing the cluster stack's
"robustness". Pacemaker processes rely on corosync to function
properly. You kill corosync and then want to "clean up" the
processes? You need to read a lot more of the documentation to
understand how this cluster stack works.
For the Master Control Process (pacemakerd), how it works, and other
information directly relevant to what you are experiencing, see
http://theclusterguy.clusterlabs.org/post/907043024/introducing-the-pacemaker-master-control-process-for
The essential guide you need is
http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/
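If all you want is to see which child daemons were left behind after
corosync went away, here is a minimal sketch in Python. It assumes a
Linux host whose kernel exposes /proc/<pid>/comm, and that the process
names match the ones in your top output (crmd, stonithd, lrmd, cib,
attrd). The function names are made up for illustration. It only lists
PIDs and deliberately sends no signals; it is not a substitute for
stopping the stack properly.

```python
import os
from typing import Dict, List

# Daemon names taken from this thread (Pacemaker 1.1 on corosync).
# Assumption: /proc/<pid>/comm holds exactly these short names.
PACEMAKER_CHILDREN = ("crmd", "stonithd", "lrmd", "cib", "attrd")
MASTER_PROCESSES = ("pacemakerd", "corosync")


def scan_processes(proc_root: str = "/proc") -> Dict[str, List[int]]:
    """Map each daemon name of interest to the sorted PIDs running it."""
    wanted = set(PACEMAKER_CHILDREN) | set(MASTER_PROCESSES)
    found: Dict[str, List[int]] = {name: [] for name in wanted}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-PID entries such as /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                name = f.read().strip()
        except OSError:
            continue  # process exited mid-scan, or entry is unreadable
        if name in wanted:
            found[name].append(int(entry))
    for pids in found.values():
        pids.sort()
    return found


def orphaned_children(found: Dict[str, List[int]]) -> Dict[str, List[int]]:
    """Return child daemons still running while corosync is not."""
    if found.get("corosync"):
        return {}  # corosync is up, so nothing is orphaned in this sense
    return {name: found[name] for name in PACEMAKER_CHILDREN if found.get(name)}


if __name__ == "__main__":
    leftovers = orphaned_children(scan_processes())
    if not leftovers:
        print("no orphaned Pacemaker daemons found")
    for name, pids in leftovers.items():
        print(f"{name}: {' '.join(map(str, pids))}")
```

On the box in your top output it would report crmd, stonithd, cib and
attrd (plus lrmd, which is running but not near the top of the list),
and report nothing once corosync is running again.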
HTH,
Dan
>
> top - 06:26:51 up 2:01, 4 users, load average: 12.04, 12.01, 11.98
> Tasks: 330 total, 13 running, 317 sleeping, 0 stopped, 0 zombie
> Cpu(s): 7.1%us, 17.1%sy, 0.0%ni, 75.6%id, 0.1%wa, 0.0%hi, 0.0%si, 0.0%st
> Mem: 8015444k total, 4804412k used, 3211032k free, 54800k buffers
> Swap: 10256376k total, 0k used, 10256376k free, 1604464k cached
>
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> 2053 hacluste RT 0 90492 3324 2476 R 100.0 0.0 113:40.61 crmd
> 2047 root RT 0 81480 2108 1712 R 99.8 0.0 113:40.43 stonithd
> 2048 hacluste RT 0 83404 5260 2992 R 99.8 0.1 113:40.90 cib
> 2050 hacluste RT 0 85896 2388 1952 R 99.8 0.0 113:40.43 attrd
> 5018 root 20 0 8787m 345m 56m S 2.0 4.4 0:56.95 java
> 19017 root 20 0 15068 1252 796 R 2.0 0.0 0:00.01 top
> 1 root 20 0 19232 1444 1156 S 0.0 0.0 0:01.71 init
> 2 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kthreadd
> 3 root RT 0 0 0 0 S 0.0 0.0 0:00.00 migration/0
> 4 root 20 0 0 0 0 S 0.0 0.0 0:00.00 ksoftirqd/0
>
>
> Is there a way to clean up these processes? Or do I need to kill them one by
> one before respawning corosync?
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>
>
--
Dan Frincu
CCNA, RHCE