[Pacemaker] killing corosync leaves crmd, stonithd, lrmd, cib and attrd to hog up the cpu
Andreas Kurz
andreas at hastexo.com
Mon Nov 14 14:35:32 UTC 2011
On 11/14/2011 02:19 PM, ihjaz Mohamed wrote:
> Nope, I am not using stonith.
Highly recommended -- and a must-have if shared storage is in use -- for
every Pacemaker cluster. Since IPMI is available on most current server
hardware, no extra effort besides the Pacemaker configuration is
necessary.
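As a rough sketch of what that Pacemaker configuration could look like,
here is a crm shell fragment for a single IPMI fencing device. It assumes
the external/ipmi stonith agent (shipped with cluster-glue) is installed;
the node name, BMC address and credentials are placeholders and are not
taken from this thread.

```
# Load with "crm configure"; one such primitive is needed per node.
# node1, 192.0.2.10, admin and secret are hypothetical placeholder values.
primitive st-node1 stonith:external/ipmi \
    params hostname="node1" ipaddr="192.0.2.10" \
           userid="admin" passwd="secret" interface="lan" \
    op monitor interval="60s"
# Keep the fencing device off the node it is meant to fence.
location l-st-node1 st-node1 -inf: node1
property stonith-enabled="true"
```

The parameter list can differ between versions, so check it with
"crm ra info stonith:external/ipmi" on your own installation first.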
Regards,
Andreas
--
Need help with Pacemaker?
http://www.hastexo.com/now
>
> ------------------------------------------------------------------------
> *From:* Andreas Kurz <andreas at hastexo.com>
> *To:* pacemaker at oss.clusterlabs.org
> *Sent:* Monday, 14 November 2011 6:08 PM
> *Subject:* Re: [Pacemaker] killing corosync leaves crmd, stonithd, lrmd,
> cib and attrd to hog up the cpu
>
> On 11/14/2011 12:32 PM, ihjaz Mohamed wrote:
>> Hi All,
>>
>> As part of robustness testing for my cluster, I killed the
>> corosync process with kill -9 <pid>. Afterwards the pacemakerd
>> service is stopped, but the crmd, stonithd, lrmd, cib and attrd
>> processes are still running and hogging the CPU.
>
> Then fix your stonith setup if you want a "robust" cluster setup .... of
> course you are using stonith, aren't you?
>
> Regards,
> Andreas
>
>>
>>
>> top - 06:26:51 up 2:01, 4 users, load average: 12.04, 12.01, 11.98
>> Tasks: 330 total, 13 running, 317 sleeping, 0 stopped, 0 zombie
>> Cpu(s): 7.1%us, 17.1%sy, 0.0%ni, 75.6%id, 0.1%wa, 0.0%hi, 0.0%si, 0.0%st
>> Mem: 8015444k total, 4804412k used, 3211032k free, 54800k buffers
>> Swap: 10256376k total, 0k used, 10256376k free, 1604464k cached
>>
>>   PID USER     PR  NI  VIRT  RES  SHR S  %CPU %MEM     TIME+ COMMAND
>>  2053 hacluste RT   0 90492 3324 2476 R 100.0  0.0 113:40.61 crmd
>>  2047 root     RT   0 81480 2108 1712 R  99.8  0.0 113:40.43 stonithd
>>  2048 hacluste RT   0 83404 5260 2992 R  99.8  0.1 113:40.90 cib
>>  2050 hacluste RT   0 85896 2388 1952 R  99.8  0.0 113:40.43 attrd
>>  5018 root     20   0 8787m 345m  56m S   2.0  4.4   0:56.95 java
>> 19017 root     20   0 15068 1252  796 R   2.0  0.0   0:00.01 top
>>     1 root     20   0 19232 1444 1156 S   0.0  0.0   0:01.71 init
>>     2 root     20   0     0    0    0 S   0.0  0.0   0:00.00 kthreadd
>>     3 root     RT   0     0    0    0 S   0.0  0.0   0:00.00 migration/0
>>     4 root     20   0     0    0    0 S   0.0  0.0   0:00.00 ksoftirqd/0
>>
>>
>> Is there a way to clean up these processes, or do I need to kill them
>> one by one before restarting corosync?
>>
>>
>>
>> _______________________________________________
>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>