[ClusterLabs] pcsd 99% CPU
jpokorny at redhat.com
Mon Feb 6 04:07:04 EST 2017
On 03/02/17 16:08 -0500, Scott Greenlese wrote:
> Over the past few days, I noticed that the pcsd and ruby processes are pegged at
> 99% CPU, and commands such as pcs status pcsd take up to 5 minutes to complete.
> On all active cluster nodes, top shows:
>   PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
> 27225 haclust+  20   0  116324  91600  23136 R  99.3  0.1   1943:40 cib
> 23277 root      20   0 12.868g 8.176g   8460 S  99.7 13.0 407:44.18 ruby
> I would appreciate some guidance on how to proceed with debugging this issue.
> I have not taken any recovery actions yet.
> I considered stopping the cluster, recycling pcsd.service on all nodes,
> restarting the cluster... and also rebooting the nodes, if
> necessary. But I didn't want to clear it yet in case there's anything I can
> capture while in this state.
If you still have the pcsd/ruby process in that state, it might be
worth dumping a core for further off-line examination. Assuming you
have enough space to store it (on the order of gigabytes, judging by
the 8 GB resident size) and gdb installed, you can do it like:

  gcore -o pcsd.core 23277

gcore appends the PID to the given prefix, so the dump ends up in
pcsd.core.23277.
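Because the dump can be several gigabytes, it may help to check free space before running gcore. A minimal sketch, assuming Linux /proc and gcore on PATH; the helper names vm_size_bytes and dump_core are hypothetical, and VmSize is used only as a rough upper bound on the core size (gdb may skip some mappings).

```python
import os
import shutil
import subprocess


def vm_size_bytes(pid: int) -> int:
    """Return the VmSize of a process in bytes, read from /proc/<pid>/status (Linux only)."""
    with open(f"/proc/{pid}/status") as status:
        for line in status:
            if line.startswith("VmSize:"):
                # The line looks like "VmSize:   12345 kB"
                return int(line.split()[1]) * 1024
    raise ValueError(f"no VmSize entry for pid {pid}")


def dump_core(pid: int, prefix: str = "pcsd.core") -> str:
    """Run gcore on pid if the target directory seems to have room; return the core path."""
    needed = vm_size_bytes(pid)
    target_dir = os.path.dirname(os.path.abspath(prefix))
    free = shutil.disk_usage(target_dir).free
    if free < needed:
        raise RuntimeError(f"only {free} bytes free in {target_dir}, core may need up to {needed}")
    # gcore writes <prefix>.<pid>, e.g. pcsd.core.23277
    subprocess.run(["gcore", "-o", prefix, str(pid)], check=True)
    return f"{prefix}.{pid}"
```

Run as root, dump_core(23277) would produce pcsd.core.23277 in the current directory.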
I have no idea how far gdb's support for the Ruby interpreter
goes (Python is quite well supported in terms of high-level
debugging), but it could be enough to figure out what's going on.
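Even without Ruby-level support, gdb can print the native (C-level) backtraces of the interpreter's threads from the core, which may already hint at where it is spinning. A minimal sketch, assuming the interpreter binary is /usr/bin/ruby and the core is named pcsd.core.23277; the helper names are hypothetical, and the frames shown will be interpreter functions rather than Ruby source lines.

```python
import subprocess
from typing import List


def gdb_backtrace_cmd(executable: str, core: str) -> List[str]:
    """Build a non-interactive gdb command printing backtraces of all threads in a core."""
    return [
        "gdb", "-batch",
        "-ex", "info threads",
        "-ex", "thread apply all bt",
        executable, core,
    ]


def gdb_backtrace(executable: str, core: str) -> str:
    """Run gdb in batch mode and return its standard output."""
    result = subprocess.run(
        gdb_backtrace_cmd(executable, core),
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

For example, print(gdb_backtrace("/usr/bin/ruby", "pcsd.core.23277")).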
If you are confident that your cluster configuration does not
contain anything too confidential, it would perhaps be best to
share this core file, compressed, privately with tojeline at redhat.
Otherwise, you can use gdb itself to look around the call stack in
the core file, the strings utility to guess whether particular strings
are accumulating excessively, and similar analyses. Some of these
also work on the live process, and some can only be used on the live
process (like strace).
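The strings check amounts to roughly strings -n 8 pcsd.core.23277 | sort | uniq -c | sort -rn | head. A minimal Python sketch of the same idea, assuming runs of at least 8 printable ASCII bytes are what is worth counting; top_strings is a hypothetical helper that reads the core in chunks so a multi-gigabyte file does not have to fit in memory.

```python
import re
from collections import Counter
from typing import BinaryIO, List, Tuple

MIN_LEN = 8  # like `strings -n 8`
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{%d,}" % MIN_LEN)


def top_strings(stream: BinaryIO, limit: int = 20,
                chunk_size: int = 1 << 20) -> List[Tuple[bytes, int]]:
    """Count printable ASCII runs in a binary stream and return the most frequent ones."""
    counts = Counter()
    carry = b""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        data = carry + chunk
        # A trailing printable run may continue in the next chunk, so hold it back.
        cut = len(data)
        while cut > 0 and 0x20 <= data[cut - 1] <= 0x7E:
            cut -= 1
        counts.update(PRINTABLE_RUN.findall(data, 0, cut))
        carry = data[cut:]
    counts.update(PRINTABLE_RUN.findall(carry))
    return counts.most_common(limit)
```

For example: with open("pcsd.core.23277", "rb") as core: for s, n in top_strings(core): print(n, s[:80]). A handful of strings with very high counts may point at what is accumulating.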
Hope this helps at least a bit.