[ClusterLabs] pcsd 99% CPU

Mon Feb 6 09:07:04 UTC 2017

On 03/02/17 16:08 -0500, Scott Greenlese wrote:
> Over the past few days, I noticed that the pcsd and ruby processes are pegged at
> 99% CPU, and commands such as pcs status pcsd take up to 5 minutes to complete.
> On all active cluster nodes, top shows:
> 
> PID   USER     PR NI VIRT    RES    SHR   S %CPU %MEM TIME+     COMMAND
> 27225 haclust+ 20 0  116324  91600  23136 R 99.3 0.1  1943:40   cib
> 23277 root     20 0  12.868g 8.176g 8460  S 99.7 13.0 407:44.18 ruby
> 
> [...]
> 
> I would appreciate some guidance on how to proceed with debugging this issue.
> I have not taken any recovery actions yet.
> I considered stopping the cluster, recycling pcsd.service on all nodes,
> restarting cluster... and also, reboot the nodes, if
> necessary.  But, didn't want to clear it yet in case there's anything I can
> capture while in this state.

If you still have the pcsd/ruby process in that state, it might be
worth dumping a core for further off-line examination.  Assuming you
have enough space to store it (on the order of gigabytes, it seems) and
gdb installed, you can do it like this:

  gcore -o pcsd.core 23277
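
Since the dump will be large, it may be worth checking free space first.
What follows is only a minimal sketch, assuming Linux with /proc mounted;
the helper names rss_kib, free_kib and dump_core are made up for
illustration, and resident memory is just a rough estimate of the final
core size (the file can come out larger):

```shell
# rss_kib PID: resident set size in KiB, read from /proc/PID/status
rss_kib() {
    awk '/^VmRSS:/ { print $2 }' "/proc/$1/status"
}

# free_kib DIR: space in KiB available on the filesystem holding DIR
free_kib() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# dump_core PID [DIR]: run gcore only if DIR appears to have enough room
# (hypothetical wrapper; the RSS comparison is a rough heuristic)
dump_core() {
    pid=$1
    dir=${2:-.}
    need=$(rss_kib "$pid" 2>/dev/null)
    if [ -z "$need" ]; then
        echo "cannot read memory usage of PID $pid" >&2
        return 1
    fi
    avail=$(free_kib "$dir")
    if [ "$avail" -le "$need" ]; then
        echo "need about $need KiB, only $avail KiB free in $dir" >&2
        return 1
    fi
    # gcore appends the PID, so this writes DIR/pcsd.core.PID
    gcore -o "$dir/pcsd.core" "$pid"
}
```

With the PID from the top output above, that would be invoked as
"dump_core 23277 /var/tmp" (the target directory is just an example).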

I have no idea how far gdb's support for inspecting Ruby code goes
(for Python, high-level debugging is quite well supported), but it
could be enough to figure out what's going on.

If you are confident enough that your cluster configuration does not
contain anything too confidential, it would perhaps be best if
you shared this core file in a compressed form privately with
tojeline at redhat.  Otherwise, you can use gdb itself to look
around the call stack in the core file, the strings utility to check
for excessive accumulation of particular strings, and similar
analyses.  Some of these are also applicable to a live process, and
some are usable only on a live process (like strace).
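
To make those analyses a bit more concrete, here is a rough sketch of how
they could be run.  The function names are made up for illustration, the
path /usr/bin/ruby is an assumption (use whatever interpreter binary pcsd
actually runs under on your system), and the 8-character minimum for
strings is an arbitrary choice:

```shell
# top_strings FILE [N]: the N most frequent printable strings
# (8 characters or longer) in FILE, most frequent first
top_strings() {
    strings -n 8 "$1" | sort | uniq -c | sort -rn | head -n "${2:-20}"
}

# core_backtrace CORE [EXE]: C-level backtrace of every thread in a core
# file; EXE defaults to /usr/bin/ruby, which is an assumption
core_backtrace() {
    gdb -batch -ex 'thread apply all bt' "${2:-/usr/bin/ruby}" "$1"
}

# live_syscalls PID: per-syscall summary for a live process and its
# threads; interrupt with Ctrl-C to get the table
live_syscalls() {
    strace -f -c -p "$1"
}
```

For instance, "top_strings pcsd.core.23277" would show which strings
dominate the dump, and "live_syscalls 23277" would show whether the
process is busy in system calls at all or spinning purely in user space.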

Hope this helps at least a bit.

-- 
Jan (Poki)