[ClusterLabs] pcsd 99% CPU
Scott Greenlese
swgreenl at us.ibm.com
Fri Feb 3 16:08:26 EST 2017
Hi all..
Over the past few days, I noticed that the pcsd and ruby processes are pegged
at 99% CPU, and commands such as "pcs status pcsd" take up to 5 minutes to
complete. On all active cluster nodes, top shows:
  PID USER      PR  NI    VIRT    RES   SHR S %CPU %MEM      TIME+ COMMAND
27225 haclust+  20   0  116324  91600 23136 R 99.3  0.1    1943:40 cib
23277 root      20   0 12.868g 8.176g  8460 S 99.7 13.0  407:44.18 ruby
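To double-check that the busy ruby process really is pcsd's daemon, something
like this should do it (just a sketch; the PID comes from the top output
above):

# show parent PID, elapsed time and full command line for the ruby process
ps -o pid,ppid,etime,cmd -p 23277

# confirm it belongs to the pcsd service
systemctl status pcsd.service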
The system log shows "High CIB load detected" messages, mostly over the past
2 days:
[root at zs95kj ~]# grep "High CIB load detected" /var/log/messages |grep "Feb 3" |wc -l
1655
[root at zs95kj ~]# grep "High CIB load detected" /var/log/messages |grep "Feb 2" |wc -l
1658
[root at zs95kj ~]# grep "High CIB load detected" /var/log/messages |grep "Feb 1" |wc -l
147
[root at zs95kj ~]# grep "High CIB load detected" /var/log/messages |grep "Jan 31" |wc -l
444
[root at zs95kj ~]# grep "High CIB load detected" /var/log/messages |grep "Jan 30" |wc -l
352
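(The same per-day counts can be pulled in one pass; just a convenience sketch
over the same grep:)

for d in "Jan 30" "Jan 31" "Feb 1" "Feb 2" "Feb 3"; do
    printf '%s: ' "$d"
    grep "High CIB load detected" /var/log/messages | grep -c "$d"
done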
The first entries logged on Feb 2 started around 8:42am ...

Feb 2 08:42:12 zs95kj crmd[27233]: notice: High CIB load detected: 0.974333
This happens to coincide with the time that I caused a node fence (off)
action by creating an iface-bridge resource and specifying a non-existent
vlan slave interface (reported to the group yesterday in a separate email
thread). It also caused me to lose quorum in the cluster, because 2 of my 5
cluster nodes were already offline.
My cluster currently has just over 200 VirtualDomain resources to manage,
plus one iface-bridge resource and one iface-vlan resource, both of which
are currently configured properly and operational.
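In case it helps gauge how big the configuration is, the live CIB can be
sized and the primitives counted with something like this (sketch; cibadmin
ships with pacemaker):

# raw size of the live CIB in bytes
cibadmin --query | wc -c

# rough count of configured resource primitives
cibadmin --query | grep -c "<primitive "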
I would appreciate some guidance on how to proceed with debugging this
issue. I have not taken any recovery actions yet.
I considered stopping the cluster, recycling pcsd.service on all nodes,
restarting the cluster... and also rebooting the nodes, if necessary. But I
didn't want to clear it yet in case there's anything I can capture while in
this state.
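If it would help, I can collect a cluster report covering the affected
window before touching anything, roughly like this (assuming crm_report from
the pacemaker packages is available on the nodes):

# gather logs, CIB versions and PE inputs from all nodes for Jan 30 - Feb 3
crm_report -f "2017-01-30 00:00:00" -t "2017-02-03 23:59:59" /tmp/high-cib-load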
Thanks..
Scott Greenlese ... KVM on System Z - Solutions Test, Poughkeepsie, N.Y.
INTERNET: swgreenl at us.ibm.com