[ClusterLabs] pcsd 99% CPU

Tomas Jelinek tojeline@redhat.com
Mon Feb 6 04:53:01 EST 2017


On 3.2.2017 at 22:08, Scott Greenlese wrote:
> Hi all..
>
> Over the past few days, I noticed that the pcsd and ruby processes are
> pegged at 99% CPU, and commands such as pcs status pcsd take up to
> 5 minutes to complete. On all active cluster nodes, top shows:
>
>   PID USER      PR  NI    VIRT    RES   SHR S %CPU %MEM     TIME+ COMMAND
> 27225 haclust+  20   0  116324  91600 23136 R 99.3  0.1   1943:40 cib
> 23277 root      20   0 12.868g 8.176g  8460 S 99.7 13.0 407:44.18 ruby
>
> The system log shows "High CIB load detected" messages over the past
> two days:
>
> [root@zs95kj ~]# grep "High CIB load detected" /var/log/messages | grep "Feb 3" | wc -l
> 1655
> [root@zs95kj ~]# grep "High CIB load detected" /var/log/messages | grep "Feb 2" | wc -l
> 1658
> [root@zs95kj ~]# grep "High CIB load detected" /var/log/messages | grep "Feb 1" | wc -l
> 147
> [root@zs95kj ~]# grep "High CIB load detected" /var/log/messages | grep "Jan 31" | wc -l
> 444
> [root@zs95kj ~]# grep "High CIB load detected" /var/log/messages | grep "Jan 30" | wc -l
> 352
>
>
> The first entries logged on Feb 2 started around 8:42am ...
>
> Feb 2 08:42:12 zs95kj crmd[27233]: notice: High CIB load detected: 0.974333
>
> This happens to coincide with the time that I caused a node fence
> (off) action by creating an iface-bridge resource and specifying
> a non-existent vlan slave interface (reported to the group yesterday in
> a separate email thread). It also caused me to lose
> quorum in the cluster, because 2 of my 5 cluster nodes were already
> offline.
>
> My cluster currently has just over 200 VirtualDomain resources to
> manage, plus one iface-bridge resource and one iface-vlan resource,
> both of which are now configured properly and operational.
>
> I would appreciate some guidance on how to proceed with debugging this
> issue. I have not taken any recovery actions yet.
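
Those per-day counts can, by the way, be collected in a single pass; a
minimal sketch, assuming the stock syslog timestamp layout (month and
day as the first two fields of each line):

  grep "High CIB load detected" /var/log/messages | awk '{print $1, $2}' | sort | uniq -c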

Checking /var/log/pcsd/pcsd.log to see what pcsd is actually doing might 
be a good start. What pcsd version do you have?
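
Something along these lines would be a reasonable first look (a sketch;
the package name and log path assume a stock RHEL install):

  rpm -q pcs                          # pcsd ships in the pcs package there
  tail -n 200 /var/log/pcsd/pcsd.log  # recent pcsd activity
  top -H -p 23277                     # per-thread view of the busy ruby process

The per-thread top view should at least tell you whether a single pcsd
thread is spinning or all of them are busy.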

> I considered stopping the cluster, recycling pcsd.service on all nodes,
> and restarting the cluster... and also rebooting the nodes, if
> necessary. But I didn't want to clear it yet in case there's anything I
> can capture while in this state.

Restarting just pcsd might be enough.
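
If you want to preserve evidence before touching anything, something
along these lines first (a sketch; crm_report ships with pacemaker, and
the time window comes from your log excerpts):

  crm_report --from "2017-01-30 00:00:00" /tmp/high-cib-load  # bundle logs, CIB and config
  systemctl restart pcsd                                      # then restart pcsd on each node

pcsd is only the management daemon, so restarting it does not touch the
cluster stack or the resources it manages.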

Tomas

>
> Thanks..
>
> Scott Greenlese ... KVM on System Z - Solutions Test, Poughkeepsie, N.Y.
> INTERNET: swgreenl@us.ibm.com