[ClusterLabs] pcsd processes using 100% CPU

Tue May 22 13:09:24 EDT 2018

On 18/05/18 20:04 +0000, Shobe, Casey wrote:
> On a couple clusters that have been running for a little while
> (without fencing), I'm seeing runaway server.rb processes using 100%
> of a single CPU core each.
> 
> When I look at ps, I can see that these have something to do with
> pcsd:
> 
> USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
> root      6103  0.0  0.3 1076744 59200 ?       Ssl  Apr06  59:09 /usr/bin/ruby -C/var/lib/pcsd -I/usr/share/pcsd -- /usr/share/pcsd/ssl.rb & > /dev/null &
> root     17548 99.3  0.2 873648 46308 ?        Rl   Apr18 43356:57  \_ /usr/bin/ruby -C/var/lib/pcsd -I/usr/share/pcsd -- /usr/share/pcsd/ssl.rb & > /dev/null &
> root     16688 98.9  0.3 941160 49472 ?        Rl   May01 24300:52  \_ /usr/bin/ruby -C/var/lib/pcsd -I/usr/share/pcsd -- /usr/share/pcsd/ssl.rb & > /dev/null &
> root      6009 98.8  0.3 942188 49688 ?        R    May02 22607:08  \_ /usr/bin/ruby -C/var/lib/pcsd -I/usr/share/pcsd -- /usr/share/pcsd/ssl.rb & > /dev/null &
> root     15556 98.8  0.3 1076344 51836 ?       R    May03 21410:12  \_ /usr/bin/ruby -C/var/lib/pcsd -I/usr/share/pcsd -- /usr/share/pcsd/ssl.rb & > /dev/null &
> 
> Running strace on one of the processes shows that they are looping
> on sched_yield().

Can you share some HW specs with us, at least the architecture
to start with -- x86_64=amd64, arm (gen/mode?), something else?

The suspicion here is that only the first of these (x86_64) is
likely to be sufficiently free of code porting glitches, meaning
at the Ruby interpreter level or lower.
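
Something along these lines would collect what I'm asking for. Treat
it as a rough sketch only: it assumes lscpu (util-linux) is installed
and that the "ruby" found in PATH is the same interpreter pcsd runs
under (/usr/bin/ruby in your ps output), which may not hold if several
Rubies are installed. The collect_specs name is made up for this
example.

```shell
#!/bin/sh
# Print the basics: CPU architecture, kernel release, Ruby build.
collect_specs() {
    echo "arch:   $(uname -m)"
    echo "kernel: $(uname -r)"
    # lscpu and ruby may be missing on minimal installs; skip them quietly.
    if command -v lscpu >/dev/null 2>&1; then
        lscpu | grep -E '^(Architecture|Model name|Byte Order)' || true
    fi
    if command -v ruby >/dev/null 2>&1; then
        echo "ruby:   $(ruby -v)"
    fi
    return 0
}

collect_specs
```

Running it on one of the affected nodes and pasting the output into
a reply would be enough to start with.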

-- 
Jan (Poki)