[ClusterLabs] pcsd processes using 100% CPU
Casey & Gina
caseyandgina at icloud.com
Wed May 23 14:43:04 EDT 2018
Okay, this is happening again on a couple of servers right now, and I'm happy to let it spin and dig into it further. I'm not at all experienced with this kind of debugging, though, so I'll need explicit instructions on what to do beyond what I've documented here.
I don't see anything of note in pcsd.log. It seems to contain only normal activity logged by the master process, which is not the runaway one. Here's a snippet:
10.124.167.177 - - [23/May/2018:15:56:34 +0000] "GET /remote/get_configs HTTP/1.1" 200 553 0.0145
10.124.167.177 - - [23/May/2018:15:56:34 +0000] "GET /remote/get_configs HTTP/1.1" 200 553 0.0147
10.124.167.177 - - [23/May/2018:15:56:34 UTC] "GET /remote/get_configs HTTP/1.1" 200 553
- -> /remote/get_configs
I, [2018-05-23T15:56:37.972682 #1378] INFO -- : Running: /usr/sbin/corosync-cmapctl totem.cluster_name
I, [2018-05-23T15:56:37.972805 #1378] INFO -- : CIB USER: hacluster, groups:
I, [2018-05-23T15:56:37.982066 #1378] INFO -- : Return Value: 0
10.124.167.176 - - [23/May/2018:15:56:37 +0000] "GET /remote/get_configs HTTP/1.1" 200 553 0.0107
10.124.167.176 - - [23/May/2018:15:56:37 +0000] "GET /remote/get_configs HTTP/1.1" 200 553 0.0108
10.124.167.176 - - [23/May/2018:15:56:37 UTC] "GET /remote/get_configs HTTP/1.1" 200 553
- -> /remote/get_configs
I, [2018-05-23T15:57:10.648134 #1378] INFO -- : Running: /usr/sbin/corosync-cmapctl totem.cluster_name
I, [2018-05-23T15:57:10.648276 #1378] INFO -- : CIB USER: hacluster, groups:
I, [2018-05-23T15:57:10.660617 #1378] INFO -- : Return Value: 0
10.124.167.178 - - [23/May/2018:15:57:10 +0000] "GET /remote/get_configs HTTP/1.1" 200 553 0.0140
10.124.167.178 - - [23/May/2018:15:57:10 +0000] "GET /remote/get_configs HTTP/1.1" 200 553 0.0141
10.124.167.178 - - [23/May/2018:15:57:10 UTC] "GET /remote/get_configs HTTP/1.1" 200 553
- -> /remote/get_configs
I ran `strace -p <pid>`, and the screen filled with the following line repeating as fast as my terminal can render:
sched_yield() = 0
sched_yield() = 0
sched_yield() = 0
I redirected this into a file for about 1 second and it filled with about 20,000 of those lines.
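In case anyone wants to compare numbers, here is a rough sketch of how that one-second count can be reproduced. The helper name `count_syscall`, the capture path and the `$PID` placeholder are made up for illustration. It assumes the capture was written with `strace -o` against a single PID, so each line starts with the syscall name.

```shell
# Count how many lines of a strace capture are calls to one syscall.
#   $1 = capture file
#   $2 = syscall name (hypothetical default: sched_yield)
count_syscall() {
    # In a basic regex "(" is a literal, so this matches e.g. "sched_yield("
    # at the start of a line. grep -c prints the number of matching lines.
    grep -c "^${2:-sched_yield}(" "$1"
}

# Hypothetical usage against the runaway child (PID is a placeholder):
#   timeout 1 strace -p "$PID" -o /tmp/pcsd.strace
#   count_syscall /tmp/pcsd.strace
```

A count in the tens of thousands per second with nothing but `sched_yield` in the file would match what I saw on screen.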
I installed ltrace, but didn't really know how to use it...
`ltrace -p <pid>` didn't output anything.
`ltrace -p <pid> -S` showed something similar to strace:
SYS_sched_yield(0x7f0ebc3f5c40, 0x7f0ebc3f5c40, 0, 0x7273752f3a6e6962) = 0
SYS_sched_yield(0x7f0ebc3f5c40, 0x7f0ebc3f5c40, 0, 0x7273752f3a6e6962) = 0
SYS_sched_yield(0x7f0ebc3f5c40, 0x7f0ebc3f5c40, 0, 0x7273752f3a6e6962) = 0
I next enabled debugging in /etc/default/pcsd and issued a `systemctl restart pcsd`. Unfortunately, that killed the runaway child process.
However, I found another server where it's also happening again. Debugging is not enabled there, but is there anything else I can do while the process is still running?
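For reference, the debugging change mentioned above amounts to roughly the following. This is a sketch based on an assumption: that the stock file uses a variable named `PCSD_DEBUG`. Check the comments in your own copy of the file before relying on it.

```shell
# /etc/default/pcsd (Debian/Ubuntu location, as mentioned above)
# Assumption: the packaged file exposes debugging via PCSD_DEBUG.
PCSD_DEBUG=true
# The setting is only read at startup, so it needs:
#   systemctl restart pcsd
# which, as noted, also kills any runaway child process.
```

Because the restart destroys the state being investigated, this is only useful if it is set before the problem recurs.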
Here are the pcsd processes:
USER  PID    %CPU  %MEM  VSZ      RSS    TTY  STAT  START  TIME     COMMAND
root  6103   0.0   0.3   1076744  59972  ?    Ssl   Apr06  67:17    /usr/bin/ruby -C/var/lib/pcsd -I/usr/share/pcsd -- /usr/share/pcsd/ssl.rb & > /dev/null &
root  24923  99.8  0.3   1076744  52744  ?    Rl    May19  5556:31  \_ /usr/bin/ruby -C/var/lib/pcsd -I/usr/share/pcsd -- /usr/share/pcsd/ssl.rb & > /dev/null &
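To spot the same thing on other nodes, a small filter over `ps aux`-style output can pick out the spinning process. This is only a sketch: the function name `hot_procs` and the default threshold of 90 are arbitrary choices for illustration.

```shell
# Print "PID %CPU" for every row of `ps aux`-style input whose %CPU
# (third column) is at or above a threshold.
#   $1 = threshold in percent (hypothetical default: 90)
hot_procs() {
    # The regex test skips the header row, where column 3 is "%CPU".
    awk -v min="${1:-90}" '$3 ~ /^[0-9.]+$/ && $3 + 0 >= min { print $2, $3 }'
}

# Hypothetical usage:
#   ps aux | hot_procs
# Since the child shows "l" (multi-threaded) in STAT, a per-thread view
# of the PID reported above may narrow it down further:
#   ps -L -o tid,pcpu,stat,comm -p "$PID"
```

Fed the listing above, this prints only the child, PID 24923 at 99.8.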
I don't have gcore installed and don't know which package might provide it. I also don't have experience with gdb but am happy to try anything suggested to help figure out what's going on.
The pcs version is 0.9.149, as packaged by Debian and inherited by Ubuntu.
Regards,
--
Casey