[ClusterLabs] Corosync CPU load slowly increasing if one node present
Jan Friesse
jfriesse at redhat.com
Thu Apr 27 10:56:34 EDT 2017
Stefan,
> Hello everyone!
>
> I am using Pacemaker (1.1.12), Corosync (2.3.0) and libqb (0.16.0) in 2-node clusters (virtualized in VMware infrastructure, OS: RHEL 6.7).
> I noticed that if only one node is present, the CPU usage of Corosync (as seen with top) slowly but steadily increases (over days; in my setup about 1% per day). The node is basically idle; some Pacemaker-managed resources are running, but they are not contacted by any clients.
> I upgraded a test stand-alone node to Corosync (2.4.2) and libqb (1.0.1) (which at least made the memory leak go away), but the CPU usage is still increasing on that node.
> When I add a second node to the cluster, the CPU load drops back to a normal (low) level.
> I have not yet witnessed the increasing CPU load while two nodes were present in the cluster.
>
> Even if running Pacemaker/Corosync as a massively overkill Monit replacement is questionable, the observed CPU load is not what I would expect to happen.
>
> What could be the reason for this CPU-load increase? Is there a rationale behind it?
This is a really interesting observation. I can speak for corosync, and I must say no, there is no rationale behind it. It simply shouldn't be happening. I also don't see any reason why connecting another node (or nodes) would reduce the CPU load.
> Is this a config thing or something in the binaries?
Certainly not a config thing, at least not on the corosync side. Also, your config file looks just fine.
Could you test with a single ring only, and with udpu, to see whether the behavior stays the same?
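For reference, a minimal single-ring udpu variant of the totem configuration might look roughly like the sketch below; the addresses and node IDs are placeholders and are not taken from the original post. With transport: udpu corosync uses unicast UDP and takes its peers from the nodelist, so no multicast address is needed:

totem {
        version: 2
        secauth: on
        # unicast UDP instead of multicast; members come from the nodelist
        transport: udpu
}

# placeholder addresses -- replace with the real node IPs
nodelist {
        node {
                ring0_addr: 10.20.30.1
                nodeid: 1
        }
        node {
                ring0_addr: 10.20.30.2
                nodeid: 2
        }
}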
Regards,
Honza
>
> BR, Stefan
>
> My corosync.conf:
>
> # Please read the corosync.conf.5 manual page
> compatibility: whitetank
>
> aisexec {
>         user: root
>         group: root
> }
>
> totem {
>         version: 2
>
>         # Security configuration
>         secauth: on
>         threads: 0
>
>         # Timeout for token
>         token: 1000
>         token_retransmits_before_loss_const: 4
>
>         # Number of messages that may be sent by one processor on receipt of the token
>         max_messages: 20
>
>         # How long to wait for join messages in the membership protocol (ms)
>         join: 50
>         consensus: 1200
>
>         # Turn off the virtual synchrony filter
>         vsftype: none
>
>         # Stagger sending the node join messages by 1..send_join ms
>         send_join: 50
>
>         # Limit generated nodeids to 31-bits (positive signed integers)
>         clear_node_high_bit: yes
>
>         # Interface configuration
>         rrp_mode: passive
>         interface {
>                 ringnumber: 0
>                 bindnetaddr: 10.20.30.0
>                 mcastaddr: 226.95.30.100
>                 mcastport: 5510
>         }
>         interface {
>                 ringnumber: 1
>                 bindnetaddr: 10.20.31.0
>                 mcastaddr: 226.95.31.100
>                 mcastport: 5510
>         }
> }
>
> logging {
>         fileline: off
>         to_stderr: no
>         to_logfile: no
>         to_syslog: yes
>         syslog_facility: local3
>         debug: off
> }
>
> amf {
>         mode: disabled
> }
>
> quorum {
>         provider: corosync_votequorum
>         expected_votes: 1
> }
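As an aside (not something raised in the thread): for a two-node votequorum setup, corosync 2.x also offers the two_node option as an alternative to hard-coding expected_votes. A minimal sketch:

quorum {
        provider: corosync_votequorum
        # special two-node mode: the cluster stays quorate after losing one node;
        # note that two_node implies wait_for_all unless that is explicitly disabled
        two_node: 1
}

Whether that fits here depends on whether the node is meant to start and run entirely on its own, which is what expected_votes: 1 allows.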
>
> _______________________________________________
> Users mailing list: Users at clusterlabs.org
> http://lists.clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>