[ClusterLabs] Keep printing "Sent 0 CPG messages" in corosync.log

Mon Oct 1 02:41:18 EDT 2018

lkxjtu,

> 
> 
> corosync.log has been printing the following messages for several days. What is wrong with the corosync cluster? The CPU load is currently not high.

The interesting messages in the logs you sent are:

Sep 30 01:23:28 [127667] paas-controller-172-21-0-2 corosync warning 
[MAIN  ] timer_function_scheduler_timeout Corosync main process was not 
scheduled for 10470.3652 ms (threshold is 2400.0000 ms). Consider token 
timeout increase.

and

Sep 30 01:23:29 [127667] paas-controller-172-21-0-2 corosync notice 
[TOTEM ] pause_flush Process pause detected for 8760 ms, flushing 
membership messages.

This means that corosync did not get the CPU time it needed to run. This 
can happen because:
- (Most often) the cluster is running in heavily overloaded VMs (quite 
often in cloud environments)
- Corosync doesn't have an RT (real-time) priority, or another 
RT-priority task is using most of the CPU time
- There is an I/O problem
- The watchdog device is misbehaving
- There is a bug in corosync
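
To see how often and how badly this happens, it may help to pull every 
"not scheduled" warning out of corosync.log and compare the pause with 
the threshold. The following is only a rough sketch: the regular 
expression is derived solely from the sample warning quoted above, and 
the names PAUSE_RE and find_scheduling_pauses are made up for this 
illustration, not part of corosync.

```python
import re

# Pattern derived from the single sample warning quoted above (assumption:
# other occurrences in the log use the same wording).
PAUSE_RE = re.compile(
    r"Corosync main process was not\s+scheduled for\s+([\d.]+) ms\s+"
    r"\(threshold is\s+([\d.]+) ms\)"
)


def find_scheduling_pauses(log_text):
    """Return a (pause_ms, threshold_ms) tuple for each warning found."""
    return [(float(p), float(t)) for p, t in PAUSE_RE.findall(log_text)]


# The warning from the log above, joined back into one line.
sample = (
    "Sep 30 01:23:28 [127667] paas-controller-172-21-0-2 corosync warning "
    "[MAIN  ] timer_function_scheduler_timeout Corosync main process was not "
    "scheduled for 10470.3652 ms (threshold is 2400.0000 ms). Consider token "
    "timeout increase."
)

for pause_ms, threshold_ms in find_scheduling_pauses(sample):
    ratio = pause_ms / threshold_ms
    print(f"paused {pause_ms:.0f} ms, {ratio:.1f}x the threshold")
```

For a real log, read the file contents into a string and pass that to 
find_scheduling_pauses instead of the sample. Pauses several times larger 
than the threshold, as in this case, point more towards one of the causes 
listed above than towards a token timeout that is merely a little too low.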

Honza

> 
> Cluster version information:
> [root@paas-controller-172-167-40-24:~]$ rpm -q corosync
> corosync-2.4.0-9.el7_4.2.x86_64
> [root@paas-controller-172-167-40-24:~]$ rpm -q pacemaker
> pacemaker-1.1.16-12.el7_4.2.x86_64
> 
> 
> 
> Sep 30 01:23:27 [128232] paas-controller-172-21-0-2        cib:     info: crm_cs_flush: Sent 0 CPG messages  (13 remaining, last=363): Try again (6)
...