[ClusterLabs] Is "Process pause detected" triggered too easily?

Jean-Marc Saffroy saffroy at gmail.com
Tue Sep 26 14:41:38 EDT 2017


As the subject line suggests, I am wondering why I see so many of these 
log lines (many means about 10 times per minute, usually several in the 
same second):

Sep 26 19:56:24 [950] vm0 corosync notice  [TOTEM ] Process pause detected 
for 2555 ms, flushing membership messages.
Sep 26 19:56:24 [950] vm0 corosync notice  [TOTEM ] Process pause detected 
for 2558 ms, flushing membership messages.

Let me add some context:
- this is observed in 3 small VMs on my laptop
- the OS is CentOS 7.3, corosync is 2.4.0-9.el7_4.2
- these VMs only run corosync, nothing else
- the VM host (my laptop) is idle 60-80% of the time
- VMs are qemu-kvm guests, connected with tap interfaces
- AND the messages only appear when I stop and start corosync in a tight 
loop on one of the VMs, like this:

[root@vm2 ~]# while :; do echo $(date) stop; systemctl stop corosync ; 
echo $(date) start;systemctl start corosync ; done
Tue Sep 26 19:50:19 CEST 2017 stop
Tue Sep 26 19:50:21 CEST 2017 start
Tue Sep 26 19:50:21 CEST 2017 stop
Tue Sep 26 19:50:22 CEST 2017 start

I understand that this kind of test is stressful (and quite artificial), but 
I'm still surprised to see these particular messages, because it seems 
unlikely to me that the corosync process would so frequently (several times 
per minute) go unscheduled for seconds at a time.
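
One rough way to check the scheduling hypothesis independently of corosync 
would be to sample a clock in a loop on the same VM and report any gap 
between samples that exceeds a threshold. This is only a sketch: the 
function name report_gaps is made up for illustration, the sampling loop 
assumes GNU date (for %3N), and an ordinary shell loop is at best a proxy 
for how the corosync process itself gets scheduled.

```shell
# Read millisecond timestamps (one per line) on stdin and report every gap
# between consecutive samples that exceeds the limit given as $1 (in ms).
# report_gaps is a hypothetical helper name, not part of any tool.
report_gaps() {
    awk -v limit="$1" '
        NR > 1 && $1 - prev > limit {
            printf "gap of %d ms before sample %d\n", $1 - prev, NR
        }
        { prev = $1 }
    '
}

# Possible usage (assumes GNU date): sample roughly every 100 ms and
# report gaps longer than 1 s.
#   while :; do date +%s%3N; sleep 0.1; done | report_gaps 1000
```

If a loop like this never shows gaps of a second or more while corosync 
keeps logging 2.5 s pauses, that would suggest the pauses are not plain 
scheduling delays.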

So I wonder if maybe there could be other explanations?

Also, it looks like the side effect is that corosync drops important 
messages (I think "join" messages?), and I fear that this can lead to 
bigger issues with DLM (which is why I'm looking into this in the first 
place).

In case that's helpful, attached are 10 minutes of corosync log and the 
config file I'm using (it has 5 nodes declared, but I reproduce even with 
just 3 nodes).
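
To put a number on "about 10 times per minute", the messages in a log like 
the attached one can be counted per minute. A minimal sketch, assuming each 
log entry is a single line beginning with "Mon DD HH:MM:SS" as in the 
excerpt above; the function name count_pauses_per_minute is made up for 
illustration.

```shell
# Count "Process pause detected" messages per minute in a corosync log.
# Assumes one log entry per line, starting with "Mon DD HH:MM:SS".
# Reads the files given as arguments, or stdin when none are given.
count_pauses_per_minute() {
    awk '/Process pause detected/ {
        minute = $1 " " $2 " " substr($3, 1, 5)   # e.g. "Sep 26 19:56"
        count[minute]++
    }
    END {
        for (m in count) print m, count[m]
    }' "$@" | sort
}

# Possible usage:
#   count_pauses_per_minute /var/log/corosync/corosync.log
```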

Thanks in advance for any suggestion!


saffroy at gmail.com
-------------- next part --------------
# Please read the corosync.conf.5 manual page

totem {
        config_version: 20170925231703
	version: 2

	transport: udpu

	# How long before declaring a token lost (ms)
	token: 3000

	# How many token retransmits before forming a new configuration
	token_retransmits_before_loss_const: 10

	# How long to wait for join messages in the membership protocol (ms)
	join: 100
	#send_join: 60

	# How long to wait for consensus to be achieved before starting a new round of membership configuration (ms)
	consensus: 3600

	# Turn off the virtual synchrony filter
	vsftype: none

	# Number of messages that may be sent by one processor on receipt of the token
	max_messages: 20

	# Limit generated nodeids to 31-bits (positive signed integers)
	clear_node_high_bit: yes

	# Disable encryption
 	secauth: off

	# How many threads to use for encryption/decryption
 	threads: 0

	# Optionally assign a fixed node id (integer)
	# nodeid: 1234

	# This specifies the mode of redundant ring, which may be none, active, or passive.
 	rrp_mode: none

	interface {
		# The following values need to be set based on your environment
		ringnumber: 0
		#broadcast: yes
		#mcastport: 5405
	}

	cluster_name: dlm
}

amf {
	mode: disabled
}

quorum {
	# Quorum for the Pacemaker Cluster Resource Manager
	provider: corosync_votequorum
	#expected_votes: 2
	quorum_votes: 0
	votes: 0
}

aisexec {
	user:   root
	group:  root
}

logging {
	fileline: off
	to_stderr: yes
	to_logfile: yes
	logfile: /var/log/corosync/corosync.log
	to_syslog: yes
	syslog_facility: daemon
	debug: on
	timestamp: on
	logger_subsys {
		subsys: AMF
		debug: on
		tags: enter|leave|trace1|trace2|trace3|trace4|trace6
	}
}

nodelist {
	node {
		# vm0
		quorum_votes: 1
		nodeid: 1
	}
	node {
		# vm1
		quorum_votes: 1
		nodeid: 2
	}
	node {
		# vm2
		quorum_votes: 1
		nodeid: 3
	}
	node {
		# vm3
		quorum_votes: 0
		nodeid: 4
	}
	node {
		# vm4
		quorum_votes: 0
		nodeid: 5
	}
}
-------------- next part --------------
A non-text attachment was scrubbed...
Name: corosync.log.xz
Type: application/x-xz
Size: 186708 bytes
URL: <http://lists.clusterlabs.org/pipermail/users/attachments/20170926/fad420aa/attachment-0002.xz>