[ClusterLabs] corosync 3.0.1 on Debian/Buster reports some MTU errors

Mon Nov 18 16:31:34 EST 2019

Hi,

This may not be strictly a Pacemaker question, but perhaps some of you have
seen this problem:

A two-node Pacemaker cluster running corosync 3.0.1 with dual communication
rings sometimes reports errors like these in the corosync log file:

[KNET  ] pmtud: PMTUD link change for host: 2 link: 0 from 470 to 1366
[KNET  ] pmtud: PMTUD link change for host: 2 link: 1 from 470 to 1366
[KNET  ] pmtud: Global data MTU changed to: 1366
[CFG   ] Modified entry 'totem.netmtu' in corosync.conf cannot be changed at run-time
[CFG   ] Modified entry 'totem.netmtu' in corosync.conf cannot be changed at run-time

Those do not happen very frequently, about once a week.
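
To check how often the PMTUD value actually changes, and between which
values, the log could be scanned with something like the sketch below. It is
only an illustration: the function name pmtud_changes is made up, and the
regular expression assumes the message format shown above stays exactly the
same.

```python
import re

# Matches knet PMTUD lines such as:
# [KNET  ] pmtud: PMTUD link change for host: 2 link: 0 from 470 to 1366
PMTUD_RE = re.compile(
    r"PMTUD link change for host: (\d+) link: (\d+) from (\d+) to (\d+)"
)


def pmtud_changes(lines):
    """Return (host, link, old_mtu, new_mtu) for every PMTUD change line."""
    changes = []
    for line in lines:
        m = PMTUD_RE.search(line)
        if m:
            changes.append(tuple(int(g) for g in m.groups()))
    return changes


# Sample input copied from the corosync log excerpt above.
sample = [
    "[KNET  ] pmtud: PMTUD link change for host: 2 link: 0 from 470 to 1366",
    "[KNET  ] pmtud: PMTUD link change for host: 2 link: 1 from 470 to 1366",
    "[KNET  ] pmtud: Global data MTU changed to: 1366",
]

for host, link, old, new in pmtud_changes(sample):
    print(f"host {host} link {link}: MTU {old} -> {new}")
```

In practice the lines would come from /var/log/corosync/corosync.log rather
than from a hard-coded list.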

However, the system log on the nodes reports the following much more
frequently, a few times a day:

Nov 17 23:26:20 node1 corosync[2258]:   [KNET  ] link: host: 2 link: 1 is down
Nov 17 23:26:20 node1 corosync[2258]:   [KNET  ] host: host: 2 (passive) best link: 0 (pri: 0)
Nov 17 23:26:26 node1 corosync[2258]:   [KNET  ] rx: host: 2 link: 1 is up
Nov 17 23:26:26 node1 corosync[2258]:   [KNET  ] host: host: 2 (passive) best link: 1 (pri: 1)

Can these messages be dismissed, or do they indicate a network
misconfiguration or problem? I tried setting 'knet_transport: udpu' in the
totem section (the default value), but it didn't seem to make a difference.
Hard-coding netmtu to 1500 and allowing a longer (10 s) token timeout also
didn't seem to affect the issue.
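
To put numbers on the link down/up messages, a small script along these lines
could pair each "is down" with the following "is up" for the same host and
link and report how long the link was out. This is a rough sketch, not a
tested tool: the name link_outages is invented, the year has to be supplied
because syslog timestamps omit it, and month names are assumed to be in
English (C locale).

```python
import re
from datetime import datetime

# Matches syslog lines such as:
# Nov 17 23:26:20 node1 corosync[2258]:   [KNET  ] link: host: 2 link: 1 is down
# Nov 17 23:26:26 node1 corosync[2258]:   [KNET  ] rx: host: 2 link: 1 is up
LINK_RE = re.compile(
    r"^(\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) \S+ corosync\[\d+\]:\s+\[KNET\s*\] "
    r"(?:link|rx): host: (\d+) link: (\d+) is (up|down)"
)


def link_outages(lines, year=2019):
    """Return (host, link, down_time, seconds_down) for each down/up pair."""
    down_since = {}  # (host, link) -> time the link was reported down
    outages = []
    for line in lines:
        m = LINK_RE.search(line)
        if not m:
            continue
        stamp, host, link, state = m.groups()
        # Syslog timestamps carry no year, so the caller supplies one.
        when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        key = (int(host), int(link))
        if state == "down":
            down_since[key] = when
        elif key in down_since:
            start = down_since.pop(key)
            seconds = (when - start).total_seconds()
            outages.append((key[0], key[1], start, seconds))
    return outages


# Sample input copied from the system log excerpt above.
sample = [
    "Nov 17 23:26:20 node1 corosync[2258]:   [KNET  ] link: host: 2 link: 1 is down",
    "Nov 17 23:26:20 node1 corosync[2258]:   [KNET  ] host: host: 2 (passive) best link: 0 (pri: 0)",
    "Nov 17 23:26:26 node1 corosync[2258]:   [KNET  ] rx: host: 2 link: 1 is up",
    "Nov 17 23:26:26 node1 corosync[2258]:   [KNET  ] host: host: 2 (passive) best link: 1 (pri: 1)",
]

for host, link, start, seconds in link_outages(sample):
    print(f"host {host} link {link} down at {start} for {seconds:.0f}s")
```

Comparing the resulting outage times between both nodes, and against switch
or NIC logs, might help tell a real network problem from a spurious report.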

Corosync config follows:

/etc/corosync/corosync.conf

totem {
    version: 2
    cluster_name: bicha
    transport: knet
    link_mode: passive
    ip_version: ipv4
    token: 10000
    netmtu: 1500
    knet_transport: sctp
    crypto_model: openssl
    crypto_hash: sha256
    crypto_cipher: aes256
    keyfile: /etc/corosync/authkey
    interface {
        linknumber: 0
        knet_transport: udp
        knet_link_priority: 0
    }
    interface {
        linknumber: 1
        knet_transport: udp
        knet_link_priority: 1
    }
}
quorum {
    provider: corosync_votequorum
    two_node: 1
#    expected_votes: 2
}
nodelist {
    node {
        ring0_addr: xxx.xxx.xxx.xxx
        ring1_addr: zzz.zzz.zzz.zzx
        name: node1
        nodeid: 1
    } 
    node {
        ring0_addr: xxx.xxx.xxx.xxy
        ring1_addr: zzz.zzz.zzz.zzy
        name: node2
        nodeid: 2
    } 
}
logging {
    to_logfile: yes
    to_syslog: yes
    logfile: /var/log/corosync/corosync.log
    syslog_facility: daemon
    debug: off
    timestamp: on
    logger_subsys {
        subsys: QUORUM
        debug: off
    }
}