[ClusterLabs] corosync 3.0.1 on Debian/Buster reports some MTU errors
Jean-Francois Malouin
Jean-Francois.Malouin at bic.mni.mcgill.ca
Wed Nov 20 15:35:04 EST 2019
No one is willing to take a shot at this?
I had a fencing event related to that yesterday morning:
Nov 19 08:04:01 node2 corosync[14399]: [KNET ] link: host: 1 link: 0 is down
Nov 19 08:04:01 node2 corosync[14399]: [KNET ] host: host: 1 (passive) best link: 1 (pri: 1)
...
Nov 19 08:05:04 node2 corosync[14399]: [KNET ] link: host: 1 link: 1 is down
Nov 19 08:05:04 node2 corosync[14399]: [KNET ] host: host: 1 (passive) best link: 1 (pri: 1)
Nov 19 08:05:04 node2 corosync[14399]: [KNET ] host: host: 1 has no active links
There are 2 links so I'm a bit baffled why the 2nd one didn't do the job...
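To make the sequence easier to read, here is a minimal Python sketch that replays the KNET link up/down messages and tracks which links are still active per host. The regular expression is modeled only on the log lines quoted in this thread; the helper name `link_timeline` and the assumption that both links start in the "up" state are illustrative, not anything corosync provides.

```python
import re
from collections import defaultdict

# Matches the two KNET link-state messages seen in this thread, e.g.
#   "... [KNET ] link: host: 1 link: 0 is down"
#   "... [KNET ] rx: host: 2 link: 1 is up"
LINK_EVENT = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+[\d:]+)\s+\S+\s+corosync\[\d+\]:\s+"
    r"\[KNET\s*\]\s+(?:link|rx): host: (?P<host>\d+) "
    r"link: (?P<link>\d+) is (?P<state>up|down)"
)

def link_timeline(lines, links=(0, 1)):
    """Yield (timestamp, host, active links) after each up/down event.

    Assumption: every link is considered up until a 'down' message is seen.
    """
    up = defaultdict(lambda: set(links))
    for line in lines:
        m = LINK_EVENT.match(line)
        if not m:
            continue  # ignore "best link" and other messages
        host, link = int(m["host"]), int(m["link"])
        if m["state"] == "down":
            up[host].discard(link)
        else:
            up[host].add(link)
        yield m["ts"], host, sorted(up[host])

# The excerpt from the fencing event (elided lines are not included).
sample = [
    "Nov 19 08:04:01 node2 corosync[14399]: [KNET ] link: host: 1 link: 0 is down",
    "Nov 19 08:04:01 node2 corosync[14399]: [KNET ] host: host: 1 (passive) best link: 1 (pri: 1)",
    "Nov 19 08:05:04 node2 corosync[14399]: [KNET ] link: host: 1 link: 1 is down",
    "Nov 19 08:05:04 node2 corosync[14399]: [KNET ] host: host: 1 (passive) best link: 1 (pri: 1)",
    "Nov 19 08:05:04 node2 corosync[14399]: [KNET ] host: host: 1 has no active links",
]

for ts, host, active in link_timeline(sample):
    print(ts, f"host {host}", "active links:", active or "none")
```

Read this way, the excerpt suggests that link 1 did take over at 08:04:01 and then went down itself about a minute later, although the elided lines in between are not accounted for.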
thanks,
jf
* Jean-Francois Malouin <Jean-Francois.Malouin at bic.mni.mcgill.ca> [20191118 16:31]:
> Hi,
>
> Maybe not directly a pacemaker question but maybe some of you have seen this
> problem:
>
> A 2 node pacemaker cluster running corosync-3.0.1 with dual communication ring
> sometimes reports errors like this in the corosync log file:
>
> [KNET ] pmtud: PMTUD link change for host: 2 link: 0 from 470 to 1366
> [KNET ] pmtud: PMTUD link change for host: 2 link: 1 from 470 to 1366
> [KNET ] pmtud: Global data MTU changed to: 1366
> [CFG ] Modified entry 'totem.netmtu' in corosync.conf cannot be changed at run-time
> [CFG ] Modified entry 'totem.netmtu' in corosync.conf cannot be changed at run-time
>
> Those do not happen very frequently, once a week or so...
>
> However the system log on the nodes reports those much more frequently, a few
> times a day:
>
> Nov 17 23:26:20 node1 corosync[2258]: [KNET ] link: host: 2 link: 1 is down
> Nov 17 23:26:20 node1 corosync[2258]: [KNET ] host: host: 2 (passive) best link: 0 (pri: 0)
> Nov 17 23:26:26 node1 corosync[2258]: [KNET ] rx: host: 2 link: 1 is up
> Nov 17 23:26:26 node1 corosync[2258]: [KNET ] host: host: 2 (passive) best link: 1 (pri: 1)
>
> Are those to be dismissed or are they indicative of a network misconfig/problem?
> I tried setting 'knet_transport: udpu' in the totem section (the default value)
> but it didn't seem to make a difference... Hard-coding netmtu to 1500 and
> allowing for a longer (10s) token timeout also didn't seem to affect the issue.
>
>
> Corosync config follows:
>
> /etc/corosync/corosync.conf
>
> totem {
>     version: 2
>     cluster_name: bicha
>     transport: knet
>     link_mode: passive
>     ip_version: ipv4
>     token: 10000
>     netmtu: 1500
>     knet_transport: sctp
>     crypto_model: openssl
>     crypto_hash: sha256
>     crypto_cipher: aes256
>     keyfile: /etc/corosync/authkey
>     interface {
>         linknumber: 0
>         knet_transport: udp
>         knet_link_priority: 0
>     }
>     interface {
>         linknumber: 1
>         knet_transport: udp
>         knet_link_priority: 1
>     }
> }
> quorum {
>     provider: corosync_votequorum
>     two_node: 1
>     # expected_votes: 2
> }
> nodelist {
>     node {
>         ring0_addr: xxx.xxx.xxx.xxx
>         ring1_addr: zzz.zzz.zzz.zzx
>         name: node1
>         nodeid: 1
>     }
>     node {
>         ring0_addr: xxx.xxx.xxx.xxy
>         ring1_addr: zzz.zzz.zzz.zzy
>         name: node2
>         nodeid: 2
>     }
> }
> logging {
>     to_logfile: yes
>     to_syslog: yes
>     logfile: /var/log/corosync/corosync.log
>     syslog_facility: daemon
>     debug: off
>     timestamp: on
>     logger_subsys {
>         subsys: QUORUM
>         debug: off
>     }
> }