[ClusterLabs] Corosync permanently desyncs in face of packet loss
Mariusz Gronczewski
mariusz.gronczewski at efigence.com
Fri Jan 15 07:38:17 EST 2021
Hi,
We had a hardware problem causing asynchronous packet drop on one of
our nodes. It left the cluster in an unrecoverable state (recovery required
restarting corosync on both nodes), and the same thing happened again the
next day. A log of the events is attached.
It did recover a few times after the problem started, but when the failure
happened it just spammed the log with
Jan 13 14:28:30 [20833] node-db2 corosync notice [TOTEM ] A new membership (2:72076) was formed. Members
Jan 13 14:28:30 [20833] node-db2 corosync warning [CPG ] downlist left_list: 0 received
Jan 13 14:28:30 [20833] node-db2 corosync notice [QUORUM] Members[1]: 2
Jan 13 14:28:30 [20833] node-db2 corosync notice [MAIN ] Completed service synchronization, ready to provide service.
I've also seen some of
corosync warning [KNET ] pmtud: possible MTU misconfiguration detected. kernel is reporting MTU: 1500 bytes for host 1 link 0 but the other node is not acknowledging packets of this size.
corosync warning [KNET ] pmtud: This can be caused by this node interface MTU too big or a network device that does not support or has been misconfigured to manage MTU of this size, or packet loss. knet will continue to run but performances might be affected.
in the previous failure.
Even after the cause of the packet loss was fixed, the cluster did not recover without a restart.
In limited testing with the udpu protocol the problem did not occur, but that testing period was much shorter, as we fixed the networking issue in the meantime.
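For anyone who wants to check their own logs for the same pattern, here is a minimal sketch in Python that counts the "A new membership ... was formed" messages per minute. It assumes the log line format quoted above; the function names and the threshold of 10 are made up for illustration and are not part of corosync.

```python
import re
from collections import Counter

# Matches lines such as:
# Jan 13 14:28:30 [20833] node-db2 corosync notice [TOTEM ] A new membership (2:72076) was formed. Members
# The "minute" group captures the timestamp truncated to the minute.
MEMBERSHIP_RE = re.compile(
    r"^(?P<minute>\w{3}\s+\d+\s+\d{2}:\d{2}):\d{2}\s.*\[TOTEM\s*\]\s+A new membership \("
)


def membership_changes_per_minute(lines):
    """Count TOTEM 'A new membership' messages per minute."""
    counts = Counter()
    for line in lines:
        match = MEMBERSHIP_RE.match(line)
        if match:
            counts[match.group("minute")] += 1
    return counts


def flapping_minutes(lines, threshold=10):
    """Return minutes (in order of first appearance) with at least
    `threshold` membership changes. The threshold is an arbitrary,
    hypothetical cut-off; tune it for your cluster."""
    counts = membership_changes_per_minute(lines)
    return [minute for minute, n in counts.items() if n >= threshold]
```

Usage would be something like `flapping_minutes(open("corosync.log"))`, where the file name is only an example. A long run of consecutive minutes in the result would suggest the same membership loop as in the excerpt above.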
We're using the stable version from Debian Buster (3.0.1).
Is this a known problem/bug?
Cheers,
Mariusz
--
Mariusz Gronczewski, Administrator
Efigence S. A.
ul. Wołoska 9a, 02-583 Warszawa
T: [+48] 22 380 13 13
NOC: [+48] 22 380 10 20
E: admin at efigence.com
-------------- next part --------------
A non-text attachment was scrubbed...
Name: node1.log
Type: text/x-log
Size: 3049 bytes
Desc: not available
URL: <http://lists.clusterlabs.org/pipermail/users/attachments/20210115/73b76bff/attachment-0002.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: node2.log
Type: text/x-log
Size: 138282 bytes
Desc: not available
URL: <http://lists.clusterlabs.org/pipermail/users/attachments/20210115/73b76bff/attachment-0003.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: eventlog.csv
Type: text/csv
Size: 924638 bytes
Desc: not available
URL: <http://lists.clusterlabs.org/pipermail/users/attachments/20210115/73b76bff/attachment-0001.csv>