[ClusterLabs] Corosync permanently desyncs in face of packet loss
Jan Friesse
jfriesse at redhat.com
Mon Jan 18 03:51:36 EST 2021
Mariusz,
> Hi,
>
> We've had a hardware problem causing asynchronous packet drop on one of
> our nodes. It put the cluster into an unrecoverable state (we had to
> restart corosync on both nodes), and the same thing happened again the
> next day. A log of the events is attached.
>
> It did recover a few times after the problem started, but when the
> failure happened it just spammed
>
> Jan 13 14:28:30 [20833] node-db2 corosync notice [TOTEM ] A new membership (2:72076) was formed. Members
> Jan 13 14:28:30 [20833] node-db2 corosync warning [CPG ] downlist left_list: 0 received
> Jan 13 14:28:30 [20833] node-db2 corosync notice [QUORUM] Members[1]: 2
> Jan 13 14:28:30 [20833] node-db2 corosync notice [MAIN ] Completed service synchronization, ready to provide service.
>
> I've also seen some of
>
> corosync warning [KNET ] pmtud: possible MTU misconfiguration detected. kernel is reporting MTU: 1500 bytes for host 1 link 0 but the other node is not acknowledging packets of this size.
> corosync warning [KNET ] pmtud: This can be caused by this node interface MTU too big or a network device that does not support or has been misconfigured to manage MTU of this size, or packet loss. knet will continue to run but performances might be affected.
>
> in the previous failure.
>
> After the cause of the packet loss was fixed, it still did not recover without a restart.
>
> In limited testing with the udpu protocol the problem did not occur, but that testing period was much shorter because we fixed the networking issue in the meantime.
>
> We're using the stable version from Debian Buster (3.0.1).
>
> Is this a known problem/bug?
There were quite a few bugs in libknet < 1.15, and some of them may
explain the behavior you see. I would suggest trying backports (where 1.16
seems to be available) or the Proxmox repositories (where a newer
corosync is also packaged).
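
As a rough way to check whether a node falls into the affected range, the
comparison against the 1.15 threshold can be sketched as below. This is only
a minimal sketch: the helper names and the example version strings are
hypothetical and are not part of any corosync or libknet tooling, and only
the leading dotted numeric part of a package version is compared.

```python
import re


def parse_version(text):
    """Return the leading dotted numeric part of a version string as a tuple of ints.

    An optional Debian-style epoch ("1:") is skipped; anything after the
    dotted numbers (for example "-2" or "~bpo10+1") is ignored.
    """
    match = re.match(r"(?:\d+:)?(\d+(?:\.\d+)*)", text.strip())
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def is_older_than(installed, threshold="1.15"):
    """True if the installed version sorts strictly before the threshold."""
    return parse_version(installed) < parse_version(threshold)


if __name__ == "__main__":
    # Hypothetical version strings, used only to illustrate the comparison.
    for candidate in ("1.8-2", "1.15", "1.16-2~bpo10+1"):
        state = "older than 1.15" if is_older_than(candidate) else "1.15 or newer"
        print(f"{candidate}: {state}")
```

The version is compared as a tuple of integers rather than as a string, so
that 1.8 sorts before 1.15.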
Regards,
Honza
>
>
> Cheers,
> Mariusz