[ClusterLabs] redundant ring and corosync makes/sees it as loopback??
jfriesse at redhat.com
Fri Mar 1 02:37:13 EST 2019
> hi everyone
> My cluster faulted my secondary ring today and on one node I found this:
> Printing ring status.
> Local node ID 3
> RING ID 0
> id = 10.5.8.65
> status = ring 0 active with no faults
> RING ID 1
> id = 127.0.0.1
> status = Marking ringid 1 interface 127.0.0.1 FAULTY
> How the hell loopback address got there?
Short version: ifdown
Long version: when an interface is brought down, older versions of
corosync detected that condition and rebound to localhost. Without RRP
that is usually not a big deal, because the node is (usually) fenced.
With RRP it is a much bigger problem, because this 127.0.0.1 address is
sent to all the other nodes and completely poisons the whole cluster.
The definitive solution is to use corosync 3 with the knet transport. A
mostly working solution is to use corosync 3 (or corosync 2, the needle
branch from git) with the udpu transport.
A simple workaround is to never use ifdown directly. Note that network
managers also quite often react to carrier loss. The solution is to
either use network scripts or, if NetworkManager must be used, use it
with the NetworkManager-config-server package. For systemd-networkd,
the solution should be the IgnoreCarrierLoss= config option.
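For systemd-networkd, that option goes in the [Network] section of the
interface's .network file; a sketch (file name, interface name and
address are placeholders):

```
# /etc/systemd/network/10-ring1.network (hypothetical file name)
[Match]
Name=eth1                # placeholder: the interface carrying ring 1

[Network]
Address=10.5.9.65/24     # placeholder address
# Keep the address configured even when carrier is lost,
# so corosync never rebinds to 127.0.0.1.
IgnoreCarrierLoss=yes
```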
> I did: systemctl restart corosync and all went back to "okey"
> many thanks, L.
> Users mailing list: Users at clusterlabs.org
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org