[ClusterLabs] Corosync with passive rrp, udpu - Unable to reset after "Marking ringid 1 interface 127.0.0.1 FAULTY"
Martin Schlegel
martin at nuboreto.org
Thu Jun 16 14:51:55 UTC 2016
Hello everyone,
we have been running a 3-node Pacemaker (1.1.14) / Corosync (2.3.5) cluster
successfully for a couple of months. We have now started seeing a faulty ring
with an unexpected 127.0.0.1 binding that we cannot reset via
"corosync-cfgtool -r".
We have had this once before, and only restarting Corosync (and everything else)
on the node showing the unexpected 127.0.0.1 binding made the problem go away.
However, in production we would obviously like to avoid this if possible.
So, given the following description: how can I troubleshoot this issue, and/or
does anybody have a good idea what might be happening here?
We run two rings in passive rrp mode across different IP subnets via udpu, and
we get the following output (all IPs obfuscated). Please note the unexpected
interface binding 127.0.0.1 for ring 1 on host pg2.
If we reset via "corosync-cfgtool -r" on each node, ring ID 1 briefly shows
"no faults" but goes back to "FAULTY" seconds later.
Regards,
Martin Schlegel
_____________________________________
root@pg1:~# corosync-cfgtool -s
Printing ring status.
Local node ID 1
RING ID 0
id = A.B.C1.5
status = ring 0 active with no faults
RING ID 1
id = D.E.F1.170
status = Marking ringid 1 interface D.E.F1.170 FAULTY
root@pg2:~# corosync-cfgtool -s
Printing ring status.
Local node ID 2
RING ID 0
id = A.B.C2.88
status = ring 0 active with no faults
RING ID 1
id = 127.0.0.1
status = Marking ringid 1 interface 127.0.0.1 FAULTY
root@pg3:~# corosync-cfgtool -s
Printing ring status.
Local node ID 3
RING ID 0
id = A.B.C3.236
status = ring 0 active with no faults
RING ID 1
id = D.E.F3.112
status = Marking ringid 1 interface D.E.F3.112 FAULTY
_____________________________________
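The ring status output above can be checked mechanically. The following is a minimal sketch, assuming the "corosync-cfgtool -s" layout shown above (a "RING ID n" line followed by "id = ..." and "status = ..." lines). The function names parse_ring_status, is_loopback and find_problems are hypothetical helpers, not part of Corosync. It flags any ring that is bound to a loopback address or marked FAULTY.

```python
import ipaddress
import re

# Output of "corosync-cfgtool -s" on pg2, copied from the listing above.
PG2_OUTPUT = """Printing ring status.
Local node ID 2
RING ID 0
id = A.B.C2.88
status = ring 0 active with no faults
RING ID 1
id = 127.0.0.1
status = Marking ringid 1 interface 127.0.0.1 FAULTY
"""


def parse_ring_status(text):
    """Parse ring status text into a list of dicts, one per ring."""
    rings = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        match = re.match(r"RING ID (\d+)$", line)
        if match:
            current = {"ring": int(match.group(1)), "id": None, "status": None}
            rings.append(current)
            continue
        if current is None or "=" not in line:
            continue  # header lines such as "Local node ID 2"
        key, value = (part.strip() for part in line.split("=", 1))
        if key in ("id", "status"):
            current[key] = value
    return rings


def is_loopback(addr):
    """True if addr is a literal loopback IP; False for anything unparsable."""
    try:
        return ipaddress.ip_address(addr).is_loopback
    except ValueError:
        return False  # e.g. an obfuscated address like A.B.C2.88


def find_problems(rings):
    """Return (ring number, reason) tuples for suspicious rings."""
    problems = []
    for ring in rings:
        if ring["id"] and is_loopback(ring["id"]):
            problems.append((ring["ring"], "bound to loopback " + ring["id"]))
        if ring["status"] and "FAULTY" in ring["status"]:
            problems.append((ring["ring"], "marked FAULTY"))
    return problems
```

Calling find_problems(parse_ring_status(PG2_OUTPUT)) reports ring 1 twice, once for the loopback binding and once for the FAULTY status. On a live node the text would come from running "corosync-cfgtool -s" instead of the pasted sample.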
/etc/corosync/corosync.conf from pg1 (the other nodes use different subnets and
IPs, but are otherwise identical):
===========================================
quorum {
    provider: corosync_votequorum
    expected_votes: 3
}
totem {
    version: 2
    crypto_cipher: none
    crypto_hash: none
    rrp_mode: passive
    interface {
        ringnumber: 0
        bindnetaddr: A.B.C1.0
        mcastport: 5405
        ttl: 1
    }
    interface {
        ringnumber: 1
        bindnetaddr: D.E.F1.64
        mcastport: 5405
        ttl: 1
    }
    transport: udpu
}
nodelist {
    node {
        ring0_addr: pg1
        ring1_addr: pg1p
        nodeid: 1
    }
    node {
        ring0_addr: pg2
        ring1_addr: pg2p
        nodeid: 2
    }
    node {
        ring0_addr: pg3
        ring1_addr: pg3p
        nodeid: 3
    }
}
logging {
    to_syslog: yes
}
===========================================
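Because the nodelist above uses host names (pg1p, pg2p, pg3p) rather than IP addresses, one thing worth ruling out is that a ring address resolves to a loopback address on the affected node, for example through an /etc/hosts entry. Whether that is what happens here is an assumption, not a diagnosis. The sketch below uses hypothetical helper names (ring_addrs, resolve, loopback_entries) and only the Python standard library. It extracts every ringN_addr entry from a corosync.conf text and reports those that resolve to loopback or do not resolve at all.

```python
import ipaddress
import re
import socket


def ring_addrs(conf_text):
    """Return (key, value) pairs for every ringN_addr line in the config."""
    pattern = r"^\s*(ring\d+_addr)\s*:\s*(\S+)\s*$"
    return re.findall(pattern, conf_text, re.MULTILINE)


def resolve(name):
    """Resolve a host name or IP literal with the system resolver."""
    infos = socket.getaddrinfo(name, None)
    # Drop any IPv6 scope suffix ("%eth0") so ipaddress can parse the result.
    return sorted({info[4][0].split("%")[0] for info in infos})


def loopback_entries(conf_text, resolver=resolve):
    """Return (key, name, detail) for ring addresses that look wrong.

    resolver is injectable so the check can be exercised without DNS.
    """
    bad = []
    for key, name in ring_addrs(conf_text):
        try:
            addrs = resolver(name)
        except OSError:  # socket.gaierror is a subclass of OSError
            bad.append((key, name, "unresolvable"))
            continue
        for addr in addrs:
            if ipaddress.ip_address(addr).is_loopback:
                bad.append((key, name, addr))
    return bad
```

Run on each node against the contents of /etc/corosync/corosync.conf, an empty result means every ring address resolved to a non-loopback address from that node's point of view. It says nothing about whether the interface itself is up.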