[ClusterLabs] Corosync ring marked as FAULTY
Denis Gribkov
dun at itsts.net
Tue Feb 21 12:26:33 EST 2017
Hi everyone.
I have a 16-node asynchronous cluster configured with the Corosync redundant
ring feature.
Each node has two similarly connected and configured NICs. One NIC is connected
to the public network,
the other to our private VLAN. When I checked the status of the Corosync
rings, I found:
# corosync-cfgtool -s
Printing ring status.
Local node ID 1
RING ID 0
id = 192.168.1.54
status = Marking ringid 0 interface 192.168.1.54 FAULTY
RING ID 1
id = 111.11.11.1
status = ring 1 active with no faults
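For checking this across all 16 nodes, the status output above can be parsed by a small script. The following is only a minimal sketch: it assumes the output keeps exactly the layout shown above, and the function names `parse_ring_status` and `faulty_rings` are illustrative, not part of Corosync.

```python
import re

# Output copied from the post; on a live node this text would come from
# running "corosync-cfgtool -s".
SAMPLE = """\
Printing ring status.
Local node ID 1
RING ID 0
id = 192.168.1.54
status = Marking ringid 0 interface 192.168.1.54 FAULTY
RING ID 1
id = 111.11.11.1
status = ring 1 active with no faults
"""


def parse_ring_status(text):
    """Return {ring_number: {"id": address, "status": text, "faulty": bool}}."""
    rings = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        match = re.match(r"RING ID (\d+)$", line)
        if match:
            # A new "RING ID n" header starts the next ring's entry.
            current = int(match.group(1))
            rings[current] = {"id": None, "status": None, "faulty": False}
            continue
        if current is None or "=" not in line:
            continue  # preamble lines such as "Local node ID 1"
        key, value = (part.strip() for part in line.split("=", 1))
        if key in ("id", "status"):
            rings[current][key] = value
        if key == "status":
            # Assumption: a faulty ring always has FAULTY in its status text.
            rings[current]["faulty"] = "FAULTY" in value
    return rings


def faulty_rings(text):
    """Return the sorted ring numbers whose status contains FAULTY."""
    return sorted(n for n, info in parse_ring_status(text).items() if info["faulty"])


if __name__ == "__main__":
    for number, info in sorted(parse_ring_status(SAMPLE).items()):
        print(number, info["id"], "FAULTY" if info["faulty"] else "ok")
```

On a real node the text could be obtained with `subprocess.run(["corosync-cfgtool", "-s"], capture_output=True, text=True).stdout` and passed to `faulty_rings`.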
After some digging, I found that if I re-enable the
failed ring with the command:
# corosync-cfgtool -r
RING ID 0 is marked as "active" for a few minutes, but after that it is
marked as faulty again and stays that way.
The log has no useful information, just a single message:
corosync[21740]: [TOTEM ] Marking ringid 0 interface 192.168.1.54 FAULTY
And there is no message like:
[TOTEM ] Automatically recovered ring 1
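To see which rings were marked FAULTY and never logged a recovery, the log can be scanned for these two message types. This is a rough sketch that assumes the messages appear exactly in the two forms quoted above; the names `ring_events` and `unrecovered_rings` are illustrative only.

```python
import re
from collections import defaultdict

# Patterns built from the two message forms quoted in the post.
FAULTY_RE = re.compile(r"\[TOTEM\s*\] Marking ringid (\d+) interface (\S+) FAULTY")
RECOVERED_RE = re.compile(r"\[TOTEM\s*\] Automatically recovered ring (\d+)")

# The single message reported in the post.
SAMPLE_LOG = [
    "corosync[21740]: [TOTEM ] Marking ringid 0 interface 192.168.1.54 FAULTY",
]


def ring_events(lines):
    """Return {ring_number: ["faulty" | "recovered", ...]} in log order."""
    events = defaultdict(list)
    for line in lines:
        match = FAULTY_RE.search(line)
        if match:
            events[int(match.group(1))].append("faulty")
            continue
        match = RECOVERED_RE.search(line)
        if match:
            events[int(match.group(1))].append("recovered")
    return dict(events)


def unrecovered_rings(events):
    """Return the sorted ring numbers whose most recent event is "faulty"."""
    return sorted(ring for ring, seen in events.items() if seen[-1] == "faulty")


if __name__ == "__main__":
    events = ring_events(SAMPLE_LOG)
    print(events)
    print(unrecovered_rings(events))
```

To run it against the real log, an open file can be passed directly, for example `ring_events(open("/var/log/cluster/corosync.log"))`.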
My corosync.conf looks like:
compatibility: whitetank
totem {
    version: 2
    secauth: on
    threads: 4
    rrp_mode: passive
    interface {
        member {
            memberaddr: PRIVATE_IP_1
        }
        ...
        member {
            memberaddr: PRIVATE_IP_16
        }
        ringnumber: 0
        bindnetaddr: PRIVATE_NET_ADDR
        mcastaddr: 226.0.0.1
        mcastport: 5505
        ttl: 1
    }
    interface {
        member {
            memberaddr: PUBLIC_IP_1
        }
        ...
        member {
            memberaddr: PUBLIC_IP_16
        }
        ringnumber: 1
        bindnetaddr: PUBLIC_NET_ADDR
        mcastaddr: 224.0.0.1
        mcastport: 5405
        ttl: 1
    }
    transport: udpu
}
logging {
    to_stderr: no
    to_logfile: yes
    logfile: /var/log/cluster/corosync.log
    logfile_priority: info
    to_syslog: yes
    syslog_priority: warning
    debug: on
    timestamp: on
}
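A configuration with this many member entries is easy to get subtly wrong, so a basic consistency check of the two interface sections may be worth running. The sketch below is not a Corosync tool. It uses a simplified parser written for this example, a trimmed two-member sample with the same placeholders as above, and it assumes (based on the corosync.conf man page, which states that Corosync uses both mcastport and mcastport - 1) that the rings' ports should differ by at least 2. All function names are hypothetical.

```python
def parse_conf(text):
    """Parse corosync.conf-style text into nested lists.

    Sections become (name, [children]) and settings become (key, value).
    Comment lines and lines without ':' or braces (such as '...') are skipped.
    """
    root = []
    stack = [root]
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("#"):
            continue
        if line.endswith("{"):
            section = []
            stack[-1].append((line[:-1].strip(), section))
            stack.append(section)
        elif line == "}":
            if len(stack) == 1:
                raise ValueError("unbalanced closing brace")
            stack.pop()
        elif ":" in line:
            key, value = line.split(":", 1)
            stack[-1].append((key.strip(), value.strip()))
    if len(stack) != 1:
        raise ValueError("missing closing brace")
    return root


def interfaces(conf):
    """Yield a summary dict for every interface section inside totem."""
    for name, body in conf:
        if name != "totem" or not isinstance(body, list):
            continue
        for sub, items in body:
            if sub != "interface" or not isinstance(items, list):
                continue
            settings = {k: v for k, v in items if isinstance(v, str)}
            members = [v for k, v in items if k == "member" and isinstance(v, list)]
            yield {
                "ringnumber": settings.get("ringnumber"),
                "mcastport": settings.get("mcastport"),
                "members": len(members),
            }


def check_rings(text):
    """Return a list of human-readable problems found in the ring definitions."""
    problems = []
    rings = list(interfaces(parse_conf(text)))
    numbers = [ring["ringnumber"] for ring in rings]
    if len(set(numbers)) != len(numbers):
        problems.append("duplicate ringnumber")
    if len({ring["members"] for ring in rings}) > 1:
        problems.append("rings list different numbers of members")
    ports = [int(ring["mcastport"]) for ring in rings if ring["mcastport"]]
    for index, first in enumerate(ports):
        for second in ports[index + 1:]:
            # Assumption: each ring also uses mcastport - 1.
            if abs(first - second) < 2:
                problems.append(f"mcastport {first} and {second} overlap")
    return problems


# Trimmed sample using the placeholders from the post (two members per ring).
SAMPLE = """\
totem {
    version: 2
    rrp_mode: passive
    interface {
        member {
            memberaddr: PRIVATE_IP_1
        }
        member {
            memberaddr: PRIVATE_IP_16
        }
        ringnumber: 0
        mcastport: 5505
    }
    interface {
        member {
            memberaddr: PUBLIC_IP_1
        }
        member {
            memberaddr: PUBLIC_IP_16
        }
        ringnumber: 1
        mcastport: 5405
    }
    transport: udpu
}
"""

if __name__ == "__main__":
    print(check_rings(SAMPLE) or "no problems found")
```

An empty result only means these three checks passed; it says nothing about whether the addresses themselves are reachable on each network.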
I tried changing rrp_mode and the mcastaddr/mcastport for ringnumber: 0,
but the result was the same.
I checked multicast and unicast operability with the omping utility and did not
find any issues.
The network equipment also shows no errors on our private VLAN.
Why does Corosync permanently disable the second ring? How can I debug
the issue?
Other properties:
Corosync Cluster Engine, version '1.4.7'
Pacemaker properties:
cluster-infrastructure: cman
cluster-recheck-interval: 5min
dc-version: 1.1.14-8.el6-70404b0
expected-quorum-votes: 3
have-watchdog: false
last-lrm-refresh: 1484068350
maintenance-mode: false
no-quorum-policy: ignore
pe-error-series-max: 1000
pe-input-series-max: 1000
pe-warn-series-max: 1000
stonith-action: reboot
stonith-enabled: false
symmetric-cluster: false
Thank you.
--
Regards Denis Gribkov