[ClusterLabs] Re: Corosync ring marked as FAULTY
Ulrich Windl
Ulrich.Windl at rz.uni-regensburg.de
Wed Feb 22 02:37:10 EST 2017
Is "ttl 1" a good idea for a public network?
>>> Denis Gribkov <dun at itsts.net> wrote on 21.02.2017 at 18:26 in message
<4f5543c4-b80c-659d-ed5e-7a99e1482ced at itsts.net>:
> Hi Everyone.
>
> I have a 16-node asynchronous cluster configured with the Corosync redundant
> ring feature.
>
> Each node has two similarly connected and configured NICs. One NIC is
> connected to the public network, the other to our private VLAN. When I
> checked the state of the Corosync rings I found:
>
> # corosync-cfgtool -s
> Printing ring status.
> Local node ID 1
> RING ID 0
> id = 192.168.1.54
> status = Marking ringid 0 interface 192.168.1.54 FAULTY
> RING ID 1
> id = 111.11.11.1
> status = ring 1 active with no faults
>
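With 16 nodes it may help to check this status mechanically rather than by eye. The following is a minimal Python sketch, not part of Corosync. It assumes the "corosync-cfgtool -s" output layout quoted above, and the name parse_ring_status is made up for illustration.

```python
import re


def parse_ring_status(text):
    """Parse `corosync-cfgtool -s` style output into one dict per ring."""
    rings = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        header = re.match(r"RING ID (\d+)$", line)
        if header:
            # A new ring section starts here.
            current = {"ring": int(header.group(1)), "id": None,
                       "status": None, "faulty": False}
            rings.append(current)
            continue
        if current is None or "=" not in line:
            continue  # preamble lines such as "Local node ID 1"
        key, value = (part.strip() for part in line.split("=", 1))
        if key == "id":
            current["id"] = value
        elif key == "status":
            current["status"] = value
            # Assumption: a faulty ring always carries the word FAULTY.
            current["faulty"] = "FAULTY" in value
    return rings


# Sample taken from the status output quoted in this thread.
SAMPLE = """\
Printing ring status.
Local node ID 1
RING ID 0
id = 192.168.1.54
status = Marking ringid 0 interface 192.168.1.54 FAULTY
RING ID 1
id = 111.11.11.1
status = ring 1 active with no faults
"""

if __name__ == "__main__":
    for ring in parse_ring_status(SAMPLE):
        state = "FAULTY" if ring["faulty"] else "ok"
        print(f"ring {ring['ring']} ({ring['id']}): {state}")
```

On a live node the same function could be fed the captured output of "corosync-cfgtool -s" (for example via subprocess.run with capture_output=True and text=True) instead of SAMPLE.
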
> After some digging I found that if I re-enable the failed ring with the
> command:
>
> # corosync-cfgtool -r
>
> RING ID 0 is marked as "active" for a few minutes, but after that it is
> marked as faulty again, permanently.
>
> The log has no useful information, just a single message:
>
> corosync[21740]: [TOTEM ] Marking ringid 0 interface 192.168.1.54 FAULTY
>
> And there is no message like:
>
> [TOTEM ] Automatically recovered ring 1
>
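To see how long a ring stays up after "corosync-cfgtool -r" before it is marked faulty again, the two message types quoted above can be pulled out of the log. This is only a sketch under the assumption that the messages look exactly like the ones in this thread; the helper names ring_events and last_state are hypothetical.

```python
import re

# Patterns built from the two log messages quoted in this thread.
FAULTY_RE = re.compile(r"Marking ringid (\d+) interface (\S+) FAULTY")
RECOVERED_RE = re.compile(r"Automatically recovered ring (\d+)")


def ring_events(lines):
    """Yield (event, ring_number, line) for every ring state change."""
    for line in lines:
        match = FAULTY_RE.search(line)
        if match:
            yield ("faulty", int(match.group(1)), line.rstrip())
            continue
        match = RECOVERED_RE.search(line)
        if match:
            yield ("recovered", int(match.group(1)), line.rstrip())


def last_state(lines):
    """Return {ring_number: last event seen} for the given log lines."""
    state = {}
    for event, ring, _ in ring_events(lines):
        state[ring] = event
    return state


# Sample lines copied from the messages quoted in this thread.
SAMPLE_LOG = [
    "corosync[21740]: [TOTEM ] Marking ringid 0 interface 192.168.1.54 FAULTY",
    "[TOTEM ] Automatically recovered ring 1",
]

if __name__ == "__main__":
    for event, ring, line in ring_events(SAMPLE_LOG):
        print(f"ring {ring}: {event}: {line}")
```

Any iterable of lines works as input, so an open file object for the logfile named in corosync.conf (/var/log/cluster/corosync.log in the configuration below) can be passed directly.
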
>
> My corosync.conf looks like:
>
> compatibility: whitetank
>
> totem {
> version: 2
> secauth: on
> threads: 4
> rrp_mode: passive
>
> interface {
>
> member {
> memberaddr: PRIVATE_IP_1
> }
>
> ...
>
> member {
> memberaddr: PRIVATE_IP_16
> }
>
> ringnumber: 0
> bindnetaddr: PRIVATE_NET_ADDR
> mcastaddr: 226.0.0.1
> mcastport: 5505
> ttl: 1
> }
>
> interface {
>
> member {
> memberaddr: PUBLIC_IP_1
> }
> ...
>
> member {
> memberaddr: PUBLIC_IP_16
> }
>
> ringnumber: 1
> bindnetaddr: PUBLIC_NET_ADDR
> mcastaddr: 224.0.0.1
> mcastport: 5405
> ttl: 1
> }
>
> transport: udpu
> }
>
> logging {
> to_stderr: no
> to_logfile: yes
> logfile: /var/log/cluster/corosync.log
> logfile_priority: info
> to_syslog: yes
> syslog_priority: warning
> debug: on
> timestamp: on
> }
>
> I tried changing rrp_mode and mcastaddr/mcastport for ringnumber: 0,
> but the result was the same.
>
> I checked multicast/unicast operability using the omping utility and didn't
> find any issues.
>
> Also, no errors were found on the network equipment for our private VLAN.
>
> Why did Corosync decide to permanently disable the second ring? How can I
> debug the issue?
>
> Other properties:
>
> Corosync Cluster Engine, version '1.4.7'
>
> Pacemaker properties:
> cluster-infrastructure: cman
> cluster-recheck-interval: 5min
> dc-version: 1.1.14-8.el6-70404b0
> expected-quorum-votes: 3
> have-watchdog: false
> last-lrm-refresh: 1484068350
> maintenance-mode: false
> no-quorum-policy: ignore
> pe-error-series-max: 1000
> pe-input-series-max: 1000
> pe-warn-series-max: 1000
> stonith-action: reboot
> stonith-enabled: false
> symmetric-cluster: false
>
> Thank you.
>
> --
> Regards Denis Gribkov