[ClusterLabs] Re: Corosync ring marked as FAULTY
bliu
bliu at suse.com
Wed Feb 22 03:02:58 EST 2017
Hi Denis,

Could you try running tcpdump with the filter "udp port 5505" on the
private network to see whether any packets arrive there?
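For example (eth1 is only an assumed name here; substitute the NIC that
carries the private ring):

# tcpdump -ni eth1 udp port 5505   # eth1: assumed private NIC name

If corosync traffic from the other nodes shows up there, the network
path is fine and the fault is more likely on the corosync side.
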
On 02/22/2017 03:47 PM, Denis Gribkov wrote:
>
> In our case this does not create problems, since all nodes are located
> in a few networks served by a single router.
>
> Also, unlike private ring 0, no errors are detected on public ring 1.
>
> I suspect this error could be related to the private VLAN settings, but
> unfortunately I have no good idea how to track down the issue.
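>
> The only extra check I can think of so far is looking at the error and
> drop counters on the private NIC, for example (eth1 is a placeholder
> for the private interface):
>
> # ip -s link show eth1   # -s prints RX/TX error/drop counters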
>
> On 22/02/17 09:37, Ulrich Windl wrote:
>> Is "ttl 1" a good idea for a public network?
>>
>>>>> Denis Gribkov <dun at itsts.net> wrote on 21.02.2017 at 18:26 in message
>> <4f5543c4-b80c-659d-ed5e-7a99e1482ced at itsts.net>:
>>> Hi Everyone.
>>>
>>> I have a 16-node asymmetric cluster configured with the Corosync
>>> redundant ring feature.
>>>
>>> Each node has two similarly connected/configured NICs: one connected to
>>> the public network, the other to our private VLAN. When I checked the
>>> operability of the Corosync rings, I found:
>>>
>>> # corosync-cfgtool -s
>>> Printing ring status.
>>> Local node ID 1
>>> RING ID 0
>>>         id      = 192.168.1.54
>>>         status  = Marking ringid 0 interface 192.168.1.54 FAULTY
>>> RING ID 1
>>>         id      = 111.11.11.1
>>>         status  = ring 1 active with no faults
>>>
>>> After some digging I found that if I re-enable the failed ring with the
>>> command:
>>>
>>> # corosync-cfgtool -r
>>>
>>> RING ID 0 is marked "active" for a few minutes, but after that it is
>>> permanently marked FAULTY again.
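>>>
>>> (A check of this form can be left running to catch the moment the ring
>>> flips back; "watch" is assumed to be available here, any shell loop
>>> would do:)
>>>
>>> # watch -n 5 corosync-cfgtool -s   # re-print ring status every 5 s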
>>>
>>> The log has no useful info, just a single message:
>>>
>>> corosync[21740]: [TOTEM ] Marking ringid 0 interface 192.168.1.54 FAULTY
>>>
>>> And there is no message like:
>>>
>>> [TOTEM ] Automatically recovered ring 1
>>>
>>>
>>> My corosync.conf looks like:
>>>
>>> compatibility: whitetank
>>>
>>> totem {
>>>         version: 2
>>>         secauth: on
>>>         threads: 4
>>>         rrp_mode: passive
>>>
>>>         interface {
>>>                 member {
>>>                         memberaddr: PRIVATE_IP_1
>>>                 }
>>>
>>>                 ...
>>>
>>>                 member {
>>>                         memberaddr: PRIVATE_IP_16
>>>                 }
>>>
>>>                 ringnumber: 0
>>>                 bindnetaddr: PRIVATE_NET_ADDR
>>>                 mcastaddr: 226.0.0.1
>>>                 mcastport: 5505
>>>                 ttl: 1
>>>         }
>>>
>>>         interface {
>>>                 member {
>>>                         memberaddr: PUBLIC_IP_1
>>>                 }
>>>
>>>                 ...
>>>
>>>                 member {
>>>                         memberaddr: PUBLIC_IP_16
>>>                 }
>>>
>>>                 ringnumber: 1
>>>                 bindnetaddr: PUBLIC_NET_ADDR
>>>                 mcastaddr: 224.0.0.1
>>>                 mcastport: 5405
>>>                 ttl: 1
>>>         }
>>>
>>>         transport: udpu
>>> }
>>>
>>> logging {
>>>         to_stderr: no
>>>         to_logfile: yes
>>>         logfile: /var/log/cluster/corosync.log
>>>         logfile_priority: info
>>>         to_syslog: yes
>>>         syslog_priority: warning
>>>         debug: on
>>>         timestamp: on
>>> }
>>>
>>> I tried changing rrp_mode and the mcastaddr/mcastport for ringnumber: 0,
>>> but the result was similar.
>>>
>>> I checked multicast/unicast connectivity using the omping utility and
>>> didn't find any issues.
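>>>
>>> The check was roughly of this form, run on all nodes in parallel
>>> (node1..node3 are placeholders for the 16 cluster hosts):
>>>
>>> # omping -m 226.0.0.1 -p 5505 node1 node2 node3   # node names: placeholders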
>>>
>>> Also, no errors were found on the network equipment of our private VLAN.
>>>
>>> Why did Corosync decide to permanently disable the second ring? How can
>>> I debug the issue?
>>>
>>> Other properties:
>>>
>>> Corosync Cluster Engine, version '1.4.7'
>>>
>>> Pacemaker properties:
>>> cluster-infrastructure: cman
>>> cluster-recheck-interval: 5min
>>> dc-version: 1.1.14-8.el6-70404b0
>>> expected-quorum-votes: 3
>>> have-watchdog: false
>>> last-lrm-refresh: 1484068350
>>> maintenance-mode: false
>>> no-quorum-policy: ignore
>>> pe-error-series-max: 1000
>>> pe-input-series-max: 1000
>>> pe-warn-series-max: 1000
>>> stonith-action: reboot
>>> stonith-enabled: false
>>> symmetric-cluster: false
>>>
>>> Thank you.
>>>
>>> --
>>> Regards Denis Gribkov
>
> --
> Regards Denis Gribkov