[ClusterLabs] Antw: Corosync ring marked as FAULTY

Denis Gribkov dun at itsts.net
Wed Feb 22 07:47:48 UTC 2017


In our case it does not cause problems, since all nodes are located in 
a few networks that are served by a single router.

Also, no errors are detected on public ring 1, unlike private 
ring 0.

I suspect that this error could be related to the private VLAN 
settings, but unfortunately I have no good idea how to find the issue.
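
One way to narrow down when ring 0 gets marked FAULTY is to poll the ring
status and log each transition with a timestamp, so it can be correlated
with events on the switches or the private VLAN. Below is a minimal sketch
in Python. It assumes that "corosync-cfgtool -s" keeps the output layout
shown in the quoted message below (Corosync 1.4.7); other versions may
print something different. The names parse_ring_status and watch, and the
5-second interval, are hypothetical and not part of Corosync.

```python
import re
import subprocess
import time

# Sample output copied from the quoted message (Corosync 1.4.7).
SAMPLE = """\
Printing ring status.
Local node ID 1
RING ID 0
          id      = 192.168.1.54
          status  = Marking ringid 0 interface 192.168.1.54 FAULTY
RING ID 1
          id      = 111.11.11.1
          status  = ring 1 active with no faults
"""


def parse_ring_status(text):
    """Return {ring_number: {"id", "status", "faulty"}} from cfgtool output.

    Assumption: each ring starts with a "RING ID <n>" line followed by
    "id = ..." and "status = ..." lines, as in SAMPLE above.
    """
    rings = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        match = re.match(r"RING ID (\d+)$", line)
        if match:
            current = int(match.group(1))
            rings[current] = {"id": None, "status": None, "faulty": False}
            continue
        if current is None or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key in ("id", "status"):
            rings[current][key] = value
        if key == "status":
            rings[current]["faulty"] = "FAULTY" in value
    return rings


def watch(interval=5.0):
    """Poll the ring status forever and print every faulty/active change."""
    previous = {}
    while True:
        result = subprocess.run(
            ["corosync-cfgtool", "-s"],
            capture_output=True, text=True, check=False,
        )
        for ring, info in parse_ring_status(result.stdout).items():
            if previous.get(ring) != info["faulty"]:
                stamp = time.strftime("%Y-%m-%d %H:%M:%S")
                state = "FAULTY" if info["faulty"] else "active"
                print(f"{stamp} ring {ring} ({info['id']}): {state}")
                previous[ring] = info["faulty"]
        time.sleep(interval)
```

Calling parse_ring_status(SAMPLE) gives one entry per ring, with the
"faulty" flag set only for ring 0. Running watch() on a node after
"corosync-cfgtool -r" would print the moment the ring flips back to
FAULTY, which can then be compared against the network equipment logs.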

On 22/02/17 09:37, Ulrich Windl wrote:
> Is "ttl 1" a good idea for a public network?
>
>>>> Denis Gribkov <dun at itsts.net> wrote on 21.02.2017 at 18:26 in message
> <4f5543c4-b80c-659d-ed5e-7a99e1482ced at itsts.net>:
>> Hi Everyone.
>>
>> I have a 16-node asynchronous cluster configured with the Corosync
>> redundant ring feature.
>>
>> Each node has 2 similarly connected/configured NICs. One NIC is connected
>> to the public network, the other to our private VLAN. When I checked the
>> operability of the Corosync rings I found:
>>
>> # corosync-cfgtool -s
>> Printing ring status.
>> Local node ID 1
>> RING ID 0
>>           id      = 192.168.1.54
>>           status  = Marking ringid 0 interface 192.168.1.54 FAULTY
>> RING ID 1
>>           id      = 111.11.11.1
>>           status  = ring 1 active with no faults
>>
>> After some digging I found that if I re-enable the failed ring with the
>> command:
>>
>> # corosync-cfgtool -r
>>
>> RING ID 0 is marked as "active" for a few minutes, but after that it is
>> marked permanently as faulty.
>>
>> The log has no useful info, just a single message:
>>
>> corosync[21740]:   [TOTEM ] Marking ringid 0 interface 192.168.1.54 FAULTY
>>
>> And there is no message like:
>>
>> [TOTEM ] Automatically recovered ring 1
>>
>>
>> My corosync.conf looks like:
>>
>> compatibility: whitetank
>>
>> totem {
>>           version: 2
>>           secauth: on
>>           threads: 4
>>           rrp_mode: passive
>>
>>           interface {
>>
>>                   member {
>>                           memberaddr: PRIVATE_IP_1
>>                   }
>>
>> ...
>>
>>                   member {
>>                           memberaddr: PRIVATE_IP_16
>>                   }
>>
>>                   ringnumber: 0
>>                   bindnetaddr: PRIVATE_NET_ADDR
>>                   mcastaddr: 226.0.0.1
>>                   mcastport: 5505
>>                   ttl: 1
>>           }
>>
>>           interface {
>>
>>                   member {
>>                           memberaddr: PUBLIC_IP_1
>>                   }
>> ...
>>
>>                   member {
>>                           memberaddr: PUBLIC_IP_16
>>                   }
>>
>>                   ringnumber: 1
>>                   bindnetaddr: PUBLIC_NET_ADDR
>>                   mcastaddr: 224.0.0.1
>>                   mcastport: 5405
>>                   ttl: 1
>>           }
>>
>>           transport: udpu
>> }
>>
>> logging {
>>           to_stderr: no
>>           to_logfile: yes
>>           logfile: /var/log/cluster/corosync.log
>>           logfile_priority: info
>>           to_syslog: yes
>>           syslog_priority: warning
>>           debug: on
>>           timestamp: on
>> }
>>
>> I tried changing rrp_mode and mcastaddr/mcastport for ringnumber: 0,
>> but the result was similar.
>>
>> I checked multicast/unicast operability using the omping utility and
>> didn't find any issues.
>>
>> Also, no errors were found on the network equipment for our private VLAN.
>>
>> Why did Corosync decide to permanently disable the second ring? How can I
>> debug the issue?
>>
>> Other properties:
>>
>> Corosync Cluster Engine, version '1.4.7'
>>
>> Pacemaker properties:
>>    cluster-infrastructure: cman
>>    cluster-recheck-interval: 5min
>>    dc-version: 1.1.14-8.el6-70404b0
>>    expected-quorum-votes: 3
>>    have-watchdog: false
>>    last-lrm-refresh: 1484068350
>>    maintenance-mode: false
>>    no-quorum-policy: ignore
>>    pe-error-series-max: 1000
>>    pe-input-series-max: 1000
>>    pe-warn-series-max: 1000
>>    stonith-action: reboot
>>    stonith-enabled: false
>>    symmetric-cluster: false
>>
>> Thank you.
>>
>> -- 
>> Regards Denis Gribkov
>

-- 
Regards Denis Gribkov
