[ClusterLabs] Antw: Corosync ring marked as FAULTY

Wed Feb 22 03:24:06 EST 2017

Hi,

Just tried it: no packets were captured, even after I re-enabled ring 0 
on all nodes.

On the other hand, if I create a listening UDP socket on the same port 
using the nc utility:

# nc -u -l 5505

and then send some messages from another node, I can capture and see 
these packets.
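
For anyone who wants to repeat this kind of check without nc, here is a minimal Python sketch of the same idea: bind a UDP listener on a port and send a datagram to it. The function names (udp_listen, udp_send, loopback_check) are hypothetical helpers made up for illustration, and the loopback run only shows that the script itself works; to test the private network, the listener would be started on one node (for example on port 5505) and udp_send called from another node with the listener's private address.

```python
import socket


def udp_listen(port, host="0.0.0.0", timeout=5.0):
    """Bind a UDP socket, roughly what `nc -u -l <port>` does."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(timeout)  # do not block forever if nothing arrives
    sock.bind((host, port))
    return sock


def udp_send(message, host, port):
    """Send one datagram to host:port and return the number of bytes sent."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        return sock.sendto(message, (host, port))


def loopback_check(port=0):
    """Self-test on 127.0.0.1; port 0 lets the OS pick a free port."""
    listener = udp_listen(port, host="127.0.0.1")
    try:
        bound_port = listener.getsockname()[1]
        udp_send(b"ping", "127.0.0.1", bound_port)
        data, _addr = listener.recvfrom(1024)
        return data
    finally:
        listener.close()
```

If the datagram arrives with a plain listener like this but tcpdump shows no Corosync traffic on the same port, that would suggest the network path itself is usable and the problem is on the sending side; this is only an assumption to check, not a diagnosis.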

On 22/02/17 10:02, bliu wrote:
>
> Hi, Denis
>
> Could you try tcpdump "udp port 5505" on the private network to see if 
> there are any packets?
>
>
> On 02/22/2017 03:47 PM, Denis Gribkov wrote:
>>
>> In our case it does not create problems, since all nodes are located 
>> in a few networks which are served by a single router.
>>
>> There are also no errors detected on public ring 1, unlike private 
>> ring 0.
>>
>> I have a suspicion that this error could be related to private VLAN 
>> settings, but unfortunately I have no good idea how to find the issue.
>>
>> On 22/02/17 09:37, Ulrich Windl wrote:
>>> Is "ttl 1" a good idea for a public network?
>>>
>>>>>> Denis Gribkov<dun at itsts.net>  wrote on 21.02.2017 at 18:26 in message
>>> <4f5543c4-b80c-659d-ed5e-7a99e1482ced at itsts.net>:
>>>> Hi Everyone.
>>>>
>>>> I have a 16-node asynchronous cluster configured with the Corosync
>>>> redundant ring feature.
>>>>
>>>> Each node has 2 similarly connected/configured NICs. One NIC is connected
>>>> to the public network, the other to our private VLAN. When I checked the
>>>> operability of the Corosync rings, I found:
>>>>
>>>> # corosync-cfgtool -s
>>>> Printing ring status.
>>>> Local node ID 1
>>>> RING ID 0
>>>>           id      = 192.168.1.54
>>>>           status  = Marking ringid 0 interface 192.168.1.54 FAULTY
>>>> RING ID 1
>>>>           id      = 111.11.11.1
>>>>           status  = ring 1 active with no faults
>>>>
>>>> After some digging, I found that if I re-enable the failed ring with
>>>> the command:
>>>>
>>>> # corosync-cfgtool -r
>>>>
>>>> RING ID 0 is marked as "active" for a few minutes, but after that it is
>>>> marked as faulty permanently.
>>>>
>>>> The log has no useful info, just a single message:
>>>>
>>>> corosync[21740]:   [TOTEM ] Marking ringid 0 interface 192.168.1.54 FAULTY
>>>>
>>>> And there is no message like:
>>>>
>>>> [TOTEM ] Automatically recovered ring 1
>>>>
>>>>
>>>> My corosync.conf looks like:
>>>>
>>>> compatibility: whitetank
>>>>
>>>> totem {
>>>>           version: 2
>>>>           secauth: on
>>>>           threads: 4
>>>>           rrp_mode: passive
>>>>
>>>>           interface {
>>>>
>>>>                   member {
>>>>                           memberaddr: PRIVATE_IP_1
>>>>                   }
>>>>
>>>> ...
>>>>
>>>>                   member {
>>>>                           memberaddr: PRIVATE_IP_16
>>>>                   }
>>>>
>>>>                   ringnumber: 0
>>>>                   bindnetaddr: PRIVATE_NET_ADDR
>>>>                   mcastaddr: 226.0.0.1
>>>>                   mcastport: 5505
>>>>                   ttl: 1
>>>>           }
>>>>
>>>>           interface {
>>>>
>>>>                   member {
>>>>                           memberaddr: PUBLIC_IP_1
>>>>                   }
>>>> ...
>>>>
>>>>                   member {
>>>>                           memberaddr: PUBLIC_IP_16
>>>>                   }
>>>>
>>>>                   ringnumber: 1
>>>>                   bindnetaddr: PUBLIC_NET_ADDR
>>>>                   mcastaddr: 224.0.0.1
>>>>                   mcastport: 5405
>>>>                   ttl: 1
>>>>           }
>>>>
>>>>           transport: udpu
>>>> }
>>>>
>>>> logging {
>>>>           to_stderr: no
>>>>           to_logfile: yes
>>>>           logfile: /var/log/cluster/corosync.log
>>>>           logfile_priority: info
>>>>           to_syslog: yes
>>>>           syslog_priority: warning
>>>>           debug: on
>>>>           timestamp: on
>>>> }
>>>>
>>>> I tried changing rrp_mode and mcastaddr/mcastport for ringnumber: 0,
>>>> but the result was similar.
>>>>
>>>> I checked multicast/unicast operability using the omping utility and
>>>> didn't find any issues.
>>>>
>>>> Also, no errors were found on the network equipment for our private VLAN.
>>>>
>>>> Why did Corosync decide to permanently disable the second ring? How can
>>>> I debug the issue?
>>>>
>>>> Other properties:
>>>>
>>>> Corosync Cluster Engine, version '1.4.7'
>>>>
>>>> Pacemaker properties:
>>>>    cluster-infrastructure: cman
>>>>    cluster-recheck-interval: 5min
>>>>    dc-version: 1.1.14-8.el6-70404b0
>>>>    expected-quorum-votes: 3
>>>>    have-watchdog: false
>>>>    last-lrm-refresh: 1484068350
>>>>    maintenance-mode: false
>>>>    no-quorum-policy: ignore
>>>>    pe-error-series-max: 1000
>>>>    pe-input-series-max: 1000
>>>>    pe-warn-series-max: 1000
>>>>    stonith-action: reboot
>>>>    stonith-enabled: false
>>>>    symmetric-cluster: false
>>>>
>>>> Thank you.
>>>>
>>>> -- 
>>>> Regards Denis Gribkov
>>>
>>>
>>> _______________________________________________
>>> Users mailing list:Users at clusterlabs.org
>>> http://lists.clusterlabs.org/mailman/listinfo/users
>>>
>>> Project Home:http://www.clusterlabs.org
>>> Getting started:http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> Bugs:http://bugs.clusterlabs.org
>>
>> -- 
>> Regards Denis Gribkov
>>
>

-- 
Regards Denis Gribkov
