[ClusterLabs] Antw: Corosync ring marked as FAULTY
Denis Gribkov
dun at itsts.net
Wed Feb 22 03:24:06 EST 2017
Hi,
I just tried it: no packets were captured, even after I re-enabled ring 0 on all
nodes.
On the other hand, if I open a listening UDP socket on the same port using
the nc utility:
# nc -u -l 5505
and then send some messages from another node, I can capture and see
these packets.
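The nc check above can be sketched in Python as a loopback round trip: bind a UDP socket, send one datagram to it, and read it back. This is only an illustration of the same idea, assuming Python 3; the helper name udp_roundtrip is hypothetical. On real nodes the receiver would be bound to one node's private address and the sender would run on another node, rather than both using 127.0.0.1. Like the nc test, a successful round trip only shows that plain UDP reaches the port; it says nothing about corosync's own traffic.

```python
import socket


def udp_roundtrip(host: str = "127.0.0.1", port: int = 0,
                  payload: bytes = b"ping", timeout: float = 2.0) -> bytes:
    """Bind a UDP receiver on (host, port), send one datagram to it, return what arrived.

    port=0 lets the OS pick a free port; passing 5505 mimics `nc -u -l 5505`,
    which only works if nothing else (e.g. corosync) already holds that port.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as rx, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as tx:
        rx.bind((host, port))
        rx.settimeout(timeout)  # a lost datagram raises a timeout instead of hanging
        bound_port = rx.getsockname()[1]
        tx.sendto(payload, (host, bound_port))
        data, _sender = rx.recvfrom(65535)
        return data


if __name__ == "__main__":
    print(udp_roundtrip())  # b'ping' on a working loopback
```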
On 22/02/17 10:02, bliu wrote:
>
> Hi, Denis
>
> Could you try tcpdump "udp port 5505" on the private network to see if
> there are any packets?
>
>
> On 02/22/2017 03:47 PM, Denis Gribkov wrote:
>>
>> In our case it does not create problems, since all nodes are located
>> in a few networks which are served by a single router.
>>
>> There are also no errors detected on public ring 1, unlike private
>> ring 0.
>>
>> I suspect that this error could be related to the private VLAN
>> settings, but unfortunately I have no good idea how to find the issue.
>>
>> On 22/02/17 09:37, Ulrich Windl wrote:
>>> Is "ttl 1" a good idea for a public network?
>>>
>>>>>> Denis Gribkov <dun at itsts.net> wrote on 21.02.2017 at 18:26 in message
>>> <4f5543c4-b80c-659d-ed5e-7a99e1482ced at itsts.net>:
>>>> Hi Everyone.
>>>>
>>>> I have a 16-node asynchronous cluster configured with the Corosync
>>>> redundant ring feature.
>>>>
>>>> Each node has 2 similarly connected/configured NICs. One NIC is connected
>>>> to the public network, the other to our private VLAN. When I checked
>>>> Corosync ring operability, I found:
>>>>
>>>> # corosync-cfgtool -s
>>>> Printing ring status.
>>>> Local node ID 1
>>>> RING ID 0
>>>> id = 192.168.1.54
>>>> status = Marking ringid 0 interface 192.168.1.54 FAULTY
>>>> RING ID 1
>>>> id = 111.11.11.1
>>>> status = ring 1 active with no faults
>>>>
>>>> After some digging, I found that if I re-enable the
>>>> failed ring with the command:
>>>>
>>>> # corosync-cfgtool -r
>>>>
>>>> RING ID 0 is marked as "active" for a few minutes, but after that it is
>>>> permanently marked as faulty.
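To see exactly when ring 0 flips back to faulty after `corosync-cfgtool -r`, it can help to check the status output periodically. A minimal parsing sketch, assuming the corosync 1.x `corosync-cfgtool -s` layout quoted above (a `RING ID <n>` line followed by `id =` and `status =` lines); the function name faulty_rings is hypothetical:

```python
from __future__ import annotations

import re

# Sample copied from the `corosync-cfgtool -s` output quoted in this thread.
SAMPLE_STATUS = """\
Printing ring status.
Local node ID 1
RING ID 0
id = 192.168.1.54
status = Marking ringid 0 interface 192.168.1.54 FAULTY
RING ID 1
id = 111.11.11.1
status = ring 1 active with no faults
"""

RING_RE = re.compile(r"^RING ID (\d+)\s*$")
STATUS_RE = re.compile(r"^status\s*=\s*(.*)$")  # tolerant of tabs/extra spaces


def faulty_rings(status_text: str) -> dict[int, str]:
    """Return {ring number: status text} for every ring whose status says FAULTY."""
    faulty: dict[int, str] = {}
    ring = None  # ring number from the most recent "RING ID" header
    for raw in status_text.splitlines():
        line = raw.strip()
        header = RING_RE.match(line)
        if header:
            ring = int(header.group(1))
            continue
        status = STATUS_RE.match(line)
        if status and ring is not None and "FAULTY" in status.group(1):
            faulty[ring] = status.group(1).strip()
    return faulty


if __name__ == "__main__":
    print(faulty_rings(SAMPLE_STATUS))
```

On a live node the text could come from `subprocess.run(["corosync-cfgtool", "-s"], capture_output=True, text=True).stdout`, run from cron or a loop that records a timestamp, so the moment ring 0 goes faulty again can be lined up with switch or VLAN events.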
>>>>
>>>> The log has no useful info, just a single message:
>>>>
>>>> corosync[21740]: [TOTEM ] Marking ringid 0 interface 192.168.1.54 FAULTY
>>>>
>>>> And no message like:
>>>>
>>>> [TOTEM ] Automatically recovered ring 0
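Since the only evidence is the FAULTY line and the absence of an "Automatically recovered ring" line, a quick scan of corosync.log for those two messages shows which rings were marked faulty and never recovered. A minimal sketch; the regexes assume the message text quoted in this thread and may need adjusting for other log formats, and the function name rings_left_faulty is hypothetical:

```python
from __future__ import annotations

import re
from typing import Iterable

# Message formats as quoted in this thread (assumed; check against your own log).
FAULTY_RE = re.compile(r"Marking ringid (\d+) interface (\S+) FAULTY")
RECOVERED_RE = re.compile(r"Automatically recovered ring (\d+)")


def rings_left_faulty(lines: Iterable[str]) -> set[int]:
    """Return ring numbers whose most recent event in the log was FAULTY."""
    faulty_now: dict[int, bool] = {}
    for line in lines:
        marked = FAULTY_RE.search(line)
        if marked:
            faulty_now[int(marked.group(1))] = True
            continue
        recovered = RECOVERED_RE.search(line)
        if recovered:
            faulty_now[int(recovered.group(1))] = False
    return {ring for ring, is_faulty in faulty_now.items() if is_faulty}


if __name__ == "__main__":
    log = ["corosync[21740]: [TOTEM ] Marking ringid 0 interface 192.168.1.54 FAULTY"]
    print(rings_left_faulty(log))  # {0}
```

For the configuration below it could be run as `with open("/var/log/cluster/corosync.log") as f: print(rings_left_faulty(f))`.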
>>>>
>>>>
>>>> My corosync.conf looks like this:
>>>>
>>>> compatibility: whitetank
>>>>
>>>> totem {
>>>>     version: 2
>>>>     secauth: on
>>>>     threads: 4
>>>>     rrp_mode: passive
>>>>
>>>>     interface {
>>>>
>>>>         member {
>>>>             memberaddr: PRIVATE_IP_1
>>>>         }
>>>>
>>>>         ...
>>>>
>>>>         member {
>>>>             memberaddr: PRIVATE_IP_16
>>>>         }
>>>>
>>>>         ringnumber: 0
>>>>         bindnetaddr: PRIVATE_NET_ADDR
>>>>         mcastaddr: 226.0.0.1
>>>>         mcastport: 5505
>>>>         ttl: 1
>>>>     }
>>>>
>>>>     interface {
>>>>
>>>>         member {
>>>>             memberaddr: PUBLIC_IP_1
>>>>         }
>>>>         ...
>>>>
>>>>         member {
>>>>             memberaddr: PUBLIC_IP_16
>>>>         }
>>>>
>>>>         ringnumber: 1
>>>>         bindnetaddr: PUBLIC_NET_ADDR
>>>>         mcastaddr: 224.0.0.1
>>>>         mcastport: 5405
>>>>         ttl: 1
>>>>     }
>>>>
>>>>     transport: udpu
>>>> }
>>>>
>>>> logging {
>>>>     to_stderr: no
>>>>     to_logfile: yes
>>>>     logfile: /var/log/cluster/corosync.log
>>>>     logfile_priority: info
>>>>     to_syslog: yes
>>>>     syslog_priority: warning
>>>>     debug: on
>>>>     timestamp: on
>>>> }
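When checking firewall rules or VLAN ACLs for ring 0, note that the corosync.conf(5) man page says corosync uses two UDP ports per interface: mcastport for receives and mcastport - 1 for sends. It is worth confirming in the 1.4.7 man page that this also holds for the udpu transport used here. A rough sketch that pulls those port pairs out of a config like the one above; it is a regex scan, not corosync's own parser, and the name ring_ports is hypothetical:

```python
from __future__ import annotations

import re


def ring_ports(conf_text: str) -> dict[int, tuple[int, int]]:
    """Return {ringnumber: (mcastport - 1, mcastport)} for each interface block.

    Assumes each `interface {` block has one `ringnumber:` and one
    `mcastport:` line, as in the config above; nested braces are not parsed.
    """
    ports: dict[int, tuple[int, int]] = {}
    # Text after each "interface {" up to the next one is treated as one block.
    for block in re.split(r"\binterface\s*\{", conf_text)[1:]:
        ring = re.search(r"ringnumber:\s*(\d+)", block)
        port = re.search(r"mcastport:\s*(\d+)", block)
        if ring and port:
            mcastport = int(port.group(1))
            ports[int(ring.group(1))] = (mcastport - 1, mcastport)
    return ports


SAMPLE_CONF = """\
totem {
    interface {
        member {
            memberaddr: PRIVATE_IP_1
        }
        ringnumber: 0
        mcastport: 5505
    }
    interface {
        ringnumber: 1
        mcastport: 5405
    }
    transport: udpu
}
"""

if __name__ == "__main__":
    print(ring_ports(SAMPLE_CONF))  # {0: (5504, 5505), 1: (5404, 5405)}
```

With the values from this thread, that means both 5504 and 5505 would need to pass on the private VLAN for ring 0.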
>>>>
>>>> I tried changing rrp_mode and mcastaddr/mcastport for ringnumber 0,
>>>> but the result was similar.
>>>>
>>>> I checked multicast/unicast operability using the omping utility and didn't
>>>> find any issues.
>>>>
>>>> Also, no errors were found on the network equipment for our private VLAN.
>>>>
>>>> Why did Corosync decide to permanently disable ring 0 (the private ring)?
>>>> How can I debug the issue?
>>>>
>>>> Other properties:
>>>>
>>>> Corosync Cluster Engine, version '1.4.7'
>>>>
>>>> Pacemaker properties:
>>>> cluster-infrastructure: cman
>>>> cluster-recheck-interval: 5min
>>>> dc-version: 1.1.14-8.el6-70404b0
>>>> expected-quorum-votes: 3
>>>> have-watchdog: false
>>>> last-lrm-refresh: 1484068350
>>>> maintenance-mode: false
>>>> no-quorum-policy: ignore
>>>> pe-error-series-max: 1000
>>>> pe-input-series-max: 1000
>>>> pe-warn-series-max: 1000
>>>> stonith-action: reboot
>>>> stonith-enabled: false
>>>> symmetric-cluster: false
>>>>
>>>> Thank you.
>>>>
>>>> --
>>>> Regards Denis Gribkov
>>>
>>>
>>> _______________________________________________
>>> Users mailing list:Users at clusterlabs.org
>>> http://lists.clusterlabs.org/mailman/listinfo/users
>>>
>>> Project Home:http://www.clusterlabs.org
>>> Getting started:http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> Bugs:http://bugs.clusterlabs.org
>>
>> --
>> Regards Denis Gribkov
>
--
Regards Denis Gribkov