<html>
  <head>
    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    <p>In our case it does not cause problems, since all nodes are
      located in a few networks which<span id="result_box"
        class="short_text" lang="en"><span class="alt-edited"> are served by
          a single router.</span></span></p>
    <p><span id="result_box" class="short_text" lang="en"><span
          class="alt-edited">Also, no errors have been detected on
          public ring 1, unlike private ring 0.</span></span></p>
    <div id="gt-res-content">
      <div id="gt-res-dir-ctr" dir="ltr"
        class="trans-verified-button-large">
        <span id="result_box" class="short_text" lang="en"><span>I
            suspect that this error could be related to the private VLAN
            settings, but unfortunately I have no good idea how to find
            the issue.<br>
          </span></span></div>
    </div>
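    <p>One way to narrow this down might be to poll
      <code>corosync-cfgtool -s</code> on every node and record when ring 0
      first turns FAULTY. Below is a minimal sketch in Python, assuming the
      Corosync 1.4 output format quoted further down in this message. The
      function name <code>parse_ring_status</code> and the
      <code>SAMPLE</code> text are only illustrative and are not part of
      Corosync.</p>
    <pre wrap="">
```python
import re


def parse_ring_status(text):
    """Parse `corosync-cfgtool -s` output into a list of ring dicts.

    Assumes the layout shown in the quoted message: a "RING ID n" line
    followed by "id = ..." and "status = ..." lines.
    """
    rings = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        header = re.match(r"RING ID (\d+)$", line)
        if header:
            # Start a new ring record.
            current = {"ring": int(header.group(1)), "id": None,
                       "status": None, "faulty": False}
            rings.append(current)
            continue
        if current is None or "=" not in line:
            # Skip preamble lines such as "Printing ring status."
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key == "id":
            current["id"] = value
        elif key == "status":
            current["status"] = value
            # Assumption: a faulty ring always has FAULTY in its status.
            current["faulty"] = "FAULTY" in value
    return rings


# Sample copied from the output quoted in this thread.
SAMPLE = """\
Printing ring status.
Local node ID 1
RING ID 0
         id      = 192.168.1.54
         status  = Marking ringid 0 interface 192.168.1.54 FAULTY
RING ID 1
         id      = 111.11.11.1
         status  = ring 1 active with no faults
"""

if __name__ == "__main__":
    for ring in parse_ring_status(SAMPLE):
        state = "FAULTY" if ring["faulty"] else "ok"
        print(f"ring {ring['ring']} ({ring['id']}): {state}")
```
</pre>
    <p>Running something like this periodically on each node, with a
      timestamp, could show whether all nodes lose ring 0 at the same
      moment or only some of them do, which may help to tell a VLAN
      problem from a per-node one.</p>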
    <br>
    <div class="moz-cite-prefix">On 22/02/17 09:37, Ulrich Windl wrote:<br>
    </div>
    <blockquote
      cite="mid:58AD3FA6020000A100024C39@gwsmtp1.uni-regensburg.de"
      type="cite">
      <pre wrap="">Is "ttl 1" a good idea for a public network?

</pre>
      <blockquote type="cite">
        <blockquote type="cite">
          <blockquote type="cite">
            <pre wrap="">Denis Gribkov <a class="moz-txt-link-rfc2396E" href="mailto:dun@itsts.net"><dun@itsts.net></a> wrote on 21.02.2017 at 18:26 in message
</pre>
          </blockquote>
        </blockquote>
      </blockquote>
      <pre wrap=""><a class="moz-txt-link-rfc2396E" href="mailto:4f5543c4-b80c-659d-ed5e-7a99e1482ced@itsts.net"><4f5543c4-b80c-659d-ed5e-7a99e1482ced@itsts.net></a>:
</pre>
      <blockquote type="cite">
        <pre wrap="">Hi Everyone.

I have a 16-node asynchronous cluster configured with the Corosync redundant 
ring feature.

Each node has 2 similarly connected/configured NICs. One NIC is connected 
to the public network, the other to our private VLAN. When I checked whether 
the Corosync rings were operational, I found:

# corosync-cfgtool -s
Printing ring status.
Local node ID 1
RING ID 0
         id      = 192.168.1.54
         status  = Marking ringid 0 interface 192.168.1.54 FAULTY
RING ID 1
         id      = 111.11.11.1
         status  = ring 1 active with no faults

After some digging I found that if I re-enable the failed ring with the 
command:

# corosync-cfgtool -r

RING ID 0 is marked as "active" for a few minutes, but after that it is 
marked as faulty permanently.

The log has no useful info, just a single message:

corosync[21740]:   [TOTEM ] Marking ringid 0 interface 192.168.1.54 FAULTY

And there is no message like:

[TOTEM ] Automatically recovered ring 1


My corosync.conf looks like:

compatibility: whitetank

totem {
         version: 2
         secauth: on
         threads: 4
         rrp_mode: passive

         interface {

                 member {
                         memberaddr: PRIVATE_IP_1
                 }

...

                 member {
                         memberaddr: PRIVATE_IP_16
                 }

                 ringnumber: 0
                 bindnetaddr: PRIVATE_NET_ADDR
                 mcastaddr: 226.0.0.1
                 mcastport: 5505
                 ttl: 1
         }

        interface {

                 member {
                         memberaddr: PUBLIC_IP_1
                 }
...

                 member {
                         memberaddr: PUBLIC_IP_16
                 }

                 ringnumber: 1
                 bindnetaddr: PUBLIC_NET_ADDR
                 mcastaddr: 224.0.0.1
                 mcastport: 5405
                 ttl: 1
         }

         transport: udpu
}

logging {
         to_stderr: no
         to_logfile: yes
         logfile: /var/log/cluster/corosync.log
         logfile_priority: info
         to_syslog: yes
         syslog_priority: warning
         debug: on
         timestamp: on
}

I tried changing rrp_mode and mcastaddr/mcastport for ringnumber: 0, 
but the result was similar.

I checked multicast/unicast operation using the omping utility and didn't 
find any issues.

Also, no errors were found on the network equipment for our private VLAN.

Why did Corosync decide to permanently disable the second ring? How can I 
debug the issue?

Other properties:

Corosync Cluster Engine, version '1.4.7'

Pacemaker properties:
  cluster-infrastructure: cman
  cluster-recheck-interval: 5min
  dc-version: 1.1.14-8.el6-70404b0
  expected-quorum-votes: 3
  have-watchdog: false
  last-lrm-refresh: 1484068350
  maintenance-mode: false
  no-quorum-policy: ignore
  pe-error-series-max: 1000
  pe-input-series-max: 1000
  pe-warn-series-max: 1000
  stonith-action: reboot
  stonith-enabled: false
  symmetric-cluster: false

Thank you.

-- 
Regards Denis Gribkov
</pre>
      </blockquote>
      <pre wrap="">




_______________________________________________
Users mailing list: <a class="moz-txt-link-abbreviated" href="mailto:Users@clusterlabs.org">Users@clusterlabs.org</a>
<a class="moz-txt-link-freetext" href="http://lists.clusterlabs.org/mailman/listinfo/users">http://lists.clusterlabs.org/mailman/listinfo/users</a>

Project Home: <a class="moz-txt-link-freetext" href="http://www.clusterlabs.org">http://www.clusterlabs.org</a>
Getting started: <a class="moz-txt-link-freetext" href="http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf">http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf</a>
Bugs: <a class="moz-txt-link-freetext" href="http://bugs.clusterlabs.org">http://bugs.clusterlabs.org</a>
</pre>
    </blockquote>
    <br>
    <pre class="moz-signature" cols="72">-- 
Regards Denis Gribkov</pre>
  </body>
</html>