<html>

  <head>


    <meta http-equiv="content-type" content="text/html; charset=utf-8">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    <p><tt>Hi Everyone.</tt></p>

    <p><tt>I have a 16-node asynchronous cluster configured with the Corosync
        redundant ring feature.</tt></p>

    <p><tt>Each node has 2 identically connected and configured NICs. One NIC
        is connected to the public network,</tt></p>

    <p><tt>and the other to our private VLAN. When I checked the status of the
        Corosync rings I found:</tt><tt><br>

      </tt></p>

    <p><tt># corosync-cfgtool -s</tt><tt><br>

      </tt><tt>Printing ring status.</tt><tt><br>

      </tt><tt>Local node ID 1</tt><tt><br>

      </tt><tt>RING ID 0</tt><tt><br>

      </tt><tt>        id      = 192.168.1.54</tt><tt><br>

      </tt><tt>        status  = Marking ringid 0 interface 192.168.1.54

        FAULTY</tt><tt><br>

      </tt><tt>RING ID 1</tt><tt><br>

      </tt><tt>        id      = 111.11.11.1</tt><tt><br>

      </tt><tt>        status  = ring 1 active with no faults</tt></p>

    <p><tt>After some digging I </tt><tt><span
          id="result_box" class="short_text" lang="en"><span class="">identified</span></span>
        that if I re-enable the failed ring with the command:</tt></p>

    <p><tt> # corosync-cfgtool -r</tt></p>

    <p><tt>RING ID 0 is marked as "active" for a few minutes, but
        after that it is permanently marked as faulty.</tt></p>
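    <p><tt>A minimal sketch for spotting the faulty ring in the corosync-cfgtool -s
        output shown above, so that the check can be repeated in a loop to
        timestamp the moment the ring drops. It assumes Python 3 is available
        on the node; the function names are hypothetical and not part of
        Corosync.</tt></p>

<pre>
```python
import re


def parse_ring_status(text):
    """Parse `corosync-cfgtool -s` output into a list of ring dicts."""
    rings = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        match = re.match(r"RING ID (\d+)$", line)
        if match:
            # A new ring section starts; later id/status lines belong to it.
            current = {"ring": int(match.group(1)), "id": None, "status": None}
            rings.append(current)
            continue
        if current is None or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key in ("id", "status"):
            current[key] = value
    return rings


def faulty_rings(text):
    """Return the ring numbers whose status line contains FAULTY."""
    return [r["ring"] for r in parse_ring_status(text)
            if r["status"] and "FAULTY" in r["status"]]


# Sample taken from the output quoted in this post.
SAMPLE = """\
Printing ring status.
Local node ID 1
RING ID 0
        id      = 192.168.1.54
        status  = Marking ringid 0 interface 192.168.1.54 FAULTY
RING ID 1
        id      = 111.11.11.1
        status  = ring 1 active with no faults
"""

print(faulty_rings(SAMPLE))  # prints [0]
```
</pre>

    <p><tt>In practice the text would come from running corosync-cfgtool -s
        (for example via subprocess.run) rather than from the embedded
        sample.</tt></p>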

    <p><tt>The log has no useful information, just a single message:</tt></p>

    <p><tt>corosync[21740]:   [TOTEM ] Marking ringid 0 interface

        192.168.1.54 FAULTY</tt></p>

    <p><tt>And there is no message like:</tt></p>

    <p><tt>[TOTEM ] Automatically recovered ring 1</tt></p>

    <p><tt><br>

      </tt></p>

    <p><tt>My corosync.conf looks like:</tt><tt><br>

      </tt>

      <tt><br>

      </tt><tt>compatibility: whitetank

      </tt><tt><br>

      </tt>

      <tt><br>

      </tt><tt>totem {

      </tt><tt><br>

      </tt><tt>        version: 2

      </tt><tt><br>

      </tt><tt>        secauth: on

      </tt><tt><br>

      </tt><tt>        threads: 4

      </tt><tt><br>

      </tt><tt>        rrp_mode: passive</tt><tt><br>

      </tt>

      <tt><br>

      </tt><tt>        interface {

      </tt><tt><br>

      </tt>

      <tt><br>

      </tt><tt>                member {

      </tt><tt><br>

      </tt><tt>                        memberaddr: PRIVATE_IP_1

      </tt><tt><br>

      </tt><tt>                }

      </tt><tt><br>

      </tt>

      <tt><br>

      </tt><tt>...

      </tt><tt><br>

      </tt>

      <tt><br>

      </tt><tt>                member {

      </tt><tt><br>

      </tt><tt>                        memberaddr: PRIVATE_IP_16</tt><tt><br>

      </tt><tt>                }

      </tt><tt><br>

      </tt>

      <tt><br>

      </tt><tt>                ringnumber: 0

      </tt><tt><br>

      </tt><tt>                bindnetaddr: PRIVATE_NET_ADDR

      </tt><tt><br>

      </tt><tt>                mcastaddr: 226.0.0.1

      </tt><tt><br>

      </tt><tt>                mcastport: 5505</tt><tt><br>

      </tt><tt>                ttl: 1

      </tt><tt><br>

      </tt><tt>        }

      </tt><tt><br>

      </tt>

      <tt><br>

      </tt><tt>       interface {

      </tt><tt><br>

      </tt>

      <tt><br>

      </tt><tt>                member {

      </tt><tt><br>

      </tt><tt>                        memberaddr: PUBLIC_IP_1

      </tt><tt><br>

      </tt><tt>                }

      </tt><tt><br>

      </tt><tt>...

      </tt><tt><br>

      </tt>

      <tt><br>

      </tt><tt>                member {

      </tt><tt><br>

      </tt><tt>                        memberaddr: PUBLIC_IP_16</tt><tt><br>

      </tt><tt>                }

      </tt><tt><br>

      </tt>

      <tt><br>

      </tt><tt>                ringnumber: 1

      </tt><tt><br>

      </tt><tt>                bindnetaddr: PUBLIC_NET_ADDR

      </tt><tt><br>

      </tt><tt>                mcastaddr: 224.0.0.1

      </tt><tt><br>

      </tt><tt>                mcastport: 5405

      </tt><tt><br>

      </tt><tt>                ttl: 1

      </tt><tt><br>

      </tt><tt>        }

      </tt><tt><br>

      </tt>

      <tt><br>

      </tt><tt>        transport: udpu </tt><tt><br>
      </tt><tt>}</tt><tt><br>

      </tt></p>

    <p><tt>logging {</tt><tt><br>

      </tt><tt>        to_stderr: no</tt><tt><br>

      </tt><tt>        to_logfile: yes</tt><tt><br>

      </tt><tt>        logfile: /var/log/cluster/corosync.log</tt><tt><br>

      </tt><tt>        logfile_priority: info</tt><tt><br>

      </tt><tt>        to_syslog: yes</tt><tt><br>

      </tt><tt>        syslog_priority: warning</tt><tt><br>

      </tt><tt>        debug: on</tt><tt><br>

      </tt><tt>        timestamp: on</tt><tt><br>

      </tt><tt>}</tt></p>

    <p><tt>I tried changing rrp_mode and the mcastaddr/mcastport for
        ringnumber: 0, but the result was the same.</tt></p>

    <p><tt>I checked multicast/unicast connectivity using the omping utility
        and didn't find any issues.</tt><tt><br>

      </tt></p>
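    <p><tt>A check of this kind might be invoked as sketched below. This is
        only an illustration: the host names are placeholders, the helper
        function is hypothetical, and the multicast address and port are
        copied from the ring 0 settings above. omping expects the same
        command to be started on every listed node at roughly the same time,
        and a free port should be used if Corosync is already bound to this
        one.</tt></p>

<pre>
```python
import shlex


def omping_command(hosts, mcast_addr=None, port=None, count=None):
    """Build an omping argv list; the same command runs on every listed host."""
    argv = ["omping"]
    if mcast_addr is not None:
        argv += ["-m", mcast_addr]   # multicast group to test
    if port is not None:
        argv += ["-p", str(port)]    # UDP port to test
    if count is not None:
        argv += ["-c", str(count)]   # stop after this many probes
    return argv + list(hosts)


# Placeholder host names; substitute the private-VLAN addresses of the nodes.
hosts = ["node1-priv", "node2-priv", "node3-priv"]
cmd = omping_command(hosts, mcast_addr="226.0.0.1", port=5505, count=20)
print(" ".join(shlex.quote(arg) for arg in cmd))
```
</pre>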

    <p><tt>Also, no errors were found on the network equipment for our
        private VLAN.</tt><tt><br>

      </tt></p>

    <p><tt>Why did Corosync decide to permanently disable ring 0? How
        can I debug the issue?</tt><tt><br>

      </tt></p>

    <p><tt>Other properties:</tt><tt><br>

      </tt></p>

    <p><tt>Corosync Cluster Engine, version '1.4.7'</tt><tt><br>

      </tt></p>

    <tt>Pacemaker properties:

    </tt><tt><br>

    </tt><tt> cluster-infrastructure: cman

    </tt><tt><br>

    </tt><tt> cluster-recheck-interval: 5min

    </tt><tt><br>

    </tt><tt> dc-version: 1.1.14-8.el6-70404b0

    </tt><tt><br>

    </tt><tt> expected-quorum-votes: 3

    </tt><tt><br>

    </tt><tt> have-watchdog: false

    </tt><tt><br>

    </tt><tt> last-lrm-refresh: 1484068350

    </tt><tt><br>

    </tt><tt> maintenance-mode: false

    </tt><tt><br>

    </tt><tt> no-quorum-policy: ignore

    </tt><tt><br>

    </tt><tt> pe-error-series-max: 1000

    </tt><tt><br>

    </tt><tt> pe-input-series-max: 1000

    </tt><tt><br>

    </tt><tt> pe-warn-series-max: 1000

    </tt><tt><br>

    </tt><tt> stonith-action: reboot

    </tt><tt><br>

    </tt><tt> stonith-enabled: false

    </tt><tt><br>

    </tt><tt> symmetric-cluster: false

    </tt><tt><br>

    </tt><tt>

    </tt><tt><br>

    </tt><tt>Thank you.</tt><br>

    <pre class="moz-signature" cols="72">-- 

Regards Denis Gribkov</pre>

  </body>

</html>