[ClusterLabs] Redundant ring not recovering after node is back
Emmanuel Gelati
emi2fast at gmail.com
Wed Aug 22 16:20:21 EDT 2018
I think you are missing the interface configuration that goes with the nodelist; see the sample corosync configuration here:
http://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/1.1/html/Clusters_from_Scratch/_sample_corosync_configuration.html
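Something along these lines, as a rough sketch only (I am assuming udpu transport here, as in the sample on that page, and reusing your ring addresses; adjust to your setup):

totem {
    version: 2
    cluster_name: node
    rrp_mode: passive
    # with udpu plus a nodelist the ring addresses come from the nodelist,
    # so the bindnetaddr/broadcast interface sections should not be needed
    transport: udpu
}

nodelist {
    node {
        nodeid: 1
        ring0_addr: 192.168.0.1
        ring1_addr: 192.168.1.1
    }
    node {
        nodeid: 2
        ring0_addr: 192.168.0.2
        ring1_addr: 192.168.1.2
    }
}
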
2018-08-22 14:53 GMT+02:00 David Tolosa <david.tolosa at upcnet.es>:
> Hello,
> I'm going crazy over this problem, which I hope to resolve here with your
> help:
>
> I have 2 nodes with Corosync redundant ring feature.
>
> Each node has 2 similarly connected/configured NICs. The two nodes are
> connected to each other by two crossover cables.
>
> I configured both nodes with rrp_mode: passive. Everything works well at
> this point, but when I shut down one node to test failover and that node
> comes back online, corosync marks the interface as FAULTY and RRP fails to
> recover the initial state:
>
> 1. Initial scenario:
>
> # corosync-cfgtool -s
> Printing ring status.
> Local node ID 1
> RING ID 0
>         id      = 192.168.0.1
>         status  = ring 0 active with no faults
> RING ID 1
>         id      = 192.168.1.1
>         status  = ring 1 active with no faults
>
>
> 2. When I shut down node 2, everything continues with no faults. Sometimes
> the ring IDs bind to 127.0.0.1 and then bind back to their respective
> heartbeat IPs.
>
> 3. When node 2 is back online:
>
> # corosync-cfgtool -s
> Printing ring status.
> Local node ID 1
> RING ID 0
>         id      = 192.168.0.1
>         status  = ring 0 active with no faults
> RING ID 1
>         id      = 192.168.1.1
>         status  = Marking ringid 1 interface 192.168.1.1 FAULTY
>
>
> # service corosync status
> ● corosync.service - Corosync Cluster Engine
> Loaded: loaded (/lib/systemd/system/corosync.service; enabled; vendor preset: enabled)
> Active: active (running) since Wed 2018-08-22 14:44:09 CEST; 1min 38s ago
> Docs: man:corosync
> man:corosync.conf
> man:corosync_overview
> Main PID: 1439 (corosync)
> Tasks: 2 (limit: 4915)
> CGroup: /system.slice/corosync.service
> └─1439 /usr/sbin/corosync -f
>
>
> Aug 22 14:44:11 node1 corosync[1439]: Aug 22 14:44:11 notice [TOTEM ] The network interface [192.168.0.1] is now up.
> Aug 22 14:44:11 node1 corosync[1439]: [TOTEM ] The network interface [192.168.0.1] is now up.
> Aug 22 14:44:11 node1 corosync[1439]: Aug 22 14:44:11 notice [TOTEM ] The network interface [192.168.1.1] is now up.
> Aug 22 14:44:11 node1 corosync[1439]: [TOTEM ] The network interface [192.168.1.1] is now up.
> Aug 22 14:44:26 node1 corosync[1439]: Aug 22 14:44:26 notice [TOTEM ] A new membership (192.168.0.1:601760) was formed. Members
> Aug 22 14:44:26 node1 corosync[1439]: [TOTEM ] A new membership (192.168.0.1:601760) was formed. Members
> Aug 22 14:44:32 node1 corosync[1439]: Aug 22 14:44:32 notice [TOTEM ] A new membership (192.168.0.1:601764) was formed. Members joined: 2
> Aug 22 14:44:32 node1 corosync[1439]: [TOTEM ] A new membership (192.168.0.1:601764) was formed. Members joined: 2
> Aug 22 14:44:34 node1 corosync[1439]: Aug 22 14:44:34 error [TOTEM ] Marking ringid 1 interface 192.168.1.1 FAULTY
> Aug 22 14:44:34 node1 corosync[1439]: [TOTEM ] Marking ringid 1 interface 192.168.1.1 FAULTY
>
>
> If I execute corosync-cfgtool, it clears the faulty state, but after some
> seconds the ring becomes FAULTY again.
> The only thing that resolves the problem is to restart the service with
> service corosync restart.
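>
> (For reference, what I run to clear it is roughly the following; I assume
> the -r option of corosync-cfgtool is the one that re-enables a faulty ring:)
>
> # corosync-cfgtool -r     <- re-enable the ring marked FAULTY on this node
> # corosync-cfgtool -s     <- a few seconds later ring 1 is FAULTY again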
>
> Here are some of my configuration settings on node 1 (I have already tried
> changing rrp_mode):
>
> *- corosync.conf*
>
> totem {
>     version: 2
>     cluster_name: node
>     token: 5000
>     token_retransmits_before_loss_const: 10
>     secauth: off
>     threads: 0
>     rrp_mode: passive
>     nodeid: 1
>     interface {
>         ringnumber: 0
>         bindnetaddr: 192.168.0.0
>         #mcastaddr: 226.94.1.1
>         mcastport: 5405
>         broadcast: yes
>     }
>     interface {
>         ringnumber: 1
>         bindnetaddr: 192.168.1.0
>         #mcastaddr: 226.94.1.2
>         mcastport: 5407
>         broadcast: yes
>     }
> }
>
> logging {
>     fileline: off
>     to_stderr: yes
>     to_syslog: yes
>     to_logfile: yes
>     logfile: /var/log/corosync/corosync.log
>     debug: off
>     timestamp: on
>     logger_subsys {
>         subsys: AMF
>         debug: off
>     }
> }
>
> amf {
>     mode: disabled
> }
>
> quorum {
>     provider: corosync_votequorum
>     expected_votes: 2
> }
>
> nodelist {
>     node {
>         nodeid: 1
>         ring0_addr: 192.168.0.1
>         ring1_addr: 192.168.1.1
>     }
>
>     node {
>         nodeid: 2
>         ring0_addr: 192.168.0.2
>         ring1_addr: 192.168.1.2
>     }
> }
>
> aisexec {
>     user: root
>     group: root
> }
>
> service {
>     name: pacemaker
>     ver: 1
> }
>
>
>
> *- /etc/hosts*
>
>
> 127.0.0.1 localhost
> 10.4.172.5 node1.upc.edu node1
> 10.4.172.6 node2.upc.edu node2
>
>
> Thank you for your help in advance!
>
> --
> *David Tolosa Martínez*
> Customer Support & Infrastructure
> UPCnet - Edifici Vèrtex
> Plaça d'Eusebi Güell, 6, 08034 Barcelona
> Tel: 934054555
>
> <https://www.upcnet.es>
>
--
.~.
/V\
// \\
/( )\
^`~'^