[ClusterLabs] Redundant ring not recovering after node is back
Emmanuel Gelati
emi2fast at gmail.com
Wed Aug 22 16:21:01 EDT 2018
Sorry, a typo.
I think you are mixing interface with nodelist: http://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/1.1/html/Clusters_from_Scratch/_sample_corosync_configuration.html
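
Roughly, the sample config in that document drives the ring addresses from the nodelist instead of per-interface bindnetaddr blocks. An untested sketch of what that could look like for your two nodes (transport: udpu and two_node: 1 are my assumptions here, not taken from your config; the addresses are the ones already in your nodelist):

totem {
    version: 2
    cluster_name: node
    rrp_mode: passive
    transport: udpu
}

nodelist {
    node {
        nodeid: 1
        ring0_addr: 192.168.0.1
        ring1_addr: 192.168.1.1
    }
    node {
        nodeid: 2
        ring0_addr: 192.168.0.2
        ring1_addr: 192.168.1.2
    }
}

quorum {
    provider: corosync_votequorum
    two_node: 1
}
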
2018-08-22 22:20 GMT+02:00 Emmanuel Gelati <emi2fast at gmail.com>:
> I think you are missing interface with nodelist http://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/1.1/html/Clusters_from_Scratch/_sample_corosync_configuration.html
>
> 2018-08-22 14:53 GMT+02:00 David Tolosa <david.tolosa at upcnet.es>:
>
>> Hello,
>> I'm going crazy over this problem, which I hope to resolve here with your
>> help, guys:
>>
>> I have 2 nodes with Corosync redundant ring feature.
>>
>> Each node has 2 similarly connected/configured NICs. The nodes are
>> connected to each other by two crossover cables.
>>
>> I configured both nodes with rrp_mode passive. Everything works well at
>> this point, but when I shut down one node to test failover and that node
>> comes back online, corosync marks the interface as FAULTY and RRP fails
>> to recover the initial state:
>>
>> 1. Initial scenario:
>>
>> # corosync-cfgtool -s
>> Printing ring status.
>> Local node ID 1
>> RING ID 0
>> id = 192.168.0.1
>> status = ring 0 active with no faults
>> RING ID 1
>> id = 192.168.1.1
>> status = ring 1 active with no faults
>>
>>
>> 2. When I shut down node 2, everything continues with no faults. Sometimes
>> the ring IDs bind to 127.0.0.1 and then bind back to their respective
>> heartbeat IPs.
>>
>> 3. When node 2 is back online:
>>
>> # corosync-cfgtool -s
>> Printing ring status.
>> Local node ID 1
>> RING ID 0
>> id = 192.168.0.1
>> status = ring 0 active with no faults
>> RING ID 1
>> id = 192.168.1.1
>> status = Marking ringid 1 interface 192.168.1.1 FAULTY
>>
>>
>> # service corosync status
>> ● corosync.service - Corosync Cluster Engine
>> Loaded: loaded (/lib/systemd/system/corosync.service; enabled; vendor preset: enabled)
>> Active: active (running) since Wed 2018-08-22 14:44:09 CEST; 1min 38s ago
>> Docs: man:corosync
>> man:corosync.conf
>> man:corosync_overview
>> Main PID: 1439 (corosync)
>> Tasks: 2 (limit: 4915)
>> CGroup: /system.slice/corosync.service
>> └─1439 /usr/sbin/corosync -f
>>
>>
>> Aug 22 14:44:11 node1 corosync[1439]: Aug 22 14:44:11 notice [TOTEM ] The network interface [192.168.0.1] is now up.
>> Aug 22 14:44:11 node1 corosync[1439]: [TOTEM ] The network interface [192.168.0.1] is now up.
>> Aug 22 14:44:11 node1 corosync[1439]: Aug 22 14:44:11 notice [TOTEM ] The network interface [192.168.1.1] is now up.
>> Aug 22 14:44:11 node1 corosync[1439]: [TOTEM ] The network interface [192.168.1.1] is now up.
>> Aug 22 14:44:26 node1 corosync[1439]: Aug 22 14:44:26 notice [TOTEM ] A new membership (192.168.0.1:601760) was formed. Members
>> Aug 22 14:44:26 node1 corosync[1439]: [TOTEM ] A new membership (192.168.0.1:601760) was formed. Members
>> Aug 22 14:44:32 node1 corosync[1439]: Aug 22 14:44:32 notice [TOTEM ] A new membership (192.168.0.1:601764) was formed. Members joined: 2
>> Aug 22 14:44:32 node1 corosync[1439]: [TOTEM ] A new membership (192.168.0.1:601764) was formed. Members joined: 2
>> Aug 22 14:44:34 node1 corosync[1439]: Aug 22 14:44:34 error [TOTEM ] Marking ringid 1 interface 192.168.1.1 FAULTY
>> Aug 22 14:44:34 node1 corosync[1439]: [TOTEM ] Marking ringid 1 interface 192.168.1.1 FAULTY
>>
>>
>> If I execute corosync-cfgtool, it clears the FAULTY state, but after a few
>> seconds the ring goes back to FAULTY.
>> The only thing that resolves the problem is to restart the service with
>> service corosync restart.
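>>
>> (The cycle looks roughly like this; corosync-cfgtool -r is assumed to be
>> the re-enable invocation meant above:)
>>
>> # Clear the FAULTY state on the redundant ring
>> corosync-cfgtool -r
>> # Check again; after a few seconds ring 1 is marked FAULTY once more
>> corosync-cfgtool -s
>> # Only this makes the fault go away for good
>> service corosync restart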
>>
>> Here are some of my configuration settings on node 1 (I have already tried
>> changing rrp_mode):
>>
>> *- corosync.conf*
>>
>> totem {
>>     version: 2
>>     cluster_name: node
>>     token: 5000
>>     token_retransmits_before_loss_const: 10
>>     secauth: off
>>     threads: 0
>>     rrp_mode: passive
>>     nodeid: 1
>>     interface {
>>         ringnumber: 0
>>         bindnetaddr: 192.168.0.0
>>         #mcastaddr: 226.94.1.1
>>         mcastport: 5405
>>         broadcast: yes
>>     }
>>     interface {
>>         ringnumber: 1
>>         bindnetaddr: 192.168.1.0
>>         #mcastaddr: 226.94.1.2
>>         mcastport: 5407
>>         broadcast: yes
>>     }
>> }
>>
>> logging {
>>     fileline: off
>>     to_stderr: yes
>>     to_syslog: yes
>>     to_logfile: yes
>>     logfile: /var/log/corosync/corosync.log
>>     debug: off
>>     timestamp: on
>>     logger_subsys {
>>         subsys: AMF
>>         debug: off
>>     }
>> }
>>
>> amf {
>>     mode: disabled
>> }
>>
>> quorum {
>>     provider: corosync_votequorum
>>     expected_votes: 2
>> }
>>
>> nodelist {
>>     node {
>>         nodeid: 1
>>         ring0_addr: 192.168.0.1
>>         ring1_addr: 192.168.1.1
>>     }
>>
>>     node {
>>         nodeid: 2
>>         ring0_addr: 192.168.0.2
>>         ring1_addr: 192.168.1.2
>>     }
>> }
>>
>> aisexec {
>>     user: root
>>     group: root
>> }
>>
>> service {
>>     name: pacemaker
>>     ver: 1
>> }
>>
>>
>>
>> *- /etc/hosts*
>>
>>
>> 127.0.0.1 localhost
>> 10.4.172.5 node1.upc.edu node1
>> 10.4.172.6 node2.upc.edu node2
>>
>>
>> Thank you for your help in advance!
>>
>> --
>> *David Tolosa Martínez*
>> Customer Support & Infrastructure
>> UPCnet - Edifici Vèrtex
>> Plaça d'Eusebi Güell, 6, 08034 Barcelona
>> Tel: 934054555
>>
>> <https://www.upcnet.es>
>>
>>
>
>
--
.~.
/V\
// \\
/( )\
^`~'^