[ClusterLabs] Redundant ring not recovering after node is back
Emmanuel Gelati
emi2fast at gmail.com
Wed Aug 22 16:21:01 EDT 2018
Sorry, a typo.
I think you are mixing interface with nodelist: http://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/1.1/html/Clusters_from_Scratch/_sample_corosync_configuration.html
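
Roughly, the sample config in that document drives the ring addresses from the nodelist instead of per-interface bindnetaddr blocks. An untested sketch of what that could look like for your two nodes (transport: udpu and two_node: 1 are my assumptions here, not taken from your config; the addresses are the ones already in your nodelist):

totem {
    version: 2
    cluster_name: node
    rrp_mode: passive
    transport: udpu
}

nodelist {
    node {
        nodeid: 1
        ring0_addr: 192.168.0.1
        ring1_addr: 192.168.1.1
    }
    node {
        nodeid: 2
        ring0_addr: 192.168.0.2
        ring1_addr: 192.168.1.2
    }
}

quorum {
    provider: corosync_votequorum
    two_node: 1
}
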
2018-08-22 22:20 GMT+02:00 Emmanuel Gelati <emi2fast at gmail.com>:
> I think you are missing interface with nodelist http://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/1.1/html/Clusters_from_Scratch/_sample_corosync_configuration.html
>
> 2018-08-22 14:53 GMT+02:00 David Tolosa <david.tolosa at upcnet.es>:
>
>> Hello,
>> I'm going crazy over this problem, which I hope to resolve here with your
>> help, guys:
>>
>> I have 2 nodes with Corosync redundant ring feature.
>>
>> Each node has 2 similarly connected/configured NICs. The nodes are
>> connected to each other by two crossover cables.
>>
>> I configured both nodes with rrp_mode passive. Everything works well at
>> this point, but when I shut down one node to test failover and that node
>> comes back online, corosync marks the interface as FAULTY and RRP fails
>> to recover the initial state:
>>
>> 1. Initial scenario:
>>
>> # corosync-cfgtool -s
>> Printing ring status.
>> Local node ID 1
>> RING ID 0
>> id = 192.168.0.1
>> status = ring 0 active with no faults
>> RING ID 1
>> id = 192.168.1.1
>> status = ring 1 active with no faults
>>
>>
>> 2. When I shut down node 2, everything continues with no faults. Sometimes
>> the ring IDs bind to 127.0.0.1 and then bind back to their respective
>> heartbeat IPs.
>>
>> 3. When node 2 is back online:
>>
>> # corosync-cfgtool -s
>> Printing ring status.
>> Local node ID 1
>> RING ID 0
>> id = 192.168.0.1
>> status = ring 0 active with no faults
>> RING ID 1
>> id = 192.168.1.1
>> status = Marking ringid 1 interface 192.168.1.1 FAULTY
>>
>>
>> # service corosync status
>> ● corosync.service - Corosync Cluster Engine
>> Loaded: loaded (/lib/systemd/system/corosync.service; enabled; vendor preset: enabled)
>> Active: active (running) since Wed 2018-08-22 14:44:09 CEST; 1min 38s ago
>> Docs: man:corosync
>> man:corosync.conf
>> man:corosync_overview
>> Main PID: 1439 (corosync)
>> Tasks: 2 (limit: 4915)
>> CGroup: /system.slice/corosync.service
>> └─1439 /usr/sbin/corosync -f
>>
>>
>> Aug 22 14:44:11 node1 corosync[1439]: Aug 22 14:44:11 notice [TOTEM ] The network interface [192.168.0.1] is now up.
>> Aug 22 14:44:11 node1 corosync[1439]: [TOTEM ] The network interface [192.168.0.1] is now up.
>> Aug 22 14:44:11 node1 corosync[1439]: Aug 22 14:44:11 notice [TOTEM ] The network interface [192.168.1.1] is now up.
>> Aug 22 14:44:11 node1 corosync[1439]: [TOTEM ] The network interface [192.168.1.1] is now up.
>> Aug 22 14:44:26 node1 corosync[1439]: Aug 22 14:44:26 notice [TOTEM ] A new membership (192.168.0.1:601760) was formed. Members
>> Aug 22 14:44:26 node1 corosync[1439]: [TOTEM ] A new membership (192.168.0.1:601760) was formed. Members
>> Aug 22 14:44:32 node1 corosync[1439]: Aug 22 14:44:32 notice [TOTEM ] A new membership (192.168.0.1:601764) was formed. Members joined: 2
>> Aug 22 14:44:32 node1 corosync[1439]: [TOTEM ] A new membership (192.168.0.1:601764) was formed. Members joined: 2
>> Aug 22 14:44:34 node1 corosync[1439]: Aug 22 14:44:34 error [TOTEM ] Marking ringid 1 interface 192.168.1.1 FAULTY
>> Aug 22 14:44:34 node1 corosync[1439]: [TOTEM ] Marking ringid 1 interface 192.168.1.1 FAULTY
>>
>>
>> If I execute corosync-cfgtool, it clears the FAULTY state, but after a few
>> seconds the ring goes back to FAULTY.
>> The only thing that resolves the problem is to restart the service with
>> service corosync restart.
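>>
>> (The cycle looks roughly like this; corosync-cfgtool -r is assumed to be
>> the re-enable invocation meant above:)
>>
>> # Clear the FAULTY state on the redundant ring
>> corosync-cfgtool -r
>> # Check again; after a few seconds ring 1 is marked FAULTY once more
>> corosync-cfgtool -s
>> # Only this makes the fault go away for good
>> service corosync restart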
>>
>> Here are some of my configuration settings on node 1 (I have already tried
>> changing rrp_mode):
>>
>> *- corosync.conf*
>>
>> totem {
>>     version: 2
>>     cluster_name: node
>>     token: 5000
>>     token_retransmits_before_loss_const: 10
>>     secauth: off
>>     threads: 0
>>     rrp_mode: passive
>>     nodeid: 1
>>     interface {
>>         ringnumber: 0
>>         bindnetaddr: 192.168.0.0
>>         #mcastaddr: 226.94.1.1
>>         mcastport: 5405
>>         broadcast: yes
>>     }
>>     interface {
>>         ringnumber: 1
>>         bindnetaddr: 192.168.1.0
>>         #mcastaddr: 226.94.1.2
>>         mcastport: 5407
>>         broadcast: yes
>>     }
>> }
>>
>> logging {
>>     fileline: off
>>     to_stderr: yes
>>     to_syslog: yes
>>     to_logfile: yes
>>     logfile: /var/log/corosync/corosync.log
>>     debug: off
>>     timestamp: on
>>     logger_subsys {
>>         subsys: AMF
>>         debug: off
>>     }
>> }
>>
>> amf {
>>     mode: disabled
>> }
>>
>> quorum {
>>     provider: corosync_votequorum
>>     expected_votes: 2
>> }
>>
>> nodelist {
>>     node {
>>         nodeid: 1
>>         ring0_addr: 192.168.0.1
>>         ring1_addr: 192.168.1.1
>>     }
>>
>>     node {
>>         nodeid: 2
>>         ring0_addr: 192.168.0.2
>>         ring1_addr: 192.168.1.2
>>     }
>> }
>>
>> aisexec {
>>     user: root
>>     group: root
>> }
>>
>> service {
>>     name: pacemaker
>>     ver: 1
>> }
>>
>>
>>
>> *- /etc/hosts*
>>
>>
>> 127.0.0.1 localhost
>> 10.4.172.5 node1.upc.edu node1
>> 10.4.172.6 node2.upc.edu node2
>>
>>
>> Thank you for your help in advance!
>>
>> --
>> *David Tolosa Martínez*
>> Customer Support & Infrastructure
>> UPCnet - Edifici Vèrtex
>> Plaça d'Eusebi Güell, 6, 08034 Barcelona
>> Tel: 934054555
>>
>> <https://www.upcnet.es>
>>
>>
>
>
--
.~.
/V\
// \\
/( )\
^`~'^