[ClusterLabs] Redundant ring not recovering after node is back
Emmanuel Gelati
emi2fast at gmail.com
Wed Aug 22 16:20:21 EDT 2018
I think you are missing the interface configuration that goes with the nodelist; see the sample corosync configuration here:
http://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/1.1/html/Clusters_from_Scratch/_sample_corosync_configuration.html
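Something along these lines, as a rough sketch only (I am assuming udpu transport here, as in the sample on that page, and reusing your ring addresses; adjust to your setup):

totem {
    version: 2
    cluster_name: node
    rrp_mode: passive
    # with udpu plus a nodelist the ring addresses come from the nodelist,
    # so the bindnetaddr/broadcast interface sections should not be needed
    transport: udpu
}

nodelist {
    node {
        nodeid: 1
        ring0_addr: 192.168.0.1
        ring1_addr: 192.168.1.1
    }
    node {
        nodeid: 2
        ring0_addr: 192.168.0.2
        ring1_addr: 192.168.1.2
    }
}
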
2018-08-22 14:53 GMT+02:00 David Tolosa <david.tolosa at upcnet.es>:
> Hello,
> I'm going crazy over this problem, which I hope to resolve here with your
> help:
>
> I have 2 nodes with Corosync redundant ring feature.
>
> Each node has 2 similarly connected/configured NICs. The two nodes are
> connected to each other by two crossover cables.
>
> I configured both nodes with rrp_mode: passive. Everything works well at
> this point, but when I shut down one node to test failover and that node
> comes back online, corosync marks the interface as FAULTY and RRP fails to
> recover the initial state:
>
> 1. Initial scenario:
>
> # corosync-cfgtool -s
> Printing ring status.
> Local node ID 1
> RING ID 0
>         id      = 192.168.0.1
>         status  = ring 0 active with no faults
> RING ID 1
>         id      = 192.168.1.1
>         status  = ring 1 active with no faults
>
>
> 2. When I shut down node 2, everything continues with no faults. Sometimes
> the ring IDs bind to 127.0.0.1 and then bind back to their respective
> heartbeat IPs.
>
> 3. When node 2 is back online:
>
> # corosync-cfgtool -s
> Printing ring status.
> Local node ID 1
> RING ID 0
>         id      = 192.168.0.1
>         status  = ring 0 active with no faults
> RING ID 1
>         id      = 192.168.1.1
>         status  = Marking ringid 1 interface 192.168.1.1 FAULTY
>
>
> # service corosync status
> ● corosync.service - Corosync Cluster Engine
> Loaded: loaded (/lib/systemd/system/corosync.service; enabled; vendor preset: enabled)
> Active: active (running) since Wed 2018-08-22 14:44:09 CEST; 1min 38s ago
> Docs: man:corosync
> man:corosync.conf
> man:corosync_overview
> Main PID: 1439 (corosync)
> Tasks: 2 (limit: 4915)
> CGroup: /system.slice/corosync.service
> └─1439 /usr/sbin/corosync -f
>
>
> Aug 22 14:44:11 node1 corosync[1439]: Aug 22 14:44:11 notice [TOTEM ] The network interface [192.168.0.1] is now up.
> Aug 22 14:44:11 node1 corosync[1439]: [TOTEM ] The network interface [192.168.0.1] is now up.
> Aug 22 14:44:11 node1 corosync[1439]: Aug 22 14:44:11 notice [TOTEM ] The network interface [192.168.1.1] is now up.
> Aug 22 14:44:11 node1 corosync[1439]: [TOTEM ] The network interface [192.168.1.1] is now up.
> Aug 22 14:44:26 node1 corosync[1439]: Aug 22 14:44:26 notice [TOTEM ] A new membership (192.168.0.1:601760) was formed. Members
> Aug 22 14:44:26 node1 corosync[1439]: [TOTEM ] A new membership (192.168.0.1:601760) was formed. Members
> Aug 22 14:44:32 node1 corosync[1439]: Aug 22 14:44:32 notice [TOTEM ] A new membership (192.168.0.1:601764) was formed. Members joined: 2
> Aug 22 14:44:32 node1 corosync[1439]: [TOTEM ] A new membership (192.168.0.1:601764) was formed. Members joined: 2
> Aug 22 14:44:34 node1 corosync[1439]: Aug 22 14:44:34 error [TOTEM ] Marking ringid 1 interface 192.168.1.1 FAULTY
> Aug 22 14:44:34 node1 corosync[1439]: [TOTEM ] Marking ringid 1 interface 192.168.1.1 FAULTY
>
>
> If I execute corosync-cfgtool, it clears the faulty state, but after some
> seconds the ring becomes FAULTY again.
> The only thing that resolves the problem is to restart the service with
> service corosync restart.
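>
> (For reference, what I run to clear it is roughly the following; I assume
> the -r option of corosync-cfgtool is the one that re-enables a faulty ring:)
>
> # corosync-cfgtool -r     <- re-enable the ring marked FAULTY on this node
> # corosync-cfgtool -s     <- a few seconds later ring 1 is FAULTY again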
>
> Here are some of my configuration settings on node 1 (I have already tried
> changing rrp_mode):
>
> *- corosync.conf*
>
> totem {
>     version: 2
>     cluster_name: node
>     token: 5000
>     token_retransmits_before_loss_const: 10
>     secauth: off
>     threads: 0
>     rrp_mode: passive
>     nodeid: 1
>     interface {
>         ringnumber: 0
>         bindnetaddr: 192.168.0.0
>         #mcastaddr: 226.94.1.1
>         mcastport: 5405
>         broadcast: yes
>     }
>     interface {
>         ringnumber: 1
>         bindnetaddr: 192.168.1.0
>         #mcastaddr: 226.94.1.2
>         mcastport: 5407
>         broadcast: yes
>     }
> }
>
> logging {
>     fileline: off
>     to_stderr: yes
>     to_syslog: yes
>     to_logfile: yes
>     logfile: /var/log/corosync/corosync.log
>     debug: off
>     timestamp: on
>     logger_subsys {
>         subsys: AMF
>         debug: off
>     }
> }
>
> amf {
>     mode: disabled
> }
>
> quorum {
>     provider: corosync_votequorum
>     expected_votes: 2
> }
>
> nodelist {
>     node {
>         nodeid: 1
>         ring0_addr: 192.168.0.1
>         ring1_addr: 192.168.1.1
>     }
>
>     node {
>         nodeid: 2
>         ring0_addr: 192.168.0.2
>         ring1_addr: 192.168.1.2
>     }
> }
>
> aisexec {
>     user: root
>     group: root
> }
>
> service {
>     name: pacemaker
>     ver: 1
> }
>
>
>
> *- /etc/hosts*
>
>
> 127.0.0.1 localhost
> 10.4.172.5 node1.upc.edu node1
> 10.4.172.6 node2.upc.edu node2
>
>
> Thank you for your help in advance!
>
> --
> *David Tolosa Martínez*
> Customer Support & Infrastructure
> UPCnet - Edifici Vèrtex
> Plaça d'Eusebi Güell, 6, 08034 Barcelona
> Tel: 934054555
>
> <https://www.upcnet.es>
>
--
.~.
/V\
// \\
/( )\
^`~'^