[ClusterLabs] Redundant Ring Network failure

ROHWEDER-NEUBECK, MICHAEL (EXTERN) michael.rohweder-neubeck.sp at dlh.de
Wed Jun 10 05:46:13 EDT 2020


Jan,

we are currently using the following versions:

[root at lvm-nfscpdata-05ct::~ 100 ]# apt show corosync
Package: corosync
Version: 3.0.1-2+deb10u1

[root at lvm-nfscpdata-05ct::~]# apt show libknet1
Package: libknet1
Version: 1.8-2

These are the newest versions provided on the mirror.




Sitz der Gesellschaft / Corporate Headquarters: Deutsche Lufthansa Aktiengesellschaft, Koeln, Registereintragung / Registration: Amtsgericht Koeln HR B 2168
Vorsitzender des Aufsichtsrats / Chairman of the Supervisory Board: Dr. Karl-Ludwig Kley
Vorstand / Executive Board: Carsten Spohr (Vorsitzender / Chairman), Thorsten Dirks, Christina Foerster, Harry Hohmeister, Dr. Detlef Kayser, Dr. Michael Niggemann


-----Original Message-----
From: Jan Friesse <jfriesse at redhat.com>
Sent: Wednesday, 10 June 2020 09:24
To: Cluster Labs - All topics related to open-source clustering welcomed <users at clusterlabs.org>; ROHWEDER-NEUBECK, MICHAEL (EXTERN) <michael.rohweder-neubeck.sp at dlh.de>; users at lists.clusterlabs.org
Subject: Re: [ClusterLabs] Redundant Ring Network failure

Michael,
which version of knet are you using? We had quite a few problems with older versions of knet, so the current stable release (1.16) is recommended. The same applies to corosync, because 3.0.4 has a vastly improved display of link status.
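
To make the version gap concrete, here is a minimal Python sketch that compares the installed package versions reported in this thread with the recommended ones (knet 1.16, corosync 3.0.4). The helper name version_tuple is hypothetical, and the parsing is a simplification: it reads only the upstream part before the first "-" and does not handle Debian epochs or other special cases.

```python
import re


def version_tuple(version):
    """Turn e.g. '3.0.1-2+deb10u1' into (3, 0, 1), reading only the upstream part.

    Simplified on purpose: no Debian epoch ('1:...') or '~' handling.
    """
    upstream = version.split("-", 1)[0]
    return tuple(int(part) for part in re.findall(r"\d+", upstream))


# Installed versions as reported by `apt show` in this thread.
installed = {"corosync": "3.0.1-2+deb10u1", "libknet1": "1.8-2"}
# Versions recommended in this thread.
recommended = {"corosync": "3.0.4", "libknet1": "1.16"}

outdated = sorted(
    name
    for name, version in installed.items()
    if version_tuple(version) < version_tuple(recommended[name])
)
print(outdated)
```

Comparing numeric tuples rather than strings matters here: as plain strings "1.8" would sort after "1.16", although 1.8 is the older release.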

> Hello,
> We have massive problems with the redundant ring operation of our three-node Corosync/Pacemaker NFS clusters.
> 
> Most of the nodes either have an entire ring offline or only 1 node in a ring.
> Example: (Node1 Ring0 333 Ring1 n33 | Node2 Ring0 033 Ring1 3n3 | Node3 Ring0 333 Ring1 33n)

That doesn't seem completely wrong. You can ignore the 'n' for Ring 1, because that is localhost, which is connected only on Ring 0 (3.0.4 makes this output more consistent), so all nodes are connected at least via Ring 1.
Ring 0 on node 2 seems to have some trouble connecting to node 1, but node 1 (and node 3) seem to be connected to node 2 just fine, so I think it is either some bug in knet (probably already fixed) or some kind of firewall blocking just the connection from node 2 to node 1 on Ring 0.
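
The reading of the status strings above can be sketched in Python. This is only an illustration under stated assumptions: the strings are copied from the example in this thread, character i of a string is assumed to describe the link to node i+1, '3' is taken to mean "connected", and anything else on a link to another node is flagged. The function name unhealthy_links is hypothetical and not part of corosync.

```python
def unhealthy_links(status_by_node):
    """Return (node, ring, peer) tuples for links not reported as '3'.

    status_by_node maps a 1-based node id to a list of status strings,
    one string per ring. Assumption: character i describes the link to
    node i+1, and '3' means the link is connected.
    """
    problems = []
    for node, rings in sorted(status_by_node.items()):
        for ring, status in enumerate(rings):
            for pos, flag in enumerate(status):
                peer = pos + 1
                if peer == node:
                    continue  # entry for the local node itself ('n' or '3'): ignore
                if flag != "3":
                    problems.append((node, ring, peer))
    return problems


# Status strings from the example in this thread.
example = {
    1: ["333", "n33"],
    2: ["033", "3n3"],
    3: ["333", "33n"],
}

print(unhealthy_links(example))
```

With the example data, the only flagged link is the one from node 2 to node 1 on Ring 0, which matches the analysis above.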


> 
> corosync-cfgtool -R doesn't help.
> All nodes are VMs that build the ring together using 2 VLANs.
> Which logs do you need in order to help me?

syslog/journal should contain everything needed, especially when debug is enabled (in corosync.conf, set debug: on in the logging section).

Regards,
   Honza

> 
> Corosync Cluster Engine, version '3.0.1'
> Copyright (c) 2006-2018 Red Hat, Inc.
> Debian Buster
> 
> 
> --
> Best regards,
>    Michael Rohweder-Neubeck
> 
> NSB GmbH – Nguyen Softwareentwicklung & Beratung GmbH Röntgenstraße 27
> D-64291 Darmstadt
> E-Mail: mrn at nsb-software.de
> Manager: Van-Hien Nguyen, Jörg Jaspert
> USt-ID: DE 195 703 354; HRB 7131 Amtsgericht Darmstadt
> 
> 
> 
> 


