[ClusterLabs] Redundant Ring Network failure

ROHWEDER-NEUBECK, MICHAEL (EXTERN) michael.rohweder-neubeck.sp at dlh.de
Wed Jun 10 05:28:30 EDT 2020


Hi,
Yesterday we restarted all clusters, and all rings were OK.
Today, one of them has a broken ring.

ring 0 broken: 033
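
A status string such as 033 carries one character per node. The following is a minimal Python sketch for reading such strings. It assumes that each digit is a bitmask with bit 0 = link enabled and bit 1 = link connected, and that "n" marks a position with no link (typically the local node). This encoding is not stated in the thread and should be checked against the corosync version in use. The function names are hypothetical.

```python
def decode_link_status(status):
    """Decode a per-ring status string such as '033' or '3n3'.

    Returns one entry per node position: None where there is no link,
    otherwise a dict with the 'enabled' and 'connected' flags.
    ASSUMPTION: digit = bitmask, bit 0 = enabled, bit 1 = connected.
    """
    decoded = []
    for char in status:
        if char == "n":
            decoded.append(None)
        elif char in "01234567":
            bits = int(char)
            decoded.append({"enabled": bool(bits & 1),
                            "connected": bool(bits & 2)})
        else:
            raise ValueError(f"unexpected status character: {char!r}")
    return decoded


def faulty_positions(status):
    """Return the positions whose link is not both enabled and connected."""
    return [
        index
        for index, link in enumerate(decode_link_status(status))
        if link is not None and not (link["enabled"] and link["connected"])
    ]


if __name__ == "__main__":
    # Status strings taken from this thread.
    for ring_status in ("333", "033", "n33", "3n3", "33n"):
        print(ring_status, "faulty positions:", faulty_positions(ring_status))
```

Under that reading, 033 would mean that the link to the node in the first position is neither enabled nor connected, while the other two links are fine.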

This is my configuration:

[root at lvm-nfscpdata-05ct::~]# less /etc/corosync/corosync.conf
totem {
  version:                             2
  transport:                           knet
  cluster_name:                        nfscpdata
  token:                               2000
  token_retransmits_before_loss_const: 10
  max_messages:                        150
  window_size:                         300
  crypto_cipher:                       aes256
  crypto_hash:                         sha256
  interface {
    ringnumber: 0
  }
  interface {
    ringnumber: 1
  }
}

logging {
  fileline:        off
  to_stderr:       yes
  to_logfile:      no
  to_syslog:       yes
  syslog_facility: daemon
  syslog_priority: info
  debug:           off
  timestamp:       on
  logger_subsys {
    subsys: QUORUM
    debug:  off
  }
}

quorum {
  # Enable and configure quorum subsystem (default: off)
  # see also corosync.conf.5 and votequorum.5
  provider: corosync_votequorum
}

nodelist {
  node {
    ring0_addr: 10.28.63.138
    ring1_addr: 10.28.98.138
    name: lvm-nfscpdata-04ct
    nodeid: 1688
  }
  node {
    ring0_addr: 10.28.63.139
    ring1_addr: 10.28.98.139
    name: lvm-nfscpdata-05ct
    nodeid: 1689
  }
  node {
    ring0_addr: 10.28.63.140
    ring1_addr: 10.28.98.140
    name: lvm-nfscpdata-06ct
    nodeid: 1690
  }
}
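
When checking each ring separately, it can help to have the peer addresses grouped by ring. The following is a minimal Python sketch that extracts them from a nodelist laid out like the one above. It is a simple line-based reader, not a full corosync.conf parser, and the function name ring_addresses is hypothetical.

```python
import re

# Shortened sample using two of the nodes from the configuration above.
SAMPLE = """
nodelist {
  node {
    ring0_addr: 10.28.63.138
    ring1_addr: 10.28.98.138
    name: lvm-nfscpdata-04ct
    nodeid: 1688
  }
  node {
    ring0_addr: 10.28.63.139
    ring1_addr: 10.28.98.139
    name: lvm-nfscpdata-05ct
    nodeid: 1689
  }
}
"""


def ring_addresses(conf_text):
    """Return {ring_number: {node_name: address}} for all node blocks."""
    rings = {}
    node = None  # key/value pairs of the node block currently being read
    for raw_line in conf_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        if re.fullmatch(r"node\s*\{", line):
            node = {}
        elif line == "}" and node is not None:
            # End of a node block: file its ringN_addr entries by ring.
            name = node.get("name", node.get("nodeid", "?"))
            for key, value in node.items():
                match = re.fullmatch(r"ring(\d+)_addr", key)
                if match:
                    rings.setdefault(int(match.group(1)), {})[name] = value
            node = None
        elif node is not None and ":" in line:
            key, value = line.split(":", 1)
            node[key.strip()] = value.strip()
    return rings


if __name__ == "__main__":
    for ring, members in sorted(ring_addresses(SAMPLE).items()):
        print("ring", ring, members)
```

Running it against the full /etc/corosync/corosync.conf gives, per ring, the list of addresses that must be reachable from every node.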

Ring 1 is managed by the host firewall, but the ports are opened.
Ring 0 has no firewall settings.




Sitz der Gesellschaft / Corporate Headquarters: Deutsche Lufthansa Aktiengesellschaft, Koeln, Registereintragung / Registration: Amtsgericht Koeln HR B 2168
Vorsitzender des Aufsichtsrats / Chairman of the Supervisory Board: Dr. Karl-Ludwig Kley
Vorstand / Executive Board: Carsten Spohr (Vorsitzender / Chairman), Thorsten Dirks, Christina Foerster, Harry Hohmeister, Dr. Detlef Kayser, Dr. Michael Niggemann


-----Original Message-----
From: Strahil Nikolov <hunter86_bg at yahoo.com> 
Sent: Tuesday, 9 June 2020 21:34
To: ROHWEDER-NEUBECK, MICHAEL (EXTERN) <michael.rohweder-neubeck.sp at dlh.de>; Cluster Labs - All topics related to open-source clustering welcomed <users at clusterlabs.org>
Subject: Re: [ClusterLabs] Redundant Ring Network failure

It will be hard to guess if you are using sctp or udp/udpu.
If possible, share the corosync.conf (you can remove sensitive data, but keep it meaningful).

Are you using a firewall? If yes, check:
1. Node firewall is not blocking the communication on the specific interfaces.
2. Verify with tcpdump that the heartbeats are received from the remote side.
3. Check for retransmissions or packet loss.
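
For step 2, the tcpdump invocations can be generated per ring and peer. The following is a minimal Python sketch, assuming the cluster traffic is UDP (to my knowledge the knet default when knet_transport is not set) and that the rings use the interfaces eth0 and eth1. Both interface names are hypothetical and must be replaced with the real ones. The peer addresses are the ones from the nodelist in this thread, as seen from lvm-nfscpdata-05ct. No port filter is applied, since the port in use is not given in the thread.

```python
import shlex

# ASSUMPTION: interface names are placeholders, not taken from the thread.
RING_PEERS = {
    "eth0": ["10.28.63.138", "10.28.63.140"],  # ring 0 peers
    "eth1": ["10.28.98.138", "10.28.98.140"],  # ring 1 peers
}


def tcpdump_command(interface, peer_address):
    """Build a tcpdump command line showing UDP packets sent by one peer."""
    args = ["tcpdump", "-n", "-i", interface,
            "udp", "and", "src", "host", peer_address]
    return shlex.join(args)


if __name__ == "__main__":
    for interface, peers in RING_PEERS.items():
        for peer in peers:
            print(tcpdump_command(interface, peer))
```

If one of the printed commands shows no packets while the others do, that peer's traffic is not arriving on that ring, which narrows the search to the firewall or the VLAN in between.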

Usually you can find more details in the log specified in corosync.conf or in /var/log/messages (and also the journal).

Best Regards,
Strahil Nikolov

On 9 June 2020 at 21:11:02 GMT+03:00, "ROHWEDER-NEUBECK, MICHAEL (EXTERN)" <michael.rohweder-neubeck.sp at dlh.de> wrote:
>Hi,
>
>we are using unicast ("knet")
>
>Greetings
>
>Michael
>
>
>-----Original Message-----
>From: Strahil Nikolov <hunter86_bg at yahoo.com>
>Sent: Tuesday, 9 June 2020 19:30
>To: Cluster Labs - All topics related to open-source clustering 
>welcomed <users at clusterlabs.org>; ROHWEDER-NEUBECK, MICHAEL (EXTERN) 
><michael.rohweder-neubeck.sp at dlh.de>
>Subject: Re: [ClusterLabs] Redundant Ring Network failure
>
>Are you using multicast?
>
>Best Regards,
>Strahil Nikolov
>
>On 9 June 2020 at 10:28:25 GMT+03:00, "ROHWEDER-NEUBECK, MICHAEL 
>(EXTERN)" <michael.rohweder-neubeck.sp at dlh.de> wrote:
>>Hello,
>>We have massive problems with the redundant ring operation of our
>>Corosync/Pacemaker 3-node NFS clusters.
>>
>>Most of the nodes either have an entire ring offline or only 1 node
>>in a ring.
>>Example: (Node1 Ring0 333 Ring1 n33 | Node2 Ring0 033 Ring1 3n3 |
>>Node3 Ring0 333 Ring1 33n)
>>
>>corosync-cfgtool -R doesn't help.
>>All nodes are VMs that build the ring together using 2 VLANs.
>>Which logs do you need in order to help me?
>>
>>Corosync Cluster Engine, version '3.0.1'
>>Copyright (c) 2006-2018 Red Hat, Inc.
>>Debian Buster
>>
>>
>>--
>>Kind regards
>>  Michael Rohweder-Neubeck
>>
>>NSB GmbH – Nguyen Softwareentwicklung & Beratung GmbH Röntgenstraße 27
>>D-64291 Darmstadt
>>E-Mail: mrn at nsb-software.de
>>Manager: Van-Hien Nguyen, Jörg Jaspert
>>USt-ID: DE 195 703 354; HRB 7131 Amtsgericht Darmstadt

