[ClusterLabs] getting "Totem is unable to form a cluster" error

Fri Apr 8 16:51:23 CEST 2016

> On 04/08/16 13:01, Jan Friesse wrote:
>  >> pacemaker 1.1.12-11.12
>  >> openais 1.1.4-5.24.5
>  >> corosync 1.4.7-0.23.5
>  >>
>  >> It's a two-node active/passive cluster and we just upgraded from SLES 11
>  >> SP 3 to SLES 11 SP 4 (nothing else), but when we try to start the cluster
>  >> service we get the following error:
>  >>
>  >> "Totem is unable to form a cluster because of an operating system or
>  >> network fault."
>  >>
>  >> Firewall is stopped and disabled on both the nodes. Both nodes can
>  >> ping/ssh/vnc each other.
>  >
>  > Hard to help. First of all, I would recommend asking SUSE support,
>  > because I don't have access to the source code of the corosync
>  > 1.4.7-0.23.5 package, so I don't know which patches were added.
>  >
>  >
> Yup, ticket opened with SUSE Support.
>
>  >>
>  >>
>  >>
>  >> /var/log/messages:
>  >> Apr  6 17:51:49 prd1 corosync[8672]:  [MAIN  ] Corosync Cluster Engine
>  >> ('1.4.7'): started and ready to provide service.
>  >> Apr  6 17:51:49 prd1 corosync[8672]:  [MAIN  ] Corosync built-in
>  >> features: nss
>  >> Apr  6 17:51:49 prd1 corosync[8672]:  [MAIN  ] Successfully configured
>  >> openais services to load
>  >> Apr  6 17:51:49 prd1 corosync[8672]:  [MAIN  ] Successfully read main
>  >> configuration file '/etc/corosync/corosync.conf'.
>  >> Apr  6 17:51:49 prd1 corosync[8672]:  [TOTEM ] Initializing transport
>  >> (UDP/IP Unicast).
>  >> Apr  6 17:51:49 prd1 corosync[8672]:  [TOTEM ] Initializing
>  >> transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
>  >> Apr  6 17:51:49 prd1 corosync[8672]:  [TOTEM ] The network interface is
>  >> down.
>  >
>  > ^^^ This is the important line. It means corosync was unable to find
>  > an interface for bindnetaddr 192.168.150.0. Make sure an interface with
>  > this network address exists.
>  >
>  >
> This machine has two IP addresses assigned on interface bond0:
>
> bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
>      link/ether 74:e6:e2:73:e5:61 brd ff:ff:ff:ff:ff:ff
>      inet 10.150.20.91/24 brd 10.150.20.55 scope global bond0
>      inet 192.168.150.12/22 brd 192.168.151.255 scope global bond0:cluster
>      inet6 fe80::76e6:e2ff:fe73:e561/64 scope link
>         valid_lft forever preferred_lft forever

Is this ifconfig output? I'm just wondering how you were able to set two
IPv4 addresses (in this format, I would expect another interface like
bond0:1, or nothing at all).

Anyway, I tried creating a bonding interface and setting a second IPv4
address on it (via ip addr), and corosync (flatiron, which is 1.4.8 plus
4 patches completely unrelated to your problem) was able to detect it
without any problem.

I'd recommend trying the following:
- Set bindnetaddr to the IP address of the given node (so you have to
change bindnetaddr on both nodes, each to its own address)
- Try upstream corosync 1.4.8/flatiron
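
As a side check on the addresses quoted above, here is a minimal Python
sketch (standard-library ipaddress module only) that works out the network
each bond0 address belongs to. The helper name network_of is made up for
illustration, and the sketch does not reproduce how corosync itself matches
bindnetaddr against interfaces; it only shows the arithmetic. With a /22
prefix, the network address of 192.168.150.12 is 192.168.148.0 rather than
192.168.150.0, which may or may not be related to the problem.

```python
import ipaddress


def network_of(cidr):
    """Return the network that an interface address in CIDR form belongs to.

    Hypothetical helper for illustration; not part of corosync.
    """
    return ipaddress.ip_interface(cidr).network


# Addresses copied from the quoted bond0 output.
for cidr in ("10.150.20.91/24", "192.168.150.12/22"):
    net = network_of(cidr)
    print(cidr, "network", net.network_address, "broadcast", net.broadcast_address)

# The configured bindnetaddr lies inside the /22 network,
# but it is not that network's base address.
bindnetaddr = ipaddress.ip_address("192.168.150.0")
cluster_net = network_of("192.168.150.12/22")
print("bindnetaddr inside network:", bindnetaddr in cluster_net)
print("bindnetaddr is the network address:", bindnetaddr == cluster_net.network_address)
```

Setting bindnetaddr to the node's own address, as suggested in the first
item, avoids having to work out the network address at all.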

Regards,
   Honza

>
> And I can ping 192.168.150.12 from this machine and from other machines
> on the network.
>
>
>
> --
> Regards,
>
> Muhammad Sharfuddin
>