[ClusterLabs] getting "Totem is unable to form a cluster" error
Andrei Borzenkov
arvidjaar at gmail.com
Fri Apr 8 16:51:01 UTC 2016
08.04.2016 17:51, Jan Friesse wrote:
>> On 04/08/16 13:01, Jan Friesse wrote:
>> >> pacemaker 1.1.12-11.12
>> >> openais 1.1.4-5.24.5
>> >> corosync 1.4.7-0.23.5
>> >>
>> >> It's a two-node active/passive cluster and we just upgraded SLES 11
>> >> SP 3 to SLES 11 SP 4 (nothing else), but when we try to start the
>> >> cluster service we get the following error:
>> >>
>> >> "Totem is unable to form a cluster because of an operating system or
>> >> network fault."
>> >>
>> >> Firewall is stopped and disabled on both the nodes. Both nodes can
>> >> ping/ssh/vnc each other.
>> >
>> > Hard to help. First of all, I would recommend asking SUSE support,
>> > because I don't have access to the source code of the corosync
>> > 1.4.7-0.23.5 package, so I really don't know which patches were added.
>> >
>> >
>> Yup, ticket opened with SUSE Support.
>>
>> >>
>> >>
>> >>
>> >> /var/log/messages:
>> >> Apr 6 17:51:49 prd1 corosync[8672]: [MAIN ] Corosync Cluster Engine
>> >> ('1.4.7'): started and ready to provide service.
>> >> Apr 6 17:51:49 prd1 corosync[8672]: [MAIN ] Corosync built-in
>> >> features: nss
>> >> Apr 6 17:51:49 prd1 corosync[8672]: [MAIN ] Successfully configured
>> >> openais services to load
>> >> Apr 6 17:51:49 prd1 corosync[8672]: [MAIN ] Successfully read main
>> >> configuration file '/etc/corosync/corosync.conf'.
>> >> Apr 6 17:51:49 prd1 corosync[8672]: [TOTEM ] Initializing transport
>> >> (UDP/IP Unicast).
>> >> Apr 6 17:51:49 prd1 corosync[8672]: [TOTEM ] Initializing
>> >> transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
>> >> Apr 6 17:51:49 prd1 corosync[8672]: [TOTEM ] The network interface is
>> >> down.
>> >
>> > ^^^ This is the important line. It means corosync was unable to find
>> > an interface for bindnetaddr 192.168.150.0. Make sure an interface
>> > with this network address exists.
>> >
>> >
>> This machine has two IP addresses assigned on interface bond0:
>>
>> bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
>> link/ether 74:e6:e2:73:e5:61 brd ff:ff:ff:ff:ff:ff
>> inet 10.150.20.91/24 brd 10.150.20.55 scope global bond0
>> inet 192.168.150.12/22 brd 192.168.151.255 scope global bond0:cluster
>> inet6 fe80::76e6:e2ff:fe73:e561/64 scope link
>> valid_lft forever preferred_lft forever
>
> This is ifconfig output? I'm just wondering how you were able to set two
> ipv4 addresses (in this format, I would expect another interface like
> bond0:1 or nothing at all)?
>
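That is how the Linux network stack has worked for the last 10 or 15
years: one device can carry several addresses. The bond0:1 form is legacy
emulation for ifconfig addicts. With iproute2 an address is simply added
to the device, e.g.:

    ip addr add 10.150.20.91/24 dev bond0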
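
To make the point about labels concrete, here is a minimal Python sketch that reads the `ip addr` output quoted earlier in this thread and lists every IPv4 address on the device together with its label. The function name `ipv4_addresses` is hypothetical, the sample text is the quoted output with its wrapped line rejoined, and the parsing assumes the label is the last field of each `inet` line.

```python
# Sample copied from the thread (wrapped line rejoined, indentation assumed).
SAMPLE = """\
bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
    link/ether 74:e6:e2:73:e5:61 brd ff:ff:ff:ff:ff:ff
    inet 10.150.20.91/24 brd 10.150.20.55 scope global bond0
    inet 192.168.150.12/22 brd 192.168.151.255 scope global bond0:cluster
    inet6 fe80::76e6:e2ff:fe73:e561/64 scope link
       valid_lft forever preferred_lft forever
"""


def ipv4_addresses(text):
    """Return (address/prefix, label) pairs for every 'inet' line."""
    found = []
    for line in text.splitlines():
        fields = line.split()
        # 'inet6' lines are skipped because the first field must be exactly 'inet'.
        if fields and fields[0] == "inet":
            found.append((fields[1], fields[-1]))
    return found


for address, label in ipv4_addresses(SAMPLE):
    print(label, address)
```

Both addresses belong to the same device, bond0; `bond0:cluster` is only a label attached to the second one.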
> Anyway, I tried creating a bonding interface and setting a second IPv4
> address (via ip addr), and corosync (flatiron, which is 1.4.8 plus 4
> patches completely unrelated to your problem) was able to detect it
> without any problem.
>
> I recommend trying the following:
> - Set bindnetaddr to the IP address of the given node (so you have to
> change bindnetaddr on both nodes)
> - Try upstream corosync 1.4.8/flatiron
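
One detail worth checking before changing anything: the `ip` output quoted above shows 192.168.150.12 with a /22 prefix, so the network that address belongs to is not 192.168.150.0. Below is a small Python sketch of the arithmetic using only the standard `ipaddress` module. The addresses come from this thread; whether the packaged corosync 1.4.7 requires bindnetaddr to equal the masked network address exactly is an assumption that nothing in this thread confirms.

```python
import ipaddress

# Values taken from the thread; the exact-match rule is an assumption.
iface = ipaddress.ip_interface("192.168.150.12/22")
bindnetaddr = ipaddress.ip_address("192.168.150.0")

# The interface address masked with its /22 netmask.
network = iface.network

print("network address:", network.network_address)
print("broadcast address:", network.broadcast_address)
print("bindnetaddr inside subnet:", bindnetaddr in network)
print("bindnetaddr equals network address:",
      bindnetaddr == network.network_address)
```

The computed broadcast address matches the `brd 192.168.151.255` shown by `ip`, and the network address comes out as 192.168.148.0. If an exact match turns out to be needed, that would be the value to try; setting bindnetaddr to the node's own address, as suggested above, sidesteps the question.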
>
> Regards,
> Honza
>
>>
>> And I can ping 192.168.150.12 from this machine and from other machines
>> on network
>>
>>
>>
>> --
>> Regards,
>>
>> Muhammad Sharfuddin
>>
>> _______________________________________________
>> Users mailing list: Users at clusterlabs.org
>> http://clusterlabs.org/mailman/listinfo/users
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org