[ClusterLabs] getting "Totem is unable to form a cluster" error
Jan Friesse
jfriesse at redhat.com
Mon Apr 11 08:23:03 CEST 2016
> 08.04.2016 17:51, Jan Friesse wrote:
>>> On 04/08/16 13:01, Jan Friesse wrote:
>>> >> pacemaker 1.1.12-11.12
>>> >> openais 1.1.4-5.24.5
>>> >> corosync 1.4.7-0.23.5
>>> >>
>>> >> It's a two-node active/passive cluster and we just upgraded
>>> >> SLES 11 SP 3 to SLES 11 SP 4 (nothing else), but when we try to
>>> >> start the cluster service we get the following error:
>>> >>
>>> >> "Totem is unable to form a cluster because of an operating system or
>>> >> network fault."
>>> >>
>>> >> Firewall is stopped and disabled on both the nodes. Both nodes can
>>> >> ping/ssh/vnc each other.
>>> >
>>> > Hard to help. First of all, I would recommend asking SUSE support,
>>> > because I don't have access to the source code of the corosync
>>> > 1.4.7-0.23.5 package, so I don't know which patches were added.
>>> >
>>> >
>>> Yup, ticket opened with SUSE Support.
>>>
>>> >>
>>> >>
>>> >>
>>> >> /var/log/messages:
>>> >> Apr 6 17:51:49 prd1 corosync[8672]: [MAIN ] Corosync Cluster Engine ('1.4.7'): started and ready to provide service.
>>> >> Apr 6 17:51:49 prd1 corosync[8672]: [MAIN ] Corosync built-in features: nss
>>> >> Apr 6 17:51:49 prd1 corosync[8672]: [MAIN ] Successfully configured openais services to load
>>> >> Apr 6 17:51:49 prd1 corosync[8672]: [MAIN ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
>>> >> Apr 6 17:51:49 prd1 corosync[8672]: [TOTEM ] Initializing transport (UDP/IP Unicast).
>>> >> Apr 6 17:51:49 prd1 corosync[8672]: [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
>>> >> Apr 6 17:51:49 prd1 corosync[8672]: [TOTEM ] The network interface is down.
>>> >
>>> > ^^^ This is the important line. It means corosync was unable to find
>>> > an interface for bindnetaddr 192.168.150.0. Make sure an interface
>>> > with this network address exists.
>>> >
>>> >
>>> This machine has two IP addresses assigned to interface bond0:
>>>
>>> bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
>>> link/ether 74:e6:e2:73:e5:61 brd ff:ff:ff:ff:ff:ff
>>> inet 10.150.20.91/24 brd 10.150.20.55 scope global bond0
>>> inet 192.168.150.12/22 brd 192.168.151.255 scope global bond0:cluster
>>> inet6 fe80::76e6:e2ff:fe73:e561/64 scope link
>>> valid_lft forever preferred_lft forever
>>
>> Is this ifconfig output? I'm just wondering how you were able to set two
>> IPv4 addresses (in this format, I would expect another interface like
>> bond0:1 or nothing at all).
>>
>
> That is how the Linux stack has worked for the last 10 or 15 years. The
> bond0:1 form is legacy emulation for ifconfig addicts.
>
> ip addr add 10.150.20.91/24 dev bond0
Hmm.

RHEL 6:

# tunctl -p
Set 'tap0' persistent and owned by uid 0
# ip addr add 192.168.7.1/24 dev tap0
# ip addr add 192.168.8.1/24 dev tap0
# ifconfig tap0
tap0      Link encap:Ethernet HWaddr 22:95:B1:85:67:3F
          inet addr:192.168.7.1 Bcast:0.0.0.0 Mask:255.255.255.0
          BROADCAST MULTICAST MTU:1500 Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:500
          RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)

RHEL 7:

# ip tuntap add dev tap0 mode tap
# ip addr add 192.168.7.1/24 dev tap0
# ip addr add 192.168.8.1/24 dev tap0
# ifconfig tap0
tap0: flags=4098<BROADCAST,MULTICAST> mtu 1500
        inet 192.168.7.1 netmask 255.255.255.0 broadcast 0.0.0.0
        ether 36:02:5c:ff:29:ea txqueuelen 500 (Ethernet)
        RX packets 0 bytes 0 (0.0 B)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 0 bytes 0 (0.0 B)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

So where do you see 192.168.8.1 in ifconfig output?
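
For comparison, here is a minimal sketch of pulling every IPv4 address for a device out of `ip -o -4 addr show` output, which, unlike the ifconfig listings above, should also report a second address added without a label. The function name `ipv4_addresses` and the sample text are made up for illustration; the sample is typed in by hand in the one-line-per-address layout, not captured from a real machine.

```python
def ipv4_addresses(ip_output: str) -> list:
    """Return the address/prefix string that follows each 'inet' token."""
    found = []
    for line in ip_output.splitlines():
        fields = line.split()
        # Stop one field early so fields[i + 1] always exists.
        for i, field in enumerate(fields[:-1]):
            if field == "inet":
                found.append(fields[i + 1])
    return found


# Hypothetical sample in the `ip -o -4 addr show dev tap0` layout.
SAMPLE = """\
5: tap0    inet 192.168.7.1/24 scope global tap0
5: tap0    inet 192.168.8.1/24 scope global tap0
"""

if __name__ == "__main__":
    print(ipv4_addresses(SAMPLE))
```

On a live system the input could come from `subprocess.run(["ip", "-o", "-4", "addr", "show", "dev", "tap0"], capture_output=True, text=True).stdout` instead of the sample string.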
>
>> Anyway, I tried creating a bonding interface and setting a second IPv4
>> address (via ip addr), and corosync (flatiron, which is 1.4.8 plus 4
>> patches completely unrelated to your problem) was able to detect it
>> without any problem.
>>
>> I recommend trying the following:
>> - Set bindnetaddr to the IP address of the given node (so you have to
>> change bindnetaddr on both nodes)
>> - Try upstream corosync 1.4.8/flatiron
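
As a cross-check on the bindnetaddr value discussed in this thread, here is a minimal sketch that uses Python's standard `ipaddress` module to derive the network address from the interface address and prefix length shown in the quoted bond0 listing. It assumes bindnetaddr is expected to be either an address configured on the node or the network address of its subnet; the helper name `network_for` is made up for illustration.

```python
import ipaddress


def network_for(cidr: str) -> str:
    """Return the network address for an interface address like '192.168.150.12/22'."""
    return str(ipaddress.ip_interface(cidr).network.network_address)


if __name__ == "__main__":
    # Cluster address exactly as it appears in the quoted bond0 listing.
    print(network_for("192.168.150.12/22"))
    # Same host address with a /24 prefix, for contrast.
    print(network_for("192.168.150.12/24"))
```

With the /22 prefix the network address works out to 192.168.148.0, while 192.168.150.0 is what a /24 prefix would give, so it may be worth checking which of the two the configured bindnetaddr is supposed to match.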
>>
>> Regards,
>> Honza
>>
>>>
>>> And I can ping 192.168.150.12 from this machine and from other machines
>>> on the network.
>>>
>>>
>>>
>>> --
>>> Regards,
>>>
>>> Muhammad Sharfuddin
>>>
>>> _______________________________________________
>>> Users mailing list: Users at clusterlabs.org
>>> http://clusterlabs.org/mailman/listinfo/users
>>>
>>> Project Home: http://www.clusterlabs.org
>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> Bugs: http://bugs.clusterlabs.org