[ClusterLabs] getting "Totem is unable to form a cluster" error

Mon Apr 11 06:23:03 UTC 2016

> 08.04.2016 17:51, Jan Friesse wrote:
>>> On 04/08/16 13:01, Jan Friesse wrote:
>>>   >> pacemaker 1.1.12-11.12
>>>   >> openais 1.1.4-5.24.5
>>>   >> corosync 1.4.7-0.23.5
>>>   >>
>>>   >> It's a two-node active/passive cluster and we just upgraded from
>>>   >> SLES 11 SP 3 to SLES 11 SP 4 (nothing else), but when we try to
>>>   >> start the cluster service we get the following error:
>>>   >>
>>>   >> "Totem is unable to form a cluster because of an operating system or
>>>   >> network fault."
>>>   >>
>>>   >> Firewall is stopped and disabled on both the nodes. Both nodes can
>>>   >> ping/ssh/vnc each other.
>>>   >
>>>   > Hard to help. First of all, I would recommend asking SUSE support,
>>>   > because I don't have access to the source code of the corosync
>>>   > 1.4.7-0.23.5 package, so I don't know which patches were added.
>>>   >
>>>   >
>>> Yup, ticket opened with SUSE Support.
>>>
>>>   >>
>>>   >>
>>>   >>
>>>   >> /var/log/messages:
>>>   >> Apr  6 17:51:49 prd1 corosync[8672]:  [MAIN  ] Corosync Cluster Engine ('1.4.7'): started and ready to provide service.
>>>   >> Apr  6 17:51:49 prd1 corosync[8672]:  [MAIN  ] Corosync built-in features: nss
>>>   >> Apr  6 17:51:49 prd1 corosync[8672]:  [MAIN  ] Successfully configured openais services to load
>>>   >> Apr  6 17:51:49 prd1 corosync[8672]:  [MAIN  ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
>>>   >> Apr  6 17:51:49 prd1 corosync[8672]:  [TOTEM ] Initializing transport (UDP/IP Unicast).
>>>   >> Apr  6 17:51:49 prd1 corosync[8672]:  [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
>>>   >> Apr  6 17:51:49 prd1 corosync[8672]:  [TOTEM ] The network interface is down.
>>>   >
>>>   > ^^^ This is the important line. It means corosync was unable to
>>>   > find an interface for bindnetaddr 192.168.150.0. Make sure an
>>>   > interface with this network address exists.
>>>   >
>>>   >
>>> This machine has two IP addresses assigned on interface bond0:
>>>
>>> bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
>>>       link/ether 74:e6:e2:73:e5:61 brd ff:ff:ff:ff:ff:ff
>>>       inet 10.150.20.91/24 brd 10.150.20.55 scope global bond0
>>>       inet 192.168.150.12/22 brd 192.168.151.255 scope global bond0:cluster
>>>       inet6 fe80::76e6:e2ff:fe73:e561/64 scope link
>>>          valid_lft forever preferred_lft forever
>>
>> Is this ifconfig output? I'm just wondering how you were able to set two
>> IPv4 addresses (in this format, I would expect another interface like
>> bond0:1, or nothing at all).
>>
>
> That is how the Linux stack has worked for the last 10 or 15 years. The
> bond0:1 form is legacy emulation for ifconfig addicts.
>
> ip addr add 10.150.20.91/24 dev bond0

Hmm.

RHEL 6:

# tunctl -p
Set 'tap0' persistent and owned by uid 0

# ip addr add 192.168.7.1/24 dev tap0
# ip addr add 192.168.8.1/24 dev tap0
# ifconfig tap0
tap0      Link encap:Ethernet  HWaddr 22:95:B1:85:67:3F
           inet addr:192.168.7.1  Bcast:0.0.0.0  Mask:255.255.255.0
           BROADCAST MULTICAST  MTU:1500  Metric:1
           RX packets:0 errors:0 dropped:0 overruns:0 frame:0
           TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
           collisions:0 txqueuelen:500
           RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)

RHEL 7:
# ip tuntap add dev tap0 mode tap
# ip addr add 192.168.7.1/24 dev tap0
# ip addr add 192.168.8.1/24 dev tap0
# ifconfig tap0
tap0: flags=4098<BROADCAST,MULTICAST>  mtu 1500
         inet 192.168.7.1  netmask 255.255.255.0  broadcast 0.0.0.0
         ether 36:02:5c:ff:29:ea  txqueuelen 500  (Ethernet)
         RX packets 0  bytes 0 (0.0 B)
         RX errors 0  dropped 0  overruns 0  frame 0
         TX packets 0  bytes 0 (0.0 B)
         TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

So where do you see 192.168.8.1 in ifconfig output?
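
For comparison, here is a minimal sketch of listing every IPv4 address on a device through iproute2 rather than ifconfig. It assumes iproute2 is installed and that tap0 exists as in the tests above; the helper name extract_ipv4 is made up for illustration.

```shell
# Hypothetical helper: read `ip -o -4 addr show` output on stdin and print
# each address/prefix, one per line. In the one-line (-o) format the third
# field is the address family and the fourth is the address with its prefix.
extract_ipv4() {
    awk '$3 == "inet" { print $4 }'
}

# Usage (assumes the tap0 device from the tests above):
#   ip -o -4 addr show dev tap0 | extract_ipv4
```

Run against the tap0 device above, this should list both 192.168.7.1/24 and 192.168.8.1/24, since ip reports every address on the device while ifconfig without a label shows only the first.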

>
>> Anyway, I tried creating a bonding interface and setting a second IPv4
>> address (via ip addr), and corosync (flatiron, which is 1.4.8 plus 4
>> patches completely unrelated to your problem) was able to detect it
>> without any problem.
>>
>> I recommend you try:
>> - Set bindnetaddr to the IP address of the given node (so you have to
>> change bindnetaddr on both nodes)
>> - Try upstream corosync 1.4.8/flatiron
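
As a sanity check related to the first suggestion, here is a minimal sketch that computes the network address for an IPv4 address and prefix length, using the two bond0 addresses quoted earlier in the thread. The function name network_addr is made up for illustration, and it assumes a valid dotted-quad address with a prefix between 0 and 32. Whether the SUSE corosync package needs bindnetaddr to match this value exactly is not something this sketch can confirm.

```shell
# Hypothetical helper: print the network address of an IPv4 ADDR/PREFIX.
# The body runs in a subshell, so IFS and the variables stay local.
network_addr() (
    ip=${1%/*}          # part before the slash, e.g. 192.168.150.12
    prefix=${1#*/}      # part after the slash, e.g. 22
    IFS=.
    set -- $ip          # split the dotted quad into $1..$4
    addr=$(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
    mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
    net=$(( addr & mask ))
    printf '%d.%d.%d.%d\n' \
        "$(( (net >> 24) & 255 ))" "$(( (net >> 16) & 255 ))" \
        "$(( (net >> 8) & 255 ))" "$(( net & 255 ))"
)

network_addr 192.168.150.12/22   # prints 192.168.148.0
network_addr 10.150.20.91/24     # prints 10.150.20.0
```

With a /22 prefix the network address of 192.168.150.12 works out to 192.168.148.0 rather than 192.168.150.0. Setting bindnetaddr to the node's own address, as suggested above, avoids having to reason about the mask at all.
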
>>
>> Regards,
>>    Honza
>>
>>>
>>> And I can ping 192.168.150.12 from this machine and from other machines
>>> on network
>>>
>>>
>>>
>>> --
>>> Regards,
>>>
>>> Muhammad Sharfuddin
>>>
>>> _______________________________________________
>>> Users mailing list: Users at clusterlabs.org
>>> http://clusterlabs.org/mailman/listinfo/users
>>>
>>> Project Home: http://www.clusterlabs.org
>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> Bugs: http://bugs.clusterlabs.org