[Pacemaker] CMAN and Pacemaker with IPv6

Jan Friesse jfriesse at redhat.com
Wed Jul 16 08:05:31 EDT 2014


Teerapatr,

> Dear Honza,
> 
> Sorry to say this, but I've found a new error again. LOL
> 
> This time I have already installed 1.4.1-17, as you advised,
> and the nodename, without an altname, is mapped to IPv6 using the hosts file.
> Everything is fine, but the two nodes can't communicate with each other.
> So I added the multicast address manually, using the command `ccs -f
> /etc/cluster/cluster.conf --setmulticast ff::597` on both nodes.
> After that, CMAN cannot start.

ff::597 is not a valid IPv6 multicast address: the leading group "ff"
expands to 00ff, so the address falls outside the multicast range
ff00::/8. Use something like ff3e::597.
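
For example, on both nodes (the same ccs invocation you used, just with
a valid address; ff3e::597 is only a suggestion, any multicast address
inside ff00::/8 with a sensible scope should do):

  # set a valid IPv6 multicast address in cluster.conf
  ccs -f /etc/cluster/cluster.conf --setmulticast ff3e::597
  # then try starting cman again
  service cman start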


> 
> Starting cluster:
>    Checking if cluster has been disabled at boot...        [  OK  ]
>    Checking Network Manager...                             [  OK  ]
>    Global setup...                                         [  OK  ]
>    Loading kernel modules...                               [  OK  ]
>    Mounting configfs...                                    [  OK  ]
>    Starting cman... Timed-out waiting for cluster Check cluster logs for details
>                                                            [FAILED]
> 
> I also found a lot of log output, but I think this is where the problem occurred.
> 
> Jul 15 13:36:14 corosync [MAIN  ] Corosync Cluster Engine ('1.4.1'):
> started and ready to provide service.
> Jul 15 13:36:14 corosync [MAIN  ] Corosync built-in features: nss dbus rdma snmp
> Jul 15 13:36:14 corosync [MAIN  ] Successfully read config from
> /etc/cluster/cluster.conf
> Jul 15 13:36:14 corosync [MAIN  ] Successfully parsed cman config
> Jul 15 13:36:14 corosync [TOTEM ] Initializing transport (UDP/IP Multicast).
> Jul 15 13:36:14 corosync [TOTEM ] Initializing transmit/receive
> security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
> Jul 15 13:36:14 corosync [TOTEM ] Unable to bind the socket to receive
> multicast packets: Cannot assign requested address (99)
> Jul 15 13:36:14 corosync [TOTEM ] Could not set traffic priority:
> Socket operation on non-socket (88)
> Jul 15 13:36:14 corosync [TOTEM ] The network interface
> [2001:db8::151] is now up.
> Jul 15 13:36:14 corosync [QUORUM] Using quorum provider quorum_cman
> Jul 15 13:36:14 corosync [SERV  ] Service engine loaded: corosync
> cluster quorum service v0.1
> Jul 15 13:36:14 corosync [CMAN  ] CMAN 3.0.12.1 (built Apr 14 2014
> 09:36:10) started
> Jul 15 13:36:14 corosync [SERV  ] Service engine loaded: corosync CMAN
> membership service 2.90
> Jul 15 13:36:14 corosync [SERV  ] Service engine loaded: openais
> checkpoint service B.01.01
> Jul 15 13:36:14 corosync [SERV  ] Service engine loaded: corosync
> extended virtual synchrony service
> Jul 15 13:36:14 corosync [SERV  ] Service engine loaded: corosync
> configuration service
> Jul 15 13:36:14 corosync [SERV  ] Service engine loaded: corosync
> cluster closed process group service v1.01
> Jul 15 13:36:14 corosync [SERV  ] Service engine loaded: corosync
> cluster config database access v1.01
> Jul 15 13:36:14 corosync [SERV  ] Service engine loaded: corosync
> profile loading service
> Jul 15 13:36:14 corosync [QUORUM] Using quorum provider quorum_cman
> Jul 15 13:36:14 corosync [SERV  ] Service engine loaded: corosync
> cluster quorum service v0.1
> Jul 15 13:36:14 corosync [MAIN  ] Compatibility mode set to whitetank.
> Using V1 and V2 of the synchronization engine.
> Jul 15 13:36:17 corosync [MAIN  ] Totem is unable to form a cluster
> because of an operating system or network fault. The most common cause
> of this message is that the local firewall is configured improperly.
> Jul 15 13:36:19 corosync [MAIN  ] Totem is unable to form a cluster
> because of an operating system or network fault. The most common cause
> of this message is that the local firewall is configured improperly.
> Jul 15 13:36:20 corosync [MAIN  ] Totem is unable to form a cluster
> because of an operating system or network fault. The most common cause
> of this message is that the local firewall is configured improperly.
> 
> I cannot find a solution on the Internet for "[TOTEM ] Unable to bind
> the socket to receive multicast packets: Cannot assign requested
> address (99)".
> Do you have any idea?
> 
> Teenigma
> 
> On Tue, Jul 15, 2014 at 10:02 AM, Teerapatr Kittiratanachai
> <maillist.tk at gmail.com> wrote:
>> Honza
>>
>> Great, thank you very much.
>>
>> But the awkward thing for me is that I'm using the package from the OpenSUSE repo.
>> When I switched back to the CentOS repo, which carries an older version, a
>> dependency problem occurred.
>>
>> Anyway, thank you for your help.
>>
>> Teenigma
>>
>> On Mon, Jul 14, 2014 at 8:51 PM, Jan Friesse <jfriesse at redhat.com> wrote:
>>>> Honza,
>>>>
>>>> How do I include the patch in my CentOS package?
>>>> Do I need to compile it manually?
>>>
>>>
>>> Yes. Also, the official CentOS version was never 1.4.5. If you are using
>>> CentOS, just use the stock 1.4.1-17.1; the patch is included there.
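>>>
>>> If yum has already pulled in the newer OpenSUSE build, something like
>>> this should get you back to the stock packages (a sketch; check `yum
>>> list corosync` for the exact version your mirror carries):
>>>
>>>   yum downgrade corosync corosynclib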
>>>
>>> Honza
>>>
>>>
>>>>
>>>> TeEniGMa
>>>>
>>>> On Mon, Jul 14, 2014 at 3:21 PM, Jan Friesse <jfriesse at redhat.com> wrote:
>>>>>
>>>>> Teerapatr,
>>>>>
>>>>>
>>>>>> For more information,
>>>>>>
>>>>>>
>>>>>> these are LOG from /var/log/messages
>>>>>> ...
>>>>>> Jul 14 10:28:07 wh00 kernel: : DLM (built Mar 25 2014 20:01:13)
>>>>>> installed
>>>>>> Jul 14 10:28:07 wh00 corosync[2716]:   [MAIN  ] Corosync Cluster
>>>>>> Engine ('1.4.5'): started and ready to provide service.
>>>>>> Jul 14 10:28:07 wh00 corosync[2716]:   [MAIN  ] Corosync built-in
>>>>>> features: nss
>>>>>> Jul 14 10:28:07 wh00 corosync[2716]:   [MAIN  ] Successfully read
>>>>>> config from /etc/cluster/cluster.conf
>>>>>> Jul 14 10:28:07 wh00 corosync[2716]:   [MAIN  ] Successfully parsed cman
>>>>>> config
>>>>>> Jul 14 10:28:07 wh00 corosync[2716]:   [TOTEM ] Initializing transport
>>>>>> (UDP/IP Multicast).
>>>>>> Jul 14 10:28:07 wh00 corosync[2716]:   [TOTEM ] Initializing
>>>>>> transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
>>>>>> Jul 14 10:28:07 wh00 corosync[2716]:   [TOTEM ] The network interface is
>>>>>> down.
>>>>>
>>>>>
>>>>> ^^^ This line is important. It means corosync was unable to find an
>>>>> interface with the given IPv6 address. There was a regression in v1.4.5
>>>>> causing this behavior. It's fixed in v1.4.6 (the patch is
>>>>>
>>>>> https://github.com/corosync/corosync/commit/d76759ec26ecaeb9cc01f49e9eb0749b61454d27).
>>>>> So you can either apply the patch or (recommended) upgrade to 1.4.7.
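>>>>>
>>>>> If you go the patch route, something like this should work (an
>>>>> untested sketch, assuming you rebuild from an unpacked corosync
>>>>> 1.4.5 source tree):
>>>>>
>>>>>   # GitHub serves any commit as a plain patch if you append .patch
>>>>>   wget https://github.com/corosync/corosync/commit/d76759ec26ecaeb9cc01f49e9eb0749b61454d27.patch
>>>>>   cd corosync-1.4.5
>>>>>   patch -p1 < ../d76759ec26ecaeb9cc01f49e9eb0749b61454d27.patch
>>>>>   # then rebuild and reinstall as usual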
>>>>>
>>>>> Regards,
>>>>>    Honza
>>>>>
>>>>>
>>>>>
>>>>>> Jul 14 10:28:10 wh00 pacemaker: Aborting startup of Pacemaker Cluster
>>>>>> Manager
>>>>>> ...
>>>>>>
>>>>>> Te
>>>>>>
>>>>>> On Mon, Jul 14, 2014 at 10:07 AM, Teerapatr Kittiratanachai
>>>>>> <maillist.tk at gmail.com> wrote:
>>>>>>>
>>>>>>>
>>>>>>> Dear Honza,
>>>>>>>
>>>>>>> Sorry for the late reply.
>>>>>>> I have tested with an all-new configuration:
>>>>>>> IPv6 only, and with no altname.
>>>>>>>
>>>>>>> I am facing the error below:
>>>>>>>
>>>>>>> Starting cluster:
>>>>>>>      Checking if cluster has been disabled at boot...        [  OK  ]
>>>>>>>      Checking Network Manager...                             [  OK  ]
>>>>>>>      Global setup...                                         [  OK  ]
>>>>>>>      Loading kernel modules...                               [  OK  ]
>>>>>>>      Mounting configfs...                                    [  OK  ]
>>>>>>>      Starting cman... corosync died with signal: 6 Check cluster logs
>>>>>>> for
>>>>>>> details
>>>>>>>                                                              [FAILED]
>>>>>>>
>>>>>>> And, to be clear, there is no firewall enabled; I also configured the
>>>>>>> multicast address manually.
>>>>>>> Could you advise me on a solution?
>>>>>>>
>>>>>>> Many thanks in advance.
>>>>>>> Te
>>>>>>>
>>>>>>> On Thu, Jul 10, 2014 at 6:14 PM, Jan Friesse <jfriesse at redhat.com>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> Teerapatr,
>>>>>>>>
>>>>>>>>> Hi Honza,
>>>>>>>>>
>>>>>>>>> As you said, I use a nodename identified by hostname (which is
>>>>>>>>> accessed via IPv6), and the node also has an altname (which is an
>>>>>>>>> IPv4 address).
>>>>>>>>>
>>>>>>>>
>>>>>>>> This doesn't work. Both the hostname and the altname have to be the
>>>>>>>> same IP version.
>>>>>>>>
>>>>>>>>> Now I configure the mcast address for both the nodename and the
>>>>>>>>> altname manually. CMAN and Pacemaker can start as well, but they
>>>>>>>>> don't communicate with the other node.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Please make sure (as I wrote in a previous email) that your firewall
>>>>>>>> doesn't block mcast and corosync traffic (just disable it) and that
>>>>>>>> your switch doesn't block multicast (this is very often the case). If
>>>>>>>> these are VMs, make sure the bridge is properly configured (again,
>>>>>>>> just disable the firewall) and enable mcast_querier; see the sketch
>>>>>>>> below.
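>>>>>>>>
>>>>>>>> A rough sketch of both checks (br0 is just a placeholder for
>>>>>>>> whatever bridge your VMs are attached to):
>>>>>>>>
>>>>>>>>   # temporarily disable the host firewalls while testing
>>>>>>>>   service iptables stop
>>>>>>>>   service ip6tables stop
>>>>>>>>   # let the bridge act as multicast querier so mcast keeps flowing
>>>>>>>>   echo 1 > /sys/class/net/br0/bridge/multicast_querier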
>>>>>>>>
>>>>>>>> Honza
>>>>>>>>
>>>>>>>>> On node0, crm_mon shows node1 offline. In the same way, node1 shows
>>>>>>>>> node0 as down. So a split-brain problem occurs here.
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Te
>>>>>>>>>
>>>>>>>>> On Thu, Jul 10, 2014 at 2:50 PM, Jan Friesse <jfriesse at redhat.com>
>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Teerapatr,
>>>>>>>>>>
>>>>>>>>>>> OK, some problems are solved: I was using an incorrect hostname.
>>>>>>>>>>>
>>>>>>>>>>> But now a new problem has occurred.
>>>>>>>>>>>
>>>>>>>>>>>     Starting cman... Node address family does not match multicast
>>>>>>>>>>> address family
>>>>>>>>>>> Unable to get the configuration
>>>>>>>>>>> Node address family does not match multicast address family
>>>>>>>>>>> cman_tool: corosync daemon didn't start Check cluster logs for
>>>>>>>>>>> details
>>>>>>>>>>>
>>>>>>>>>>> [FAILED]
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> This looks like one of your nodes is also reachable via IPv4 and
>>>>>>>>>> IPv4 resolution is preferred. Please make sure to set only the IPv6
>>>>>>>>>> address and try it again. Of course, setting the mcast addr by hand
>>>>>>>>>> may be helpful (even though I don't believe it will solve the
>>>>>>>>>> problem you are hitting).
>>>>>>>>>>
>>>>>>>>>> Also make sure ip6tables is properly configured and your switch is
>>>>>>>>>> able to pass IPv6 mcast traffic; see the sketch below.
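>>>>>>>>>>
>>>>>>>>>> For instance (a sketch; the node name is a placeholder, and
>>>>>>>>>> 5404/5405 are corosync's default UDP ports, adjust if you changed
>>>>>>>>>> mcastport):
>>>>>>>>>>
>>>>>>>>>>   # should print only IPv6 addresses for the node name
>>>>>>>>>>   getent ahosts node0.example.com
>>>>>>>>>>   # open the corosync UDP ports over IPv6
>>>>>>>>>>   ip6tables -I INPUT -p udp --dport 5404:5405 -j ACCEPT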
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>>     Honza
>>>>>>>>>>
>>>>>>>>>>> How can I fix it? Or should I just assign the multicast address in
>>>>>>>>>>> the configuration?
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>> Te
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Jul 10, 2014 at 7:52 AM, Teerapatr Kittiratanachai
>>>>>>>>>>> <maillist.tk at gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> I have not found any useful log messages.
>>>>>>>>>>>>
>>>>>>>>>>>> /var/log/messages
>>>>>>>>>>>> ...
>>>>>>>>>>>> Jul 10 07:44:19 nwh00 kernel: : DLM (built Jun 19 2014 21:16:01)
>>>>>>>>>>>> installed
>>>>>>>>>>>> Jul 10 07:44:22 nwh00 pacemaker: Aborting startup of Pacemaker
>>>>>>>>>>>> Cluster Manager
>>>>>>>>>>>> ...
>>>>>>>>>>>>
>>>>>>>>>>>> and this is what is displayed when I try to start pacemaker:
>>>>>>>>>>>>
>>>>>>>>>>>> # /etc/init.d/pacemaker start
>>>>>>>>>>>> Starting cluster:
>>>>>>>>>>>>      Checking if cluster has been disabled at boot...        [  OK
>>>>>>>>>>>> ]
>>>>>>>>>>>>      Checking Network Manager...                             [  OK
>>>>>>>>>>>> ]
>>>>>>>>>>>>      Global setup...                                         [  OK
>>>>>>>>>>>> ]
>>>>>>>>>>>>      Loading kernel modules...                               [  OK
>>>>>>>>>>>> ]
>>>>>>>>>>>>      Mounting configfs...                                    [  OK
>>>>>>>>>>>> ]
>>>>>>>>>>>>      Starting cman... Cannot find node name in cluster.conf
>>>>>>>>>>>> Unable to get the configuration
>>>>>>>>>>>> Cannot find node name in cluster.conf
>>>>>>>>>>>> cman_tool: corosync daemon didn't start Check cluster logs for
>>>>>>>>>>>> details
>>>>>>>>>>>>
>>>>>>>>>>>> [FAILED]
>>>>>>>>>>>> Stopping cluster:
>>>>>>>>>>>>      Leaving fence domain...                                 [  OK
>>>>>>>>>>>> ]
>>>>>>>>>>>>      Stopping gfs_controld...                                [  OK
>>>>>>>>>>>> ]
>>>>>>>>>>>>      Stopping dlm_controld...                                [  OK
>>>>>>>>>>>> ]
>>>>>>>>>>>>      Stopping fenced...                                      [  OK
>>>>>>>>>>>> ]
>>>>>>>>>>>>      Stopping cman...                                        [  OK
>>>>>>>>>>>> ]
>>>>>>>>>>>>      Unloading kernel modules...                             [  OK
>>>>>>>>>>>> ]
>>>>>>>>>>>>      Unmounting configfs...                                  [  OK
>>>>>>>>>>>> ]
>>>>>>>>>>>> Aborting startup of Pacemaker Cluster Manager
>>>>>>>>>>>>
>>>>>>>>>>>> One more thing: because of this problem, I have removed the
>>>>>>>>>>>> AAAA records from DNS for now and map the names in the /etc/hosts
>>>>>>>>>>>> file instead, as shown below.
>>>>>>>>>>>>
>>>>>>>>>>>> /etc/hosts
>>>>>>>>>>>> ...
>>>>>>>>>>>> 2001:db8:0:1::1   node0.example.com
>>>>>>>>>>>> 2001:db8:0:1::2   node1.example.com
>>>>>>>>>>>> ...
>>>>>>>>>>>>
>>>>>>>>>>>> Is there any configuration that would help me get more logs?
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Jul 10, 2014 at 5:06 AM, Andrew Beekhof
>>>>>>>>>>>> <andrew at beekhof.net>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 9 Jul 2014, at 9:15 pm, Teerapatr Kittiratanachai
>>>>>>>>>>>>> <maillist.tk at gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Dear All,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I have implemented HA on dual-stack servers.
>>>>>>>>>>>>>> At first I had not deployed the IPv6 records in DNS yet, and
>>>>>>>>>>>>>> CMAN and PACEMAKER worked as normal.
>>>>>>>>>>>>>> But after I created the AAAA records on the DNS server, I found
>>>>>>>>>>>>>> that CMAN can't start.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Do CMAN and PACEMAKER support IPv6?
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> I don't think pacemaker cares.
>>>>>>>>>>>>> What errors did you get?
>>>>>>>>>>>>>