[Pacemaker] [corosync] CoroSync's UDPu transport for public IP addresses?

Jan Friesse jfriesse at redhat.com
Mon Jan 19 02:55:15 EST 2015


Dmitry,


> Great, it works! Thank you.
>
> It would be extremely helpful if this information were included in the
> default corosync.conf as comments:
> - regarding the allowed and even preferred absence of totem.interface in
> the case of UDPu

Yep
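
For example, a minimal udpu setup can omit totem.interface entirely (a
sketch only; the addresses and nodeids below are placeholders, and exact
defaults may differ between corosync versions):

```
totem {
    version: 2
    transport: udpu
    # No interface section is needed: corosync derives the bind
    # address from the matching entry in the nodelist below.
}

nodelist {
    node {
        ring0_addr: 10.0.0.1
        nodeid: 1
    }
    node {
        ring0_addr: 10.0.0.2
        nodeid: 2
    }
}

quorum {
    provider: corosync_votequorum
}
```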

> - that the quorum section must not be empty, and that quorum.provider
> should be corosync_votequorum (not an empty string).

This is not entirely true. quorum.provider cannot be an empty string; it 
must generally be a valid provider such as corosync_votequorum. But an 
unspecified quorum.provider works without any problem (as in the example 
configuration file). The truth is that Pacemaker must then be configured 
so that quorum is not required.
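
If quorum.provider is left unspecified, the relevant Pacemaker knob is 
the no-quorum-policy cluster property (shown here in crmsh syntax as a 
sketch; pcs has an equivalent command):

```
crm configure property no-quorum-policy=ignore
```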

Regards,
   Honza

>
> It would help novices to install and launch corosync quickly.
>
>
> On Fri, Jan 16, 2015 at 7:31 PM, Jan Friesse <jfriesse at redhat.com> wrote:
>
>> Dmitry Koterov wrote:
>>
>>>
>>>>> such messages (for now). But anyway, DNS names in ringX_addr do not
>>>>> seem to work, and no relevant messages appear in the default logs.
>>>>> Maybe add some validations for ringX_addr?
>>>>>
>>>>> I have resolvable DNS names:
>>>>>
>>>>> root at node1:/etc/corosync# ping -c1 -W100 node1 | grep from
>>>>> 64 bytes from node1 (127.0.1.1): icmp_seq=1 ttl=64 time=0.039 ms
>>>>>
>>>>>
>>>> This is the problem. Resolving node1 to a loopback address (127.0.1.1)
>>>> is simply wrong. The names you want to use in corosync.conf should
>>>> resolve to the interface address. I believe the other nodes have a
>>>> similar setting (so node2 resolved on node2 is again a loopback address).
>>>>
>>>>
>>> Wow! What a shame! How could I miss it... So you're absolutely right,
>>> thanks: that was the cause, an entry in /etc/hosts. On some machines I
>>> had removed it manually, but on others I hadn't. Now I remove it
>>> automatically with sed -i -r "/^.*[[:space:]]$host([[:space:]]|\$)/d"
>>> /etc/hosts in the initialization script.
>>>
>>> I apologize for the mess.
>>>
>>> So now I have only one place in corosync.conf where I need to specify a
>>> plain IP address for UDPu: totem.interface.bindnetaddr. If I specify
>>> 0.0.0.0 there, I get the message "Service engine 'corosync_quorum'
>>> failed to load for reason 'configuration error: nodelist or
>>> quorum.expected_votes must be configured!'" in the logs (BTW it does not
>>> say that I made a mistake in bindnetaddr). Is there a way to avoid
>>> hard-coded IP addresses completely?
>>>
>>
>> You can just remove the whole interface section. Corosync will find the
>> correct address from the nodelist.
>>
>> Regards,
>>    Honza
>>
>>
>>
>>>
>>>
>>>> Please try to fix this problem first and let's see if this solves the
>>>> issue you are hitting.
>>>>
>>>> Regards,
>>>>     Honza
>>>>
>>>>   root at node1:/etc/corosync# ping -c1 -W100 node2 | grep from
>>>>> 64 bytes from node2 (188.166.54.190): icmp_seq=1 ttl=55 time=88.3 ms
>>>>>
>>>>> root at node1:/etc/corosync# ping -c1 -W100 node3 | grep from
>>>>> 64 bytes from node3 (128.199.116.218): icmp_seq=1 ttl=51 time=252 ms
>>>>>
>>>>>
>>>>> With corosync.conf below, nothing works:
>>>>> ...
>>>>> nodelist {
>>>>>     node {
>>>>>       ring0_addr: node1
>>>>>     }
>>>>>     node {
>>>>>       ring0_addr: node2
>>>>>     }
>>>>>     node {
>>>>>       ring0_addr: node3
>>>>>     }
>>>>> }
>>>>> ...
>>>>> Jan 14 10:47:44 node1 corosync[15061]:  [MAIN  ] Corosync Cluster Engine
>>>>> ('2.3.3'): started and ready to provide service.
>>>>> Jan 14 10:47:44 node1 corosync[15061]:  [MAIN  ] Corosync built-in
>>>>> features: dbus testagents rdma watchdog augeas pie relro bindnow
>>>>> Jan 14 10:47:44 node1 corosync[15062]:  [TOTEM ] Initializing transport
>>>>> (UDP/IP Unicast).
>>>>> Jan 14 10:47:44 node1 corosync[15062]:  [TOTEM ] Initializing
>>>>> transmit/receive security (NSS) crypto: aes256 hash: sha1
>>>>> Jan 14 10:47:44 node1 corosync[15062]:  [TOTEM ] The network interface
>>>>> [a.b.c.d] is now up.
>>>>> Jan 14 10:47:44 node1 corosync[15062]:  [SERV  ] Service engine loaded:
>>>>> corosync configuration map access [0]
>>>>> Jan 14 10:47:44 node1 corosync[15062]:  [QB    ] server name: cmap
>>>>> Jan 14 10:47:44 node1 corosync[15062]:  [SERV  ] Service engine loaded:
>>>>> corosync configuration service [1]
>>>>> Jan 14 10:47:44 node1 corosync[15062]:  [QB    ] server name: cfg
>>>>> Jan 14 10:47:44 node1 corosync[15062]:  [SERV  ] Service engine loaded:
>>>>> corosync cluster closed process group service v1.01 [2]
>>>>> Jan 14 10:47:44 node1 corosync[15062]:  [QB    ] server name: cpg
>>>>> Jan 14 10:47:44 node1 corosync[15062]:  [SERV  ] Service engine loaded:
>>>>> corosync profile loading service [4]
>>>>> Jan 14 10:47:44 node1 corosync[15062]:  [WD    ] No Watchdog, try
>>>>> modprobe <a watchdog>
>>>>> Jan 14 10:47:44 node1 corosync[15062]:  [WD    ] no resources
>>>>> configured.
>>>>> Jan 14 10:47:44 node1 corosync[15062]:  [SERV  ] Service engine loaded:
>>>>> corosync watchdog service [7]
>>>>> Jan 14 10:47:44 node1 corosync[15062]:  [QUORUM] Using quorum provider
>>>>> corosync_votequorum
>>>>> Jan 14 10:47:44 node1 corosync[15062]:  [QUORUM] Quorum provider:
>>>>> corosync_votequorum failed to initialize.
>>>>> Jan 14 10:47:44 node1 corosync[15062]:  [SERV  ] Service engine
>>>>> 'corosync_quorum' failed to load for reason 'configuration error:
>>>>> nodelist or quorum.expected_votes must be configured!'
>>>>> Jan 14 10:47:44 node1 corosync[15062]:  [MAIN  ] Corosync Cluster Engine
>>>>> exiting with status 20 at service.c:356.
>>>>>
>>>>>
>>>>> But with IP addresses specified in ringX_addr, everything works:
>>>>> ...
>>>>> nodelist {
>>>>>     node {
>>>>>       ring0_addr: 104.236.71.79
>>>>>     }
>>>>>     node {
>>>>>       ring0_addr: 188.166.54.190
>>>>>     }
>>>>>     node {
>>>>>       ring0_addr: 128.199.116.218
>>>>>     }
>>>>> }
>>>>> ...
>>>>> Jan 14 10:48:28 node1 corosync[15155]:  [MAIN  ] Corosync Cluster Engine
>>>>> ('2.3.3'): started and ready to provide service.
>>>>> Jan 14 10:48:28 node1 corosync[15155]:  [MAIN  ] Corosync built-in
>>>>> features: dbus testagents rdma watchdog augeas pie relro bindnow
>>>>> Jan 14 10:48:28 node1 corosync[15156]:  [TOTEM ] Initializing transport
>>>>> (UDP/IP Unicast).
>>>>> Jan 14 10:48:28 node1 corosync[15156]:  [TOTEM ] Initializing
>>>>> transmit/receive security (NSS) crypto: aes256 hash: sha1
>>>>> Jan 14 10:48:28 node1 corosync[15156]:  [TOTEM ] The network interface
>>>>> [a.b.c.d] is now up.
>>>>> Jan 14 10:48:28 node1 corosync[15156]:  [SERV  ] Service engine loaded:
>>>>> corosync configuration map access [0]
>>>>> Jan 14 10:48:28 node1 corosync[15156]:  [QB    ] server name: cmap
>>>>> Jan 14 10:48:28 node1 corosync[15156]:  [SERV  ] Service engine loaded:
>>>>> corosync configuration service [1]
>>>>> Jan 14 10:48:28 node1 corosync[15156]:  [QB    ] server name: cfg
>>>>> Jan 14 10:48:28 node1 corosync[15156]:  [SERV  ] Service engine loaded:
>>>>> corosync cluster closed process group service v1.01 [2]
>>>>> Jan 14 10:48:28 node1 corosync[15156]:  [QB    ] server name: cpg
>>>>> Jan 14 10:48:28 node1 corosync[15156]:  [SERV  ] Service engine loaded:
>>>>> corosync profile loading service [4]
>>>>> Jan 14 10:48:28 node1 corosync[15156]:  [WD    ] No Watchdog, try
>>>>> modprobe <a watchdog>
>>>>> Jan 14 10:48:28 node1 corosync[15156]:  [WD    ] no resources
>>>>> configured.
>>>>> Jan 14 10:48:28 node1 corosync[15156]:  [SERV  ] Service engine loaded:
>>>>> corosync watchdog service [7]
>>>>> Jan 14 10:48:28 node1 corosync[15156]:  [QUORUM] Using quorum provider
>>>>> corosync_votequorum
>>>>> Jan 14 10:48:28 node1 corosync[15156]:  [SERV  ] Service engine loaded:
>>>>> corosync vote quorum service v1.0 [5]
>>>>> Jan 14 10:48:28 node1 corosync[15156]:  [QB    ] server name: votequorum
>>>>> Jan 14 10:48:28 node1 corosync[15156]:  [SERV  ] Service engine loaded:
>>>>> corosync cluster quorum service v0.1 [3]
>>>>> Jan 14 10:48:28 node1 corosync[15156]:  [QB    ] server name: quorum
>>>>> Jan 14 10:48:28 node1 corosync[15156]:  [TOTEM ] adding new UDPU member
>>>>> {a.b.c.d}
>>>>> Jan 14 10:48:28 node1 corosync[15156]:  [TOTEM ] adding new UDPU member
>>>>> {e.f.g.h}
>>>>> Jan 14 10:48:28 node1 corosync[15156]:  [TOTEM ] adding new UDPU member
>>>>> {i.j.k.l}
>>>>> Jan 14 10:48:28 node1 corosync[15156]:  [TOTEM ] A new membership
>>>>> (m.n.o.p:80) was formed. Members joined: 1760315215
>>>>> Jan 14 10:48:28 node1 corosync[15156]:  [QUORUM] Members[1]: 1760315215
>>>>> Jan 14 10:48:28 node1 corosync[15156]:  [MAIN  ] Completed service
>>>>> synchronization, ready to provide service.
>>>>>
>>>>>
>>>>> On Mon, Jan 5, 2015 at 6:45 PM, Jan Friesse <jfriesse at redhat.com>
>>>>> wrote:
>>>>>
>>>>>   Dmitry,
>>>>>>
>>>>>>
>>>>>>> Sure, in logs I see "adding new UDPU member {IP_ADDRESS}" (so DNS
>>>>>>> names are definitely resolved), but in practice the cluster does not
>>>>>>> work, as I said above. So validations of ringX_addr in corosync.conf
>>>>>>> would be very helpful in corosync.
>>>>>>>
>>>>>>
>>>>>> That's weird. Because as long as DNS is resolved, corosync works only
>>>>>> with IPs. This means the code path is exactly the same with an IP or
>>>>>> with a DNS name. Do you have logs from corosync?
>>>>>>
>>>>>> Honza
>>>>>>
>>>>>>
>>>>>>
>>>>>>> On Fri, Jan 2, 2015 at 2:49 PM, Jan Friesse <jfriesse at redhat.com>
>>>>>>>
>>>>>> wrote:
>>>>
>>>>>
>>>>>>>   Dmitry,
>>>>>>>>
>>>>>>>>
>>>>>>>>> No, I meant that if you pass a domain name in ring0_addr, there are
>>>>>>>>> no errors in logs, corosync even seems to find nodes (based on its
>>>>>>>>> logs), and crm_node -l shows them, but in practice nothing really
>>>>>>>>> works. A verbose error message would be very helpful in such a case.
>>>>>>>>>
>>>>>>>> This sounds weird. Are you sure that the DNS names really map to the
>>>>>>>> correct IP addresses? In the logs there should be something like
>>>>>>>> "adding new UDPU member {IP_ADDRESS}".
>>>>>>>>
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>>     Honza
>>>>>>>>
>>>>>>>>
>>>>>>>>   On Tuesday, December 30, 2014, Daniel Dehennin <
>>>>>>>>> daniel.dehennin at baby-gnu.org>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Dmitry Koterov <dmitry.koterov at gmail.com> writes:
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> Oh, it seems I've found the solution! At least two mistakes were
>>>>>>>>>>> in my corosync.conf (BTW the logs did not report any errors, so
>>>>>>>>>>> my conclusion is based on my experiments only).
>>>>>>>>>>>
>>>>>>>>>>> 1. nodelist.node MUST contain only IP addresses. No hostnames!
>>>>>>>>>>> They simply do not work, "crm status" shows no nodes. And no
>>>>>>>>>>> warnings are in the logs regarding this.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>> You can add a name like this:
>>>>>>>>>>
>>>>>>>>>>        nodelist {
>>>>>>>>>>          node {
>>>>>>>>>>            ring0_addr: <public-ip-address-of-the-first-machine>
>>>>>>>>>>            name: node1
>>>>>>>>>>          }
>>>>>>>>>>          node {
>>>>>>>>>>            ring0_addr: <public-ip-address-of-the-second-machine>
>>>>>>>>>>            name: node2
>>>>>>>>>>          }
>>>>>>>>>>        }
>>>>>>>>>>
>>>>>>>>>> I used it on Ubuntu Trusty with udpu.
>>>>>>>>>>
>>>>>>>>>> Regards.
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Daniel Dehennin
>>>>>>>>>> Récupérer ma clef GPG: gpg --recv-keys 0xCC1E9E5B7A6FE2DF
>>>>>>>>>> Fingerprint: 3E69 014E 5C23 50E8 9ED6  2AAD CC1E 9E5B 7A6F E2DF
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>>>>>>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>>>>>>>
>>>>>>>>> Project Home: http://www.clusterlabs.org
>>>>>>>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>>>>>>> Bugs: http://bugs.clusterlabs.org
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>> _______________________________________________
>>>> discuss mailing list
>>>> discuss at corosync.org
>>>> http://lists.corosync.org/mailman/listinfo/discuss
>>>>
>>>>
>>>
>>
>




