[Pacemaker] [corosync] CoroSync's UDPu transport for public IP addresses?

Jan Friesse jfriesse at redhat.com
Fri Jan 16 16:31:21 UTC 2015


Dmitry Koterov wrote:
>>
>>> such messages (for now). But, anyway, DNS names in ringX_addr seem not
>>> working, and no relevant messages are in default logs. Maybe add some
>>> validations for ringX_addr?
>>>
>>> I'm having resolvable DNS names:
>>>
>>> root at node1:/etc/corosync# ping -c1 -W100 node1 | grep from
>>> 64 bytes from node1 (127.0.1.1): icmp_seq=1 ttl=64 time=0.039 ms
>>>
>>
>> This is the problem. Resolving node1 to a loopback address (127.0.1.1)
>> is simply wrong. The names you use in corosync.conf should resolve to
>> the node's interface address. I believe the other nodes have a similar
>> setting (so node2 resolved on node2 is again a loopback address).
>>
>
> Wow! What a shame! How could I miss it... So you're absolutely right,
> thanks: that was the cause, an entry in /etc/hosts. On some machines I
> removed it manually, but on others I didn't. Now I remove it automatically
> with sed -i -r "/^.*[[:space:]]$host([[:space:]]|\$)/d" /etc/hosts in the
> initialization script.
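[Editor's note: the loopback pitfall above can be checked mechanically. A small sketch, assuming the hypothetical node names from this thread; classify_addr is an illustrative helper, not part of corosync:]

```shell
#!/bin/sh
# Classify the address a cluster hostname resolves to. A ring0_addr name
# must resolve to a real interface address, never to a loopback entry
# left in /etc/hosts (e.g. "127.0.1.1 node1").
classify_addr() {
  case "$1" in
    "")        echo "unresolved" ;;
    127.*|::1) echo "loopback"   ;;  # bad: corosync would bind to lo
    *)         echo "ok"         ;;
  esac
}

# Hypothetical usage: check every node named in corosync.conf.
for h in node1 node2 node3; do
  addr=$(getent hosts "$h" 2>/dev/null | awk '{print $1; exit}')
  echo "$h: $(classify_addr "$addr")"
done
```

Any "loopback" result means the /etc/hosts cleanup above has not been applied on that machine.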
>
> I apologize for the mess.
>
> So now I have only one place in corosync.conf where I need to specify a
> plain IP address for UDPu: totem.interface.bindnetaddr. If I specify
> 0.0.0.0 there, I get the message "Service engine 'corosync_quorum'
> failed to load for reason 'configuration error: nodelist or
> quorum.expected_votes must be configured!'" in the logs (BTW it does not
> say that I made a mistake in bindnetaddr). Is there a way to avoid
> specifying IP addresses entirely?

You can remove the whole interface section completely. Corosync will 
determine the correct address from the nodelist.
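[Editor's note: a minimal corosync.conf along those lines might look like the sketch below. The addresses are the ones from this thread; option names should be double-checked against corosync.conf(5) for your version:]

```
totem {
  version: 2
  transport: udpu          # unicast UDP; no interface section needed --
                           # corosync derives the bind address from the
                           # nodelist entry matching a local interface
}

nodelist {
  node {
    ring0_addr: 104.236.71.79
    nodeid: 1
  }
  node {
    ring0_addr: 188.166.54.190
    nodeid: 2
  }
  node {
    ring0_addr: 128.199.116.218
    nodeid: 3
  }
}

quorum {
  provider: corosync_votequorum
}
```

With IPv4 addresses corosync can derive a nodeid from the ring0 address, but setting it explicitly is less surprising.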

Regards,
   Honza

>
>
>
>> Please try to fix this problem first and let's see whether it solves
>> the issue you are hitting.
>>
>> Regards,
>>    Honza
>>
>>> root at node1:/etc/corosync# ping -c1 -W100 node2 | grep from
>>> 64 bytes from node2 (188.166.54.190): icmp_seq=1 ttl=55 time=88.3 ms
>>>
>>> root at node1:/etc/corosync# ping -c1 -W100 node3 | grep from
>>> 64 bytes from node3 (128.199.116.218): icmp_seq=1 ttl=51 time=252 ms
>>>
>>>
>>> With the corosync.conf below, nothing works:
>>> ...
>>> nodelist {
>>>    node {
>>>      ring0_addr: node1
>>>    }
>>>    node {
>>>      ring0_addr: node2
>>>    }
>>>    node {
>>>      ring0_addr: node3
>>>    }
>>> }
>>> ...
>>> Jan 14 10:47:44 node1 corosync[15061]:  [MAIN  ] Corosync Cluster Engine
>>> ('2.3.3'): started and ready to provide service.
>>> Jan 14 10:47:44 node1 corosync[15061]:  [MAIN  ] Corosync built-in
>>> features: dbus testagents rdma watchdog augeas pie relro bindnow
>>> Jan 14 10:47:44 node1 corosync[15062]:  [TOTEM ] Initializing transport
>>> (UDP/IP Unicast).
>>> Jan 14 10:47:44 node1 corosync[15062]:  [TOTEM ] Initializing
>>> transmit/receive security (NSS) crypto: aes256 hash: sha1
>>> Jan 14 10:47:44 node1 corosync[15062]:  [TOTEM ] The network interface
>>> [a.b.c.d] is now up.
>>> Jan 14 10:47:44 node1 corosync[15062]:  [SERV  ] Service engine loaded:
>>> corosync configuration map access [0]
>>> Jan 14 10:47:44 node1 corosync[15062]:  [QB    ] server name: cmap
>>> Jan 14 10:47:44 node1 corosync[15062]:  [SERV  ] Service engine loaded:
>>> corosync configuration service [1]
>>> Jan 14 10:47:44 node1 corosync[15062]:  [QB    ] server name: cfg
>>> Jan 14 10:47:44 node1 corosync[15062]:  [SERV  ] Service engine loaded:
>>> corosync cluster closed process group service v1.01 [2]
>>> Jan 14 10:47:44 node1 corosync[15062]:  [QB    ] server name: cpg
>>> Jan 14 10:47:44 node1 corosync[15062]:  [SERV  ] Service engine loaded:
>>> corosync profile loading service [4]
>>> Jan 14 10:47:44 node1 corosync[15062]:  [WD    ] No Watchdog, try
>>> modprobe <a watchdog>
>>> Jan 14 10:47:44 node1 corosync[15062]:  [WD    ] no resources configured.
>>> Jan 14 10:47:44 node1 corosync[15062]:  [SERV  ] Service engine loaded:
>>> corosync watchdog service [7]
>>> Jan 14 10:47:44 node1 corosync[15062]:  [QUORUM] Using quorum provider
>>> corosync_votequorum
>>> Jan 14 10:47:44 node1 corosync[15062]:  [QUORUM] Quorum provider:
>>> corosync_votequorum failed to initialize.
>>> Jan 14 10:47:44 node1 corosync[15062]:  [SERV  ] Service engine
>>> 'corosync_quorum' failed to load for reason 'configuration error:
>>> nodelist or quorum.expected_votes must be configured!'
>>> Jan 14 10:47:44 node1 corosync[15062]:  [MAIN  ] Corosync Cluster Engine
>>> exiting with status 20 at service.c:356.
>>>
>>>
>>> But with IP addresses specified in ringX_addr, everything works:
>>> ...
>>> nodelist {
>>>    node {
>>>      ring0_addr: 104.236.71.79
>>>    }
>>>    node {
>>>      ring0_addr: 188.166.54.190
>>>    }
>>>    node {
>>>      ring0_addr: 128.199.116.218
>>>    }
>>> }
>>> ...
>>> Jan 14 10:48:28 node1 corosync[15155]:  [MAIN  ] Corosync Cluster Engine
>>> ('2.3.3'): started and ready to provide service.
>>> Jan 14 10:48:28 node1 corosync[15155]:  [MAIN  ] Corosync built-in
>>> features: dbus testagents rdma watchdog augeas pie relro bindnow
>>> Jan 14 10:48:28 node1 corosync[15156]:  [TOTEM ] Initializing transport
>>> (UDP/IP Unicast).
>>> Jan 14 10:48:28 node1 corosync[15156]:  [TOTEM ] Initializing
>>> transmit/receive security (NSS) crypto: aes256 hash: sha1
>>> Jan 14 10:48:28 node1 corosync[15156]:  [TOTEM ] The network interface
>>> [a.b.c.d] is now up.
>>> Jan 14 10:48:28 node1 corosync[15156]:  [SERV  ] Service engine loaded:
>>> corosync configuration map access [0]
>>> Jan 14 10:48:28 node1 corosync[15156]:  [QB    ] server name: cmap
>>> Jan 14 10:48:28 node1 corosync[15156]:  [SERV  ] Service engine loaded:
>>> corosync configuration service [1]
>>> Jan 14 10:48:28 node1 corosync[15156]:  [QB    ] server name: cfg
>>> Jan 14 10:48:28 node1 corosync[15156]:  [SERV  ] Service engine loaded:
>>> corosync cluster closed process group service v1.01 [2]
>>> Jan 14 10:48:28 node1 corosync[15156]:  [QB    ] server name: cpg
>>> Jan 14 10:48:28 node1 corosync[15156]:  [SERV  ] Service engine loaded:
>>> corosync profile loading service [4]
>>> Jan 14 10:48:28 node1 corosync[15156]:  [WD    ] No Watchdog, try
>>> modprobe <a watchdog>
>>> Jan 14 10:48:28 node1 corosync[15156]:  [WD    ] no resources configured.
>>> Jan 14 10:48:28 node1 corosync[15156]:  [SERV  ] Service engine loaded:
>>> corosync watchdog service [7]
>>> Jan 14 10:48:28 node1 corosync[15156]:  [QUORUM] Using quorum provider
>>> corosync_votequorum
>>> Jan 14 10:48:28 node1 corosync[15156]:  [SERV  ] Service engine loaded:
>>> corosync vote quorum service v1.0 [5]
>>> Jan 14 10:48:28 node1 corosync[15156]:  [QB    ] server name: votequorum
>>> Jan 14 10:48:28 node1 corosync[15156]:  [SERV  ] Service engine loaded:
>>> corosync cluster quorum service v0.1 [3]
>>> Jan 14 10:48:28 node1 corosync[15156]:  [QB    ] server name: quorum
>>> Jan 14 10:48:28 node1 corosync[15156]:  [TOTEM ] adding new UDPU member
>>> {a.b.c.d}
>>> Jan 14 10:48:28 node1 corosync[15156]:  [TOTEM ] adding new UDPU member
>>> {e.f.g.h}
>>> Jan 14 10:48:28 node1 corosync[15156]:  [TOTEM ] adding new UDPU member
>>> {i.j.k.l}
>>> Jan 14 10:48:28 node1 corosync[15156]:  [TOTEM ] A new membership
>>> (m.n.o.p:80) was formed. Members joined: 1760315215
>>> Jan 14 10:48:28 node1 corosync[15156]:  [QUORUM] Members[1]: 1760315215
>>> Jan 14 10:48:28 node1 corosync[15156]:  [MAIN  ] Completed service
>>> synchronization, ready to provide service.
>>>
>>>
>>> On Mon, Jan 5, 2015 at 6:45 PM, Jan Friesse <jfriesse at redhat.com> wrote:
>>>
>>>> Dmitry,
>>>>
>>>>
>>>>> Sure, in the logs I see "adding new UDPU member {IP_ADDRESS}" (so DNS
>>>>> names are definitely resolved), but in practice the cluster does not
>>>>> work, as I said above. So validation of ringX_addr in corosync.conf
>>>>> would be very helpful.
>>>>
>>>> That's weird, because as long as the DNS name is resolved, corosync
>>>> works only with the IP. This means the code path is exactly the same
>>>> with an IP as with a DNS name. Do you have logs from corosync?
>>>>
>>>> Honza
>>>>
>>>>
>>>>>
>>>>> On Fri, Jan 2, 2015 at 2:49 PM, Jan Friesse <jfriesse at redhat.com>
>> wrote:
>>>>>
>>>>>> Dmitry,
>>>>>>
>>>>>>
>>>>>>> No, I meant that if you pass a domain name in ring0_addr, there are
>>>>>>> no errors in the logs; corosync even seems to find the nodes (based
>>>>>>> on its logs), and crm_node -l shows them, but in practice nothing
>>>>>>> really works. A verbose error message would be very helpful in such
>>>>>>> a case.
>>>>>>>
>>>>>>
>>>>>> This sounds weird. Are you sure that the DNS names really map to the
>>>>>> correct IP addresses? The logs should contain something like "adding
>>>>>> new UDPU member {IP_ADDRESS}".
>>>>>>
>>>>>> Regards,
>>>>>>    Honza
>>>>>>
>>>>>>
>>>>>>> On Tuesday, December 30, 2014, Daniel Dehennin <
>>>>>>> daniel.dehennin at baby-gnu.org>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Dmitry Koterov <dmitry.koterov at gmail.com> writes:
>>>>>>>>
>>>>>>>>> Oh, it seems I've found the solution! At least two mistakes were
>>>>>>>>> in my corosync.conf (BTW the logs did not report any errors, so
>>>>>>>>> my conclusion is based on my experiments only).
>>>>>>>>>
>>>>>>>>> 1. nodelist.node MUST contain only IP addresses. No hostnames!
>>>>>>>>> They simply do not work: "crm status" shows no nodes, and no
>>>>>>>>> warnings about this appear in the logs.
>>>>>>>>>
>>>>>>>>
>>>>>>>> You can add name like this:
>>>>>>>>
>>>>>>>>       nodelist {
>>>>>>>>         node {
>>>>>>>>           ring0_addr: <public-ip-address-of-the-first-machine>
>>>>>>>>           name: node1
>>>>>>>>         }
>>>>>>>>         node {
>>>>>>>>           ring0_addr: <public-ip-address-of-the-second-machine>
>>>>>>>>           name: node2
>>>>>>>>         }
>>>>>>>>       }
>>>>>>>>
>>>>>>>> I used it on Ubuntu Trusty with udpu.
>>>>>>>>
>>>>>>>> Regards.
>>>>>>>>
>>>>>>>> --
>>>>>>>> Daniel Dehennin
>>>>>>>> Récupérer ma clef GPG: gpg --recv-keys 0xCC1E9E5B7A6FE2DF
>>>>>>>> Fingerprint: 3E69 014E 5C23 50E8 9ED6  2AAD CC1E 9E5B 7A6F E2DF
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>>>>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>>>>>
>>>>>>> Project Home: http://www.clusterlabs.org
>>>>>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>>>>> Bugs: http://bugs.clusterlabs.org
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>>
>>
>> _______________________________________________
>> discuss mailing list
>> discuss at corosync.org
>> http://lists.corosync.org/mailman/listinfo/discuss
>>
>




