[Pacemaker] getnameinfo() vs uname()
Andrew Beekhof
andrew at beekhof.net
Fri Aug 31 18:17:47 EDT 2012
On Fri, Aug 31, 2012 at 5:55 PM, Vladislav Bogdanov
<bubble at hoster-ok.com> wrote:
> 31.08.2012 05:43, Andrew Beekhof wrote:
>> On Wed, Aug 29, 2012 at 8:57 PM, Vladislav Bogdanov
>> <bubble at hoster-ok.com> wrote:
>>> 29.08.2012 13:33, Andrew Beekhof wrote:
>>>> On Wed, Aug 29, 2012 at 4:22 PM, Vladislav Bogdanov
>>>> <bubble at hoster-ok.com> wrote:
>>>>> Hi,
>>>>>
>>>>> It looks like pacemaker (current master)
>>>>
>>>> "current master" changes quite rapidly, could you be specific?
>>>
>>> c72f5ca
>>>
>>>>
>>>>> does not always work nicely on
>>>>> top of corosync2 unless /etc/hosts lists all cluster nodes, with the
>>>>> short form of each name before the long one (so that
>>>>> gethostbyaddr() and getnameinfo() return the short one).
>>>>
>>>> I noticed a different issue related to this, but I need to know
>>>> exactly which version you had before I can answer properly.
>>
>> Ok...
>>
>> Pacemaker doesn't actually care about FQDN vs short names.
>> Short names are arguably nicer to look at, but the only thing that
>> really matters is that when node A looks up its own name, the
>> answer is consistent with the answer /other/ nodes get when they look
>> up node A.
>>
>> The problem to date is that local lookups have used uname(3P) while
>> remote lookups use some other method (like getnameinfo(3)).
>> So I think the first step to fixing this mess is to have everyone
>> use the same mechanism; for corosync 2.x clusters[1] that will
>> almost certainly be the corosync_node_name() function you spotted.
>>
>> If no nodelist[2] is specified in corosync.conf, we use getnameinfo()
>> on the address corosync is bound to, possibly with your amendment
>> below.
>> If there is a nodelist, we will look for a name in the 'ring0_addr'
>> or 'name' fields.
>> If those fields are missing or contain IP addresses, we fall back to
>> getnameinfo() as per the "no nodelist" case.
>> If none of those work, I guess we fall back to uname() and hope for the best.
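The lookup order described above can be sketched as follows. This is a hypothetical illustration, not Pacemaker's corosync_node_name(): `infer_node_name()` and its parameters are invented names, the nodelist value is passed in as a plain string instead of being read through cmap, and NI_NAMEREQD is assumed so that an unresolvable address falls through to uname() rather than producing a numeric string.

```c
#define _GNU_SOURCE
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/utsname.h>

/* Returns 1 if s parses as an IPv4 or IPv6 address literal. */
static int looks_like_ip(const char *s)
{
    unsigned char tmp[sizeof(struct in6_addr)];

    return inet_pton(AF_INET, s, tmp) == 1 || inet_pton(AF_INET6, s, tmp) == 1;
}

/*
 * Hypothetical sketch of the name resolution order.
 * configured: 'name' or 'ring0_addr' from the nodelist, or NULL if absent.
 * addr/addrlen: the address corosync is bound to, or NULL.
 * Returns a malloc()ed name, or NULL if every step failed.
 */
static char *infer_node_name(const char *configured,
                             const struct sockaddr *addr, socklen_t addrlen)
{
    char host[NI_MAXHOST];
    struct utsname u;

    /* 1. A nodelist entry that is a real name rather than an IP address. */
    if (configured != NULL && configured[0] != '\0' && !looks_like_ip(configured)) {
        return strdup(configured);
    }

    /* 2. Reverse lookup of the bound address; NI_NAMEREQD makes this fail
     *    instead of returning the numeric form of the address. */
    if (addr != NULL
        && getnameinfo(addr, addrlen, host, sizeof(host), NULL, 0, NI_NAMEREQD) == 0) {
        return strdup(host);
    }

    /* 3. Last resort: the local uname() node name. */
    if (uname(&u) == 0) {
        return strdup(u.nodename);
    }
    return NULL;
}
```

With a usable nodelist entry such as "pcmk-1" the function returns it unchanged; with an IP address in that field and no bound address, it ends up at uname().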
>
> That is sane, thank you for explanation.
>
>>
>>
>> I'm going to make this the first thing I do after 1.1.8 comes out
>> (we're waiting on http://bugs.clusterlabs.org/show_bug.cgi?id=5044 and
>> some final CTS runs).
>
> Btw, regarding 1.1.8, I spotted two code paths (in c72f5ca) where stonithd
> dumps core: one SIGSEGV when doing a manual ack and one assert when
> queuing a remote operation (or maybe vice versa, I can't look right now).
> Is this covered by CTS?
No. Please do report any segfaults you see ASAP.
>
>
> Vladislav
>
>> If someone wants to help out before then, I would certainly not complain :)
>>
>> -- Andrew
>>
>> [1] We will implement equivalent functions for the other cluster types.
>> [2] The nodelist section looks something like:
>> nodelist {
>>     node {
>>         nodeid: 1
>>         ring0_addr: pcmk-1
>>         quorum_votes: 1
>>     }
>>     node {
>>         nodeid: 2
>>         ring0_addr: pcmk-2
>>         quorum_votes: 2
>>     }
>> }
>>
>>
>>
>>>>
>>>>> I tried to run a
>>>>> test cluster with a stub /etc/hosts but a fully functional name server,
>>>>> and I see that pacemaker puts long node names (FQDNs) into the nodelist,
>>>>> while expecting them to equal what uname() returns for the local node.
>>>>> After I created the needed entries in /etc/hosts, everything began to work.
>>>>> From the getaddrinfo manpage, the NI_NOFQDN flag should help avoid this
>>>>> behavior.
>>>
>>> s/getaddrinfo/getnameinfo/
>>>
>>> Actually it doesn't, at least not always.
>>> The problem is that the hostname (nodename) may be either an FQDN (as
>>> anaconda tries to set) or just the host part. And getnameinfo() is not
>>> consistent here (as in EL6): it strips the local system's domain name,
>>> including the leading dot, if the local hostname is an FQDN, but returns
>>> the FQDN of the address being looked up if the hostname is host-only.
>>>
>>> So I tried the following patch, and it works perfectly for me (hostnames
>>> are host-only, and DNS is correctly configured, so hostname -f returns the FQDN).
>>>
>>> diff -urNp a/lib/cluster/corosync.c b/lib/cluster/corosync.c
>>> --- a/lib/cluster/corosync.c 2012-08-29 07:32:57.000000000 +0000
>>> +++ b/lib/cluster/corosync.c 2012-08-29 07:33:54.730099738 +0000
>>> @@ -207,7 +207,15 @@ static char *corosync_node_name(cmap_han
>>>          addrlen = sizeof(struct sockaddr_in);
>>>      }
>>>
>>> -    if (getnameinfo((struct sockaddr *)addrs[0].address, addrlen, buf, sizeof(buf), NULL, 0, 0) == 0) {
>>> +    if (getnameinfo((struct sockaddr *)addrs[0].address, addrlen, buf, sizeof(buf), NULL, 0, NI_NAMEREQD) == 0) {
>>> +        char *p = buf;
>>> +        while (*p) {
>>> +            if (*p == '.') {
>>> +                *p = '\0';
>>> +                break;
>>> +            }
>>> +            p++;
>>> +        }
>>>          crm_notice("Inferred node name '%s' for nodeid %u from DNS", buf, nodeid);
>>>
>>>          if(corosync_name_is_valid("DNS", buf)) {
>>>
>>>
>>> Now I do not see FQDNs in nodelist.
>>> Grrr, line wrapping...
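The loop added by the patch truncates the resolved name at its first dot. The same effect can be sketched with strchr(); `strip_domain()` is a hypothetical helper name used only for illustration, not a Pacemaker function.

```c
#include <string.h>

/*
 * Hypothetical helper, equivalent to the while-loop in the patch:
 * truncate a host name in place at its first '.', so that
 * "node1.example.com" becomes "node1". Names without a dot are unchanged.
 */
static void strip_domain(char *name)
{
    char *dot = strchr(name, '.');

    if (dot != NULL) {
        *dot = '\0';
    }
}
```

This truncation only makes sense together with NI_NAMEREQD: without that flag, a numeric fallback such as "192.168.1.10" would be cut down to "192".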
>>>
>>>>> Additionally, the NI_NAMEREQD flag should probably also be used.
>>>
>>> This one still applies. Otherwise getnameinfo() can return the string
>>> representation of the IP address if it cannot resolve it.
>>
>> That's not a big deal; corosync_name_is_valid() will detect this and
>> refuse to use it.
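The exact rules of Pacemaker's corosync_name_is_valid() are not shown in this thread. A minimal sketch of the kind of check described, assuming "valid" means non-empty and not an IPv4 or IPv6 address literal, can be built on inet_pton(); `name_is_usable()` is a hypothetical name.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>

/*
 * Hypothetical check in the spirit of corosync_name_is_valid(); the real
 * function may differ. Rejects NULL, empty strings, and anything that
 * parses as an IPv4 or IPv6 address literal.
 */
static int name_is_usable(const char *name)
{
    struct in6_addr buf; /* large enough for either address family */

    if (name == NULL || name[0] == '\0') {
        return 0;
    }
    if (inet_pton(AF_INET, name, &buf) == 1 || inet_pton(AF_INET6, name, &buf) == 1) {
        return 0; /* getnameinfo() handed back a numeric address */
    }
    return 1;
}
```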
>>
>>>
>>> Btw, NI_MAXHOST should be used instead of INET6_ADDRSTRLEN for buf there.
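For reference, INET6_ADDRSTRLEN (46) is sized for the textual form of an IPv6 address, while NI_MAXHOST (1025 in glibc) is the host buffer size documented alongside getnameinfo(3), so a long DNS name can exceed the former. A minimal sketch, assuming an IPv4 address given as a string; `reverse_lookup_ipv4()` is a hypothetical helper, and the caller is expected to pass a buffer of NI_MAXHOST bytes.

```c
#define _GNU_SOURCE /* exposes NI_MAXHOST in glibc's <netdb.h> */
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Reverse-resolve a dotted-quad IPv4 address into buf
 * (declare it as char buf[NI_MAXHOST]).
 * Returns 0 on success and a nonzero value on any failure.
 */
static int reverse_lookup_ipv4(const char *ip, char *buf, size_t buflen)
{
    struct sockaddr_in sin;

    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    if (inet_pton(AF_INET, ip, &sin.sin_addr) != 1) {
        return -1; /* not a valid IPv4 literal */
    }
    /* NI_NAMEREQD: fail instead of returning the numeric address string. */
    return getnameinfo((struct sockaddr *)&sin, sizeof(sin),
                       buf, (socklen_t)buflen, NULL, 0, NI_NAMEREQD);
}
```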
>>>
>>>
>>>
>>> _______________________________________________
>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>
>>> Project Home: http://www.clusterlabs.org
>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> Bugs: http://bugs.clusterlabs.org