[Pacemaker] getnameinfo() vs uname()

Vladislav Bogdanov bubble at hoster-ok.com
Fri Aug 31 03:55:04 EDT 2012


31.08.2012 05:43, Andrew Beekhof wrote:
> On Wed, Aug 29, 2012 at 8:57 PM, Vladislav Bogdanov
> <bubble at hoster-ok.com> wrote:
>> 29.08.2012 13:33, Andrew Beekhof wrote:
>>> On Wed, Aug 29, 2012 at 4:22 PM, Vladislav Bogdanov
>>> <bubble at hoster-ok.com> wrote:
>>>> Hi,
>>>>
>>>> It looks like pacemaker (current master)
>>>
>>> "current master" changes quite rapidly, could you be specific?
>>
>> c72f5ca
>>
>>>
>>>> does not always work nicely on
>>>> top of corosync2 if one doesn't have /etc/hosts with all cluster nodes
>>>> in it, where short form of name goes before the long one (so
>>>> gethostbyaddr() and getnameinfo() return the short one).
>>>
>>> I noticed a different issue related to this, but I need to know
>>> exactly which version you had before I can answer properly.
> 
> Ok...
> 
> Pacemaker doesn't actually care about FQDN vs short names.
> Short names are arguably nicer to look at, but the only thing that
> really matters is that when node A looks up its own name, that the
> answer is consistent with the answer /other/ nodes get when they look
> up node A.
> 
> The problem to date is that local lookups have used uname(3P) while
> remote lookups use some other method (like getnameinfo(3)).
> So I think the first step to fixing this mess is to have everyone
> using the same mechanism - for corosync 2.x clusters[1] that will
> almost certainly be the corosync_node_name() function you spotted.
> 
> If no nodelist[2] is specified in corosync.conf, we use getnameinfo()
> on the address corosync is bound to - possibly with your amendment
> below.
> If there is a node list, we will look for a name in the 'ring0_addr'
> or 'name' fields.
> If those fields are missing or contain IP addresses, we fall back to
> getnameinfo() as per the "no nodelist" case.
> If none of those work, I guess we fall back to uname() and hope for the best.

That is sane, thank you for the explanation.
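
The lookup order described above can be sketched as a small C helper.
This is only an illustration, not Pacemaker's actual code:
pick_node_name(), usable_name() and looks_like_ip() are hypothetical
names, the order between 'ring0_addr' and 'name' is an assumption, and
the inet_pton() check merely stands in for whatever validation the real
code performs.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: 1 if s parses as an IPv4 or IPv6 literal, else 0. */
static int looks_like_ip(const char *s)
{
    unsigned char tmp[sizeof(struct in6_addr)];

    return inet_pton(AF_INET, s, tmp) == 1 ||
           inet_pton(AF_INET6, s, tmp) == 1;
}

/* A candidate is usable if it is present, non-empty and not an address. */
static int usable_name(const char *s)
{
    return s != NULL && s[0] != '\0' && !looks_like_ip(s);
}

/*
 * Return the first usable candidate, in this (assumed) order:
 *   1. nodelist 'ring0_addr'
 *   2. nodelist 'name'
 *   3. the result of getnameinfo() on the bound address
 *   4. uname() nodename, as the last resort
 * Pass NULL for any candidate that is not available.
 */
const char *pick_node_name(const char *ring0_addr, const char *name,
                           const char *dns_name, const char *uname_name)
{
    if (usable_name(ring0_addr)) {
        return ring0_addr;
    }
    if (usable_name(name)) {
        return name;
    }
    if (usable_name(dns_name)) {
        return dns_name;
    }
    return uname_name;
}
```

For example, pick_node_name("192.168.122.101", NULL, "pcmk-1", "host")
skips the address in 'ring0_addr' and falls through to the DNS result.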

> 
> 
> I'm going to make this the first thing I do after 1.1.8 comes out
> (we're waiting on http://bugs.clusterlabs.org/show_bug.cgi?id=5044 and
> some final CTS runs).

Btw, regarding 1.1.8: I spotted two paths (in c72f5ca) where stonithd dumps
core, one SIGSEGV when doing a manual ack and one assert when queuing a
remote operation (may be vice versa, can't look right now).
Is this covered by CTS?


Vladislav

> If someone wants to help out before then, I would certainly not complain :)
> 
> -- Andrew
> 
> [1] We will implement equivalent functions for the other cluster types.
> [2] The nodelist section looks something like:
> nodelist {
>     node {
>         nodeid: 1
>         ring0_addr: pcmk-1
>         quorum_votes: 1
>     }
>     node {
>         nodeid: 2
>         ring0_addr: pcmk-2
>         quorum_votes: 2
>     }
> }
> 
> 
> 
>>>
>>>> I tried to run
>>>> test cluster with stub /etc/hosts but fully functional name server, and
>>>> I see that pacemaker includes long nodenames (fqdn) into nodelist, while
>>>> expecting them to be equal to what uname() returns for the local node.
>>>> After I created the needed entries in /etc/hosts, everything began to work.
>>>> According to the getaddrinfo manpage, the NI_NOFQDN flag should help to
>>>> avoid this behavior.
>>
>> s/getaddrinfo/getnameinfo/
>>
>> Actually it doesn't. At least not always.
>> The problem is that the hostname (nodename) may be either an FQDN (as
>> anaconda tries to set) or contain only the host part. getnameinfo() is
>> not consistent here (at least on EL6): it strips the local system's
>> domain name, including the leading dot, if the local hostname is an
>> FQDN, but returns the FQDN corresponding to the address being looked
>> up if the hostname is host-only.
>>
>> So, I tried the following patch and it works perfectly for me (hostnames
>> are host-only, and DNS is correctly configured, so hostname -f returns
>> the FQDN).
>>
>> diff -urNp a/lib/cluster/corosync.c b/lib/cluster/corosync.c
>> --- a/lib/cluster/corosync.c    2012-08-29 07:32:57.000000000 +0000
>> +++ b/lib/cluster/corosync.c    2012-08-29 07:33:54.730099738 +0000
>> @@ -207,7 +207,15 @@ static char *corosync_node_name(cmap_han
>>                  addrlen = sizeof(struct sockaddr_in);
>>              }
>>
>> -            if (getnameinfo((struct sockaddr *)addrs[0].address,
>> addrlen, buf, sizeof(buf), NULL, 0, 0) == 0) {
>> +            if (getnameinfo((struct sockaddr *)addrs[0].address,
>> addrlen, buf, sizeof(buf), NULL, 0, NI_NAMEREQD) == 0) {
>> +                char *p = buf;
>> +                while (*p) {
>> +                    if (*p == '.') {
>> +                        *p = '\0';
>> +                        break;
>> +                    }
>> +                    p++;
>> +                }
>>                  crm_notice("Inferred node name '%s' for nodeid %u from
>> DNS", buf, nodeid);
>>
>>                  if(corosync_name_is_valid("DNS", buf)) {
>>
>>
>> Now I do not see FQDNs in nodelist.
>> Grrr, line wrapping...
>>
>>>> Additionally, the NI_NAMEREQD flag should probably also be used.
>>
>> This one still applies. Otherwise getnameinfo() can return the string
>> representation of the IP address if it cannot resolve it.
> 
> That's not a big deal, corosync_name_is_valid() will detect this and
> refuse to use it.
> 
>>
>> Btw, NI_MAXHOST should be used instead of INET6_ADDRSTRLEN for buf there.
>>
>>
>>
>> _______________________________________________
>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
> 




