[ClusterLabs] Antw: Heads up for ldirectord in SLES12 SP5 "Use of uninitialized value $ip_port in pattern match (m//) at /usr/sbin/ldirectord line 1830"
Ulrich Windl
Ulrich.Windl at rz.uni-regensburg.de
Tue Aug 9 04:49:29 EDT 2022
Hi!
Digging further into ldirectord, I found that the utility functions do not distinguish between a name that is not known and a name that is (probably) known but cannot be resolved at the moment.
I hacked the corresponding functions to observe and return the error code (errno) as a negative number.
Short demo:
DB<2> x ld_gethostbyname('x',AF_INET)
0 '-2' # ENOENT
DB<3> x ld_gethostbyname('localhost',AF_INET)
0 '127.0.0.1'
DB<4> x ld_gethostbyname('localhost',AF_INET6)
0 '[::1]'
### hacked /etc/resolv.conf to make nameservers unreachable (added IPs that are not nameservers or do not exist); the host itself exists
DB<7> x ld_gethostbyname('mail-1',AF_INET)
0 '-3' # ESRCH
Returning the error message string is a bit trickier, so I just used the error code.
However it's not clear what to do when the resolver fails (i.e.: name would be known if resolver worked). In any case it takes quite a while until an error result is returned.
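As an alternative to negated errno values, here is a minimal sketch that separates "name does not exist" from "resolver unavailable" using getaddrinfo() from the core Socket module. The names classify_gai_error and resolve_host are hypothetical and not part of ldirectord; the bracketed IPv6 form only mirrors the demo output above.

```perl
use strict;
use warnings;
use Socket qw(getaddrinfo getnameinfo AF_INET AF_INET6 SOCK_STREAM
              NI_NUMERICHOST NI_NUMERICSERV EAI_NONAME EAI_AGAIN);

# Hypothetical helper: map a getaddrinfo() error code to a coarse class.
sub classify_gai_error {
    my ($err) = @_;
    return 'notfound' if $err == EAI_NONAME;   # the name does not exist
    return 'tempfail' if $err == EAI_AGAIN;    # resolver failed, may work later
    return 'error';                            # any other failure
}

# Hypothetical lookup: returns (address, undef) on success and
# (undef, class) on failure, so callers can tell the two failure kinds apart.
sub resolve_host {
    my ($name, $af) = @_;
    my ($err, @res) = getaddrinfo($name, undef,
                                  { family => $af, socktype => SOCK_STREAM });
    return (undef, classify_gai_error($err + 0)) if $err;

    # Convert the first packed socket address back to a numeric string.
    my ($nerr, $addr) = getnameinfo($res[0]{addr},
                                    NI_NUMERICHOST | NI_NUMERICSERV);
    return (undef, 'error') if $nerr;

    $addr = "[$addr]" if $af == AF_INET6;      # same form as the demo output
    return ($addr, undef);
}
```

A caller would use it as `my ($addr, $why) = resolve_host('mail-1', AF_INET);` and treat 'tempfail' differently from 'notfound'. This does not shorten the long wait for a failing resolver.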
For example (using the hacked functions):
if (($fallback->{port} =
     &ld_getservbyname($fallback->{port}, $protocol)) =~ /^-/) {
        &config_error($line, "invalid port for fallback server");
}
One could check for "== '-2'" (name not known) instead, but in the other case (resolver failure) there is still no valid port value.
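To make the two cases explicit, a caller could classify the result first. This is only a sketch: classify_lookup is a hypothetical name, and it assumes that "-2" means "name not known" (as in the demo) and that any other negative number, or an undefined value, means the lookup itself failed.

```perl
use strict;
use warnings;

# Hypothetical classifier for the return values of the hacked helpers.
# Assumption: '-2' means "not known"; other negative codes and undef mean
# the resolver failed and the name may still be valid.
sub classify_lookup {
    my ($value) = @_;
    return 'unresolved' unless defined $value && length $value;
    return 'ok'         unless $value =~ /^-\d+$/;   # address or port number
    return $value == -2 ? 'unknown' : 'unresolved';
}

# Show how a few sample values are classified.
for my $v ('25', '[::1]', '-2', '-3', undef) {
    printf "%-6s => %s\n", (defined $v ? $v : 'undef'), classify_lookup($v);
}
```

Only 'unknown' would then be a configuration error; what to do for 'unresolved' (retry, keep the previous value, or fail) remains the open question.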
Ideas?
Regards,
Ulrich
>>> Ulrich Windl wrote on 08.08.2022 at 11:19 in message <62F0D518.3F8 : 161 : 60728>:
> Hi!
>
> The bug is still under investigation, but digging in the ldirectord code I
> found this part called when stopping:
>
> } elsif ($CMD eq "stop") {
>         kill 15, $oldpid;
>         ld_exit(0, "Exiting from ldirectord $CMD");
>
> ldirectord uses a SIGTERM handler that only sets a flag; the termination
> code is then started at some later time.
> Doesn't that mean the cluster will see a wrong exit code (success while parts
> of ldirectord are still running)?
>
> Regards,
> Ulrich
>
>
>
> >>> Ulrich Windl wrote on 03.08.2022 at 11:13 in message <62EA3C2C.E8D : 161 : 60728>:
> > Hi!
> >
> > I wanted to inform you of an unpleasant bug in ldirectord of SLES12 SP5:
> > We had a short network problem while some redundancy paths reconfigured in
> > the infrastructure, effectively causing that some network services could
> > not be reached.
> > Unfortunately ldirectord controlled by the cluster reported a failure (the
> > director, not the services being directed to):
> >
> > h11 crmd[28930]: notice: h11-prm_lvs_mail_monitor_300000:369 [ Use of
> > uninitialized value $ip_port in pattern match (m//) at /usr/sbin/ldirectord
> > line 1830, <CFGFILE> line 21. Error [33159] reading file
> > /etc/ldirectord/mail.conf at line 10: invalid address for virtual service\n ]
> > h11 ldirectord[33266]: Exiting with exit_status 2: config_error:
> > Configuration Error
> >
> > You can guess what happened:
> > Pacemaker tried to recover (stop, then start), but the stop failed, too:
> > h11 lrmd[28927]: notice: prm_lvs_mail_stop_0:35047:stderr [ Use of
> > uninitialized value $ip_port in pattern match (m//) at /usr/sbin/ldirectord
> > line 1830, <CFGFILE> line 21. ]
> > h11 lrmd[28927]: notice: prm_lvs_mail_stop_0:35047:stderr [ Error [36293]
> > reading file /etc/ldirectord/mail.conf at line 10: invalid address for
> > virtual service ]
> > h11 crmd[28930]: notice: Result of stop operation for prm_lvs_mail on h11:
> > 1 (unknown error)
> >
> > A stop failure meant that the node was fenced, interrupting all the other
> > services.
> >
> > Examining the logs I also found this interesting type of error:
> > h11 attrd[28928]: notice: Cannot update
> > fail-count-prm_lvs_rksapds5#monitor_300000[monitor]=(null) because peer
> > UUID not known (will retry if learned)
> >
> > Finally, here's the code that caused the error:
> >
> > sub _ld_read_config_virtual_resolve
> > {
> >     my($line, $vsrv, $ip_port, $af)=(@_);
> >
> >     if($ip_port){
> >         $ip_port=&ld_gethostservbyname($ip_port, $vsrv->{protocol}, $af);
> >         if ($ip_port =~ /(\[[0-9A-Fa-f:]+\]):(\d+)/) {
> >             $vsrv->{server} = $1;
> >             $vsrv->{port} = $2;
> >         } elsif($ip_port){
> >             ($vsrv->{server}, $vsrv->{port}) = split /:/, $ip_port;
> >         }
> >         else {
> >             &config_error($line,
> >                 "invalid address for virtual service");
> >         }
> > ...
> >
> > The value returned by ld_gethostservbyname is undefined. I also wonder what
> > the program logic is:
> > If the host looks like a hex address in square brackets, host and port are
> > split at the colon; otherwise host and port are split at the colon.
> > Why not simply split at the last colon if the value is defined, AND THEN
> > check if the components look OK?
> >
> > So the address is reported as "invalid address for virtual service" only
> > when the resolver service (e.g. via LDAP) is unavailable.
> > I used host and service names for readability.
> >
> > (I reported the issue to SLES support)
> >
> > Regards,
> > Ulrich
> >
> >
> >
>
>
>
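Regarding the quoted "stop" branch: one way to avoid reporting success too early would be to wait until the old process is really gone. The following is a minimal sketch, not ldirectord code; stop_and_wait and its timeout parameter are hypothetical. It assumes the caller is not the parent of the daemon (otherwise an unreaped zombie would still count as existing).

```perl
use strict;
use warnings;
use Time::HiRes qw(sleep time);

# Hypothetical helper: send SIGTERM, then poll until the process has exited.
# Returns 1 if the process is gone within $timeout seconds, 0 otherwise.
sub stop_and_wait {
    my ($pid, $timeout) = @_;

    unless (kill 'TERM', $pid) {
        # ESRCH: no such process, so there is nothing left to stop.
        # Any other error (e.g. EPERM) is treated as a failed stop.
        return $!{ESRCH} ? 1 : 0;
    }

    my $deadline = time() + $timeout;
    while (time() < $deadline) {
        # Signal 0 delivers nothing; it only checks whether the PID exists.
        return 1 unless kill 0, $pid;
        sleep 0.1;
    }
    return 0;    # still running after the timeout
}
```

The stop branch could then exit with 0 only when stop_and_wait($oldpid, $timeout) returns true, and with a failure code otherwise. A suitable timeout would have to stay below the cluster's stop timeout.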
>
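Regarding the quoted question about splitting at the last colon and checking the components afterwards: a minimal sketch of that idea follows. split_host_port is a hypothetical name, and the validation rules (port range 1..65535, host either a bracketed hex address or a string without colons) are assumptions, not what ldirectord does.

```perl
use strict;
use warnings;

# Hypothetical parser: split "host:port" at the LAST colon, then validate.
# Returns (host, port) on success and an empty list otherwise.
sub split_host_port {
    my ($ip_port) = @_;

    # The greedy (.+) makes the match break at the last colon.
    return unless defined $ip_port && $ip_port =~ /^(.+):(\d+)$/;
    my ($host, $port) = ($1, $2);

    # Assumed port range; adjust if port 0 has to be accepted.
    return unless $port >= 1 && $port <= 65535;

    # Host is either a bracketed hex (IPv6) address or contains no colon
    # and no bracket at all (IPv4 address or host name).
    return unless $host =~ /^\[[0-9A-Fa-f:.]+\]$/ || $host =~ /^[^:\[\]]+$/;

    return ($host, $port);
}
```

With this, an undefined value or a negative error code yields an empty list instead of an "uninitialized value" warning, and the caller can decide how to report it.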