[ClusterLabs] Antw: Heads up for ldirectord in SLES12 SP5 "Use of uninitialized value $ip_port in pattern match (m//) at /usr/sbin/ldirectord line 1830"
Ulrich Windl
Ulrich.Windl at rz.uni-regensburg.de
Mon Aug 8 05:19:20 EDT 2022
Hi!
The bug is still under investigation, but while digging through the ldirectord code I found this part, which runs when stopping:
} elsif ($CMD eq "stop") {
    kill 15, $oldpid;
    ld_exit(0, "Exiting from ldirectord $CMD");
ldirectord uses a SIGTERM handler that only sets a flag; the actual termination code runs at some later time.
Doesn't that mean the cluster will see a misleading exit code (success while parts of ldirectord are still running)?
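
A stop that reports success only once the old process is really gone could look roughly like the following. This is just a sketch: stop_and_wait and its timeout parameter are hypothetical and not part of ldirectord, and it assumes the caller is allowed to signal the PID (e.g. runs as root).

```perl
use strict;
use warnings;
use Time::HiRes qw(sleep time);

# Hypothetical helper, NOT part of ldirectord: send SIGTERM, then poll until
# the process has disappeared or $timeout seconds have passed.
# Returns 1 if the process is gone, 0 if it is still running at the deadline.
sub stop_and_wait {
    my ($pid, $timeout) = @_;

    kill('TERM', $pid);                     # ask the daemon to shut down
    my $deadline = time() + $timeout;

    # Signal 0 delivers nothing; it only checks whether the PID still exists.
    # Assumption: we have permission to signal $pid.
    while (kill(0, $pid)) {
        return 0 if time() >= $deadline;    # still running: do not claim success
        sleep(0.1);
    }
    return 1;                               # process has exited
}
```

With something like this, the stop branch could pass a non-zero status to ld_exit when the helper returns 0, so the cluster would not be told "stopped" while the old process is still cleaning up.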
Regards,
Ulrich
>>> Ulrich Windl wrote on 03.08.2022 at 11:13 in message <62EA3C2C.E8D : 161 :
60728>:
> Hi!
>
> I wanted to inform you of an unpleasant bug in ldirectord of SLES12 SP5:
> We had a short network problem while some redundant paths in the
> infrastructure were being reconfigured, so some network services could not
> be reached.
> Unfortunately ldirectord controlled by the cluster reported a failure (the
> director, not the services being directed to):
>
> h11 crmd[28930]: notice: h11-prm_lvs_mail_monitor_300000:369 [ Use of
> uninitialized value $ip_port in pattern match (m//) at /usr/sbin/ldirectord
> line 1830, <CFGFILE> line 21. Error [33159] reading file
> /etc/ldirectord/mail.conf at line 10: invalid address for virtual service\n ]
> h11 ldirectord[33266]: Exiting with exit_status 2: config_error:
> Configuration Error
>
> You can guess what happened:
> Pacemaker tried to recover (stop, then start), but the stop failed, too:
> h11 lrmd[28927]: notice: prm_lvs_mail_stop_0:35047:stderr [ Use of
> uninitialized value $ip_port in pattern match (m//) at /usr/sbin/ldirectord
> line 1830, <CFGFILE> line 21. ]
> h11 lrmd[28927]: notice: prm_lvs_mail_stop_0:35047:stderr [ Error [36293]
> reading file /etc/ldirectord/mail.conf at line 10: invalid address for
> virtual service ]
> h11 crmd[28930]: notice: Result of stop operation for prm_lvs_mail on h11:
> 1 (unknown error)
>
> A stop failure meant that the node was fenced, interrupting all the other
> services.
>
> Examining the logs I also found this interesting type of error:
> h11 attrd[28928]: notice: Cannot update
> fail-count-prm_lvs_rksapds5#monitor_300000[monitor]=(null) because peer UUID
> not known (will retry if learned)
>
> Finally, here's the code that caused the error:
>
> sub _ld_read_config_virtual_resolve
> {
>     my($line, $vsrv, $ip_port, $af)=(@_);
>
>     if($ip_port){
>         $ip_port=&ld_gethostservbyname($ip_port, $vsrv->{protocol}, $af);
>         if ($ip_port =~ /(\[[0-9A-Fa-f:]+\]):(\d+)/) {
>             $vsrv->{server} = $1;
>             $vsrv->{port} = $2;
>         } elsif($ip_port){
>             ($vsrv->{server}, $vsrv->{port}) = split /:/, $ip_port;
>         }
>         else {
>             &config_error($line, "invalid address for virtual service");
>         }
> ...
>
> In this case the value returned by ld_gethostservbyname is undefined, so the
> pattern match on $ip_port triggers the warning. I also wonder what the
> program logic is:
> If the host looks like a hex address in square brackets, host and port are
> split at the colon; otherwise host and port are split at the colon.
> Why not split simply at the last colon if the value is defined, AND THEN
> check if the components look OK?
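
The "split at the last colon, then validate" idea could be sketched as below. The name split_host_port is hypothetical and this is not ldirectord code; it also assumes the string has already been resolved, so the port is numeric.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of ldirectord: split an already resolved
# "address:port" string at the LAST colon, then validate both parts.
# Returns (address, port) on success and an empty list otherwise.
sub split_host_port {
    my ($ip_port) = @_;

    # An undefined value (e.g. after a resolver failure) is rejected before
    # any pattern match, which avoids the "uninitialized value" warning.
    return unless defined $ip_port;

    # The greedy (.+) makes the final ":" the separator.
    return unless $ip_port =~ /^(.+):(\d+)$/;
    my ($host, $port) = ($1, $2);

    # The address is either a bracketed hex literal or has no colon/bracket.
    return unless $host =~ /^\[[0-9A-Fa-f:]+\]$/ || $host !~ /[:\[\]]/;

    # Assumption: a usable port lies in the range 1..65535.
    return unless $port >= 1 && $port <= 65535;

    return ($host, $port);
}
```

Called in list context, an empty result means "invalid", so the caller could report the configuration error in one place instead of in a trailing else branch.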
>
> So the address is reported as "invalid address for virtual service" only
> when the resolver service (e.g. via LDAP) is unavailable.
> (I used host and service names in the configuration for readability.)
>
> (I reported the issue to SLES support)
>
> Regards,
> Ulrich
>
>
>