[ClusterLabs] Antw: Heads up for ldirectord in SLES12 SP5 "Use of uninitialized value $ip_port in pattern match (m//) at /usr/sbin/ldirectord line 1830"

Ulrich Windl Ulrich.Windl at rz.uni-regensburg.de
Mon Aug 8 05:19:20 EDT 2022


Hi!

The bug is still under investigation, but digging in the ldirectord code I found this part called when stopping:

                } elsif ($CMD eq "stop") {
                        kill 15, $oldpid;
                        ld_exit(0, "Exiting from ldirectord $CMD");

ldirectord uses a SIGTERM handler that only sets a flag; the actual termination code runs at some later time.
Doesn't that mean the cluster will see a wrong exit code (success reported while parts of ldirectord are still running)?
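
To illustrate the concern, here is a minimal sketch of a stop that only reports success once the old process is really gone. The helper name stop_and_wait and the timeout handling are my own assumptions, not code from ldirectord:

```perl
use strict;
use warnings;

# Hypothetical helper, not part of ldirectord: send SIGTERM, then poll
# with signal 0 until the process has exited or the timeout (seconds) expires.
# Returns 1 if the process is gone, 0 if it is still there (or not ours).
sub stop_and_wait {
    my ($pid, $timeout) = @_;

    unless (kill('TERM', $pid)) {
        # ESRCH: no such process, so it is already stopped.
        return $!{ESRCH} ? 1 : 0;
    }
    for (1 .. $timeout * 10) {
        # kill 0 sends no signal; it only checks whether the PID exists.
        return 1 if !kill(0, $pid) && $!{ESRCH};
        select(undef, undef, undef, 0.1);    # sleep 100 ms
    }
    return 0;    # still running after the timeout
}
```

The "stop" branch could then pass a non-zero status to ld_exit when stop_and_wait($oldpid, ...) returns 0, so the cluster would not see success prematurely. Note that polling with signal 0 cannot tell a recycled PID from the original process.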

Regards,
Ulrich



>>> Ulrich Windl wrote on 03.08.2022 at 11:13 in message <62EA3C2C.E8D : 161 :
60728>:
> Hi!
> 
> I wanted to inform you of an unpleasant bug in ldirectord of SLES12 SP5:
> We had a short network problem while some redundancy paths were being 
> reconfigured in the infrastructure, with the effect that some network 
> services could not be reached.
> Unfortunately ldirectord controlled by the cluster reported a failure (the 
> director, not the services being directed to):
> 
> h11 crmd[28930]:   notice: h11-prm_lvs_mail_monitor_300000:369 [ Use of 
> uninitialized value $ip_port in pattern match (m//) at /usr/sbin/ldirectord 
> line 1830, <CFGFILE> line 21. Error [33159] reading file 
> /etc/ldirectord/mail.conf at line 10: invalid address for virtual service\n ]
> h11 ldirectord[33266]: Exiting with exit_status 2: config_error: 
> Configuration Error
> 
> You can guess what happened:
> Pacemaker tried to recover (stop, then start), but the stop failed, too:
> h11 lrmd[28927]:   notice: prm_lvs_mail_stop_0:35047:stderr [ Use of 
> uninitialized value $ip_port in pattern match (m//) at /usr/sbin/ldirectord 
> line 1830, <CFGFILE> line 21. ]
> h11 lrmd[28927]:   notice: prm_lvs_mail_stop_0:35047:stderr [ Error [36293] 
> reading file /etc/ldirectord/mail.conf at line 10: invalid address for 
> virtual service ]
> h11 crmd[28930]:   notice: Result of stop operation for prm_lvs_mail on h11: 
> 1 (unknown error)
> 
> A stop failure meant that the node was fenced, interrupting all the other 
> services.
> 
> Examining the logs I also found this interesting type of error:
> h11 attrd[28928]:   notice: Cannot update 
> fail-count-prm_lvs_rksapds5#monitor_300000[monitor]=(null) because peer UUID 
> not known (will retry if learned)
> 
> Finally, here's the code that caused the error:
> 
> sub _ld_read_config_virtual_resolve
> {
>         my($line, $vsrv, $ip_port, $af)=(@_);
> 
>         if($ip_port){
>                 $ip_port=&ld_gethostservbyname($ip_port, $vsrv->{protocol}, 
> $af);
>                 if ($ip_port =~ /(\[[0-9A-Fa-f:]+\]):(\d+)/) {
>                         $vsrv->{server} = $1;
>                         $vsrv->{port} = $2;
>                 } elsif($ip_port){
>                         ($vsrv->{server}, $vsrv->{port}) = split /:/, 
> $ip_port;
>                 }
>                 else {
>                         &config_error($line,
>                                 "invalid address for virtual service");
>                 }
> ...
> 
> The value returned by ld_gethostservbyname is undefined, so the pattern 
> match on $ip_port triggers the warning. I also wonder what the program 
> logic is:
> If the host looks like a hex address in square brackets, host and port are 
> split at the colon; otherwise host and port are split at the colon.
> Why not simply split at the last colon if the value is defined, AND THEN 
> check if the components look OK?
> 
> So the address is reported as "invalid address for virtual service" only 
> when the resolver service (e.g. via LDAP) is unavailable.
> (I used host and service names in the configuration for readability.)
> 
> (I reported the issue to SLES support)
> 
> Regards,
> Ulrich
> 
> 
> 
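
The suggestion in the quoted message (split at the last colon if the value is defined, then check the components) could be sketched roughly as follows. The function name split_host_port and the validation patterns are assumptions for illustration, not the upstream ldirectord code:

```perl
use strict;
use warnings;

# Hypothetical replacement logic, not the upstream ldirectord code.
# Returns (server, port) on success, or an empty list if the input is
# undefined or does not look like "host:port" or "[ipv6]:port".
sub split_host_port {
    my ($ip_port) = @_;

    # Resolver failure: bail out before any pattern match, so no
    # "uninitialized value" warning is emitted.
    return unless defined $ip_port;

    # Greedy (.+) makes the split happen at the LAST colon.
    my ($server, $port) = $ip_port =~ /^(.+):([^:]+)\z/ or return;

    # Now check that the components look OK.
    return unless $port =~ /^\d+\z/;
    return unless $server =~ /^\[[0-9A-Fa-f:.]+\]\z/    # bracketed IPv6
               || $server =~ /^[^:\[\]\s]+\z/;          # IPv4 or host name

    return ($server, $port);
}
```

The caller would invoke config_error when the returned list is empty. Whether a temporary resolver outage should count as a configuration error at all is a separate question.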





