[ClusterLabs] Timeout stopping corosync-qdevice service

Jan Friesse jfriesse at redhat.com
Thu May 2 02:45:46 EDT 2019


Andrei,

> 30.04.2019 9:51, Jan Friesse wrote:
>>
>>> Now, corosync-qdevice gets SIGTERM as the "signal to terminate", but it
>>> installs a SIGTERM handler that does not exit and only closes some socket.
>>> Maybe this should trigger termination of the main loop, but somehow it
>>> does not.
>>
>> Yep, this is exactly how qdevice daemon shutdown works. The signal handler
>> just closes a socket (which should be signal safe), and poll in the main
>> loop does its job, so the main loop terminates.
>>
> 
> That is a bug in corosync 2.4.4, which is still used in TW. The init script's
> stop action uses pidof; I have two corosync-qdevice processes, so
> corosync-qdevice never gets the signal in the first place.

Oh, that explains it.
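For reference, the shutdown path described above (the SIGTERM handler only closes a socket, and poll in the main loop notices and ends the loop) can be sketched as below. This is a minimal Python sketch, not qdevice's actual C code: the socketpair, `on_sigterm` and `main_loop` are hypothetical, and closing the write end of a socketpair while polling the read end is an assumption standing in for whatever socket qdevice really closes. In C, close() is on the POSIX list of async-signal-safe functions, which is why it is an acceptable thing to call from a handler.

```python
import select
import signal
import socket

# Hypothetical sketch, not qdevice's C implementation. A socketpair stands in
# for the socket the daemon closes: the main loop polls the read end, and the
# SIGTERM handler only closes the write end.
wake_r, wake_w = socket.socketpair()


def on_sigterm(signum, frame):
    # Keep the handler minimal: closing a descriptor is all it does.
    # (In C, close() is async-signal-safe, so it is legal in a handler.)
    wake_w.close()


def main_loop(timeout_ms=5000):
    """Poll until the wake-up socket reports EOF; return the iteration count."""
    poller = select.poll()
    poller.register(wake_r.fileno(), select.POLLIN)
    iterations = 0
    while True:
        iterations += 1
        for fd, mask in poller.poll(timeout_ms):
            if fd == wake_r.fileno() and mask & (select.POLLIN | select.POLLHUP):
                # recv() returning b"" means the peer end has been closed.
                if wake_r.recv(1) == b"":
                    return iterations
        # Real work (client and network sockets) would be handled here.


signal.signal(signal.SIGTERM, on_sigterm)
```

If SIGTERM arrives while main_loop() is blocked, poll() is interrupted, the handler closes the write end, and the retried poll() (Python retries interrupted system calls since 3.5, PEP 475) immediately reports EOF on the read end, so the loop returns. The sketch also shows why nothing happened in the pidof case: with no signal delivered, the handler never runs and poll keeps waiting.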

> 
> 
> ++ pidof corosync-qdevice
> + kill -TERM '1812 1811'
> 
> Current git was changed to use a PID file (although for different
> reasons), so the bug should now be fixed there as a side effect.
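The trace above shows why no signal was delivered at all: both PIDs from pidof reached kill as one quoted argument, '1812 1811', which kill rejects as an invalid process ID. Stopping via a PID file, as current git does, avoids that and also avoids signalling some other instance (for example one running in a container, when issued on the host). A minimal Python sketch of the idea, assuming the daemon writes its own PID to a pidfile; `read_pidfile` and `stop_from_pidfile` are hypothetical helpers, not the real init script, which is shell.

```python
import os
import signal


def read_pidfile(path):
    """Return the one PID stored in a pidfile (hypothetical helper)."""
    with open(path) as f:
        text = f.read().strip()
    # Refuse anything but a single positive PID, e.g. "1812 1811" or "0"
    # (kill(0, ...) would signal the whole process group).
    if not (text.isdigit() and int(text) > 0):
        raise ValueError(f"{path}: expected a single positive PID, got {text!r}")
    return int(text)


def stop_from_pidfile(path, sig=signal.SIGTERM):
    """Signal only the instance recorded in the pidfile, unlike pidof,
    which matches every process with that name."""
    pid = read_pidfile(path)
    os.kill(pid, sig)
    return pid
```

Reading exactly one PID and failing loudly on anything else is the point of the design: the ambiguous "1812 1811" case becomes an explicit error instead of a silently skipped stop.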

It's probably time for a 2.4.5 release.

Anyway, thanks a lot for digging into the problem and finding the solution!

Regards,
   Honza

> 
> commit 1965225e3e2728beb1f77bed2e8f14edb72fe586 (tag: v2.93.0)
> Author: Jan Friesse <jfriesse at redhat.com>
> Date:   Wed Nov 14 17:52:11 2018 +0100
> 
>      init: Fix init scripts to work with containers
> 
>      Previously init scripts were not using a pid file, so pidof was used.
>      This is usually not a problem, but when containers are used it may
>      result in killing the wrong instance when issued on the host.
> 
>      Solution is to always use a pidfile.
> 


