[ClusterLabs] dlm_controld 4.0.4 exits when crmd is fencing another node
Valentin Vidic
Valentin.Vidic at CARNet.hr
Tue Apr 26 19:57:06 UTC 2016
On Fri, Jan 22, 2016 at 07:57:52PM +0300, Vladislav Bogdanov wrote:
> Tried reverting this one and a51b2bb ("If an error occurs unlink the
> lock file and exit with status 1") one-by-one and both together, the
> same result.
>
> So problem seems to be somewhere deeper.
I've got the same fencing problem with dlm-4.0.4 on Debian. The strace
of the dlm_controld process shows that it exits right after the poll
call is interrupted by the SIGCHLD signal:
wait4(2279, 0x7ffd2f468afc, WNOHANG, NULL) = 0
poll([{fd=5, events=POLLIN}, {fd=6, events=POLLIN}, {fd=7, events=POLLIN}, {fd=9, events=POLLIN}, {fd=10, events=POLLIN}, {fd=11, events=POLLIN}, {fd=14, events=POLLIN}, {fd=15, events=POLLIN}, {fd=16, events=POLLIN}, {fd=17, events=POLLIN}], 10, 1000) = 0 (Timeout)
wait4(2279, 0x7ffd2f468afc, WNOHANG, NULL) = 0
poll([{fd=5, events=POLLIN}, {fd=6, events=POLLIN}, {fd=7, events=POLLIN}, {fd=9, events=POLLIN}, {fd=10, events=POLLIN}, {fd=11, events=POLLIN}, {fd=14, events=POLLIN}, {fd=15, events=POLLIN}, {fd=16, events=POLLIN}, {fd=17, events=POLLIN}], 10, 1000) = 0 (Timeout)
wait4(2279, 0x7ffd2f468afc, WNOHANG, NULL) = 0
poll([{fd=5, events=POLLIN}, {fd=6, events=POLLIN}, {fd=7, events=POLLIN}, {fd=9, events=POLLIN}, {fd=10, events=POLLIN}, {fd=11, events=POLLIN}, {fd=14, events=POLLIN}, {fd=15, events=POLLIN}, {fd=16, events=POLLIN}, {fd=17, events=POLLIN}], 10, 1000) = ? ERESTART_RESTARTBLOCK (Interrupted by signal)
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=2279, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
rt_sigreturn() = -1 EINTR (Interrupted system call)
close(11) = 0
sendto(10, "\240", 1, MSG_NOSIGNAL, NULL, 0) = 1
sendto(17, "\20", 1, MSG_NOSIGNAL, NULL, 0) = 1
poll([{fd=17, events=POLLIN}], 1, 0) = 0 (Timeout)
shutdown(17, SHUT_RDWR) = 0
close(17) = 0
munmap(0x7f5f45c26000, 2105344) = 0
munmap(0x7f5f4aeea000, 8248) = 0
munmap(0x7f5f45a24000, 2105344) = 0
munmap(0x7f5f4aee7000, 8248) = 0
munmap(0x7f5f45822000, 2105344) = 0
In fact, a recent change in 4.0.4 modifies that part of the code:
If an error occurs unlink the lock file and exit with status 1
https://git.fedorahosted.org/cgit/dlm.git/commit/?id=a51b2bbe413222829778698e62af88a73ebec233
The bug is caused by missing braces in the if statement that this
commit expanded. The attached patch adds them.
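To illustrate this class of bug, here is a minimal C sketch. It is not
the dlm_controld source: the struct, its fields, and both function names
are hypothetical, and it only assumes that a one-statement if body was
expanded to two statements without adding braces.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for daemon state; not the real dlm_controld data. */
struct loop_state {
    int quit_requested;   /* nonzero once a shutdown was asked for */
    int lockspaces_empty; /* nonzero when nothing is left to clean up */
};

/*
 * Buggy shape: the if body grew from one statement to two, but no braces
 * were added. Only "*rv = 0" is guarded; "leave = 1" runs on every call.
 * Returns nonzero when the main loop should be left.
 */
int on_eintr_buggy(const struct loop_state *st, int *rv)
{
    int leave = 0;

    assert(st != NULL && rv != NULL);
    if (st->quit_requested && st->lockspaces_empty)
        *rv = 0;
    leave = 1; /* intended to be conditional, but is not */
    return leave;
}

/* Fixed shape: braces make both statements conditional. */
int on_eintr_fixed(const struct loop_state *st, int *rv)
{
    int leave = 0;

    assert(st != NULL && rv != NULL);
    if (st->quit_requested && st->lockspaces_empty) {
        *rv = 0;
        leave = 1;
    }
    return leave;
}
```

With quit_requested and lockspaces_empty both 0, as one would expect
while a fence helper is merely being reaped, the buggy variant still
asks the loop to exit, while the fixed variant lets it continue.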
Do you think we can get a new version out with this patch? Fencing in
4.0.4 does not work properly because of this issue.
--
Valentin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Fix-exit-on-fence.patch
Type: text/x-diff
Size: 558 bytes
Desc: not available
URL: <https://lists.clusterlabs.org/pipermail/users/attachments/20160426/bba7c1fe/attachment-0003.bin>