[ClusterLabs] dlm_controld 4.0.4 exits when crmd is fencing another node

Valentin Vidic Valentin.Vidic at CARNet.hr
Tue Apr 26 15:57:06 EDT 2016


On Fri, Jan 22, 2016 at 07:57:52PM +0300, Vladislav Bogdanov wrote:
> Tried reverting this one and a51b2bb ("If an error occurs unlink the 
> lock file and exit with status 1") one-by-one and both together, the 
> same result.
> 
> So problem seems to be somewhere deeper.

I see the same fencing problem with dlm-4.0.4 on Debian.  According to
the strace of the dlm_controld process, it exits right after the poll
call is interrupted by a SIGCHLD signal:

wait4(2279, 0x7ffd2f468afc, WNOHANG, NULL) = 0
poll([{fd=5, events=POLLIN}, {fd=6, events=POLLIN}, {fd=7, events=POLLIN}, {fd=9, events=POLLIN}, {fd=10, events=POLLIN}, {fd=11, events=POLLIN}, {fd=14, events=POLLIN}, {fd=15, events=POLLIN}, {fd=16, events=POLLIN}, {fd=17, events=POLLIN}], 10, 1000) = 0 (Timeout)
wait4(2279, 0x7ffd2f468afc, WNOHANG, NULL) = 0
poll([{fd=5, events=POLLIN}, {fd=6, events=POLLIN}, {fd=7, events=POLLIN}, {fd=9, events=POLLIN}, {fd=10, events=POLLIN}, {fd=11, events=POLLIN}, {fd=14, events=POLLIN}, {fd=15, events=POLLIN}, {fd=16, events=POLLIN}, {fd=17, events=POLLIN}], 10, 1000) = 0 (Timeout)
wait4(2279, 0x7ffd2f468afc, WNOHANG, NULL) = 0
poll([{fd=5, events=POLLIN}, {fd=6, events=POLLIN}, {fd=7, events=POLLIN}, {fd=9, events=POLLIN}, {fd=10, events=POLLIN}, {fd=11, events=POLLIN}, {fd=14, events=POLLIN}, {fd=15, events=POLLIN}, {fd=16, events=POLLIN}, {fd=17, events=POLLIN}], 10, 1000) = ? ERESTART_RESTARTBLOCK (Interrupted by signal)
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=2279, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
rt_sigreturn()                          = -1 EINTR (Interrupted system call)
close(11)                               = 0
sendto(10, "\240", 1, MSG_NOSIGNAL, NULL, 0) = 1
sendto(17, "\20", 1, MSG_NOSIGNAL, NULL, 0) = 1
poll([{fd=17, events=POLLIN}], 1, 0)    = 0 (Timeout)
shutdown(17, SHUT_RDWR)                 = 0
close(17)                               = 0
munmap(0x7f5f45c26000, 2105344)         = 0
munmap(0x7f5f4aeea000, 8248)            = 0
munmap(0x7f5f45a24000, 2105344)         = 0
munmap(0x7f5f4aee7000, 8248)            = 0
munmap(0x7f5f45822000, 2105344)         = 0

and in fact a recent change in 4.0.4 modifies that part of the
code:

  If an error occurs unlink the lock file and exit with status 1
  https://git.fedorahosted.org/cgit/dlm.git/commit/?id=a51b2bbe413222829778698e62af88a73ebec233

The bug is caused by missing braces around the body of the if
statement that this commit expanded.
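
For readers who have not opened the diff, here is a minimal sketch of
this class of bug. It is not the dlm_controld code: the function names,
parameters, and counter are all made up for illustration, and the real
condition in the daemon may differ. It only shows how an if body that
grows from one statement to two, without braces, lets the second
statement run every time poll is interrupted.

```c
#include <errno.h>

/*
 * Hypothetical stand-ins for the check made after poll() returns.
 * None of these names come from the dlm source.
 * Return 1 to leave the main loop, 0 to keep polling.
 */

static int shutdown_notices;  /* illustration only: counts shutdown notices */

/* Buggy: the if body grew from one statement to two but got no braces. */
static int leave_loop_buggy(int poll_rv, int poll_errno, int quit_requested)
{
    int leave = 0;

    if (poll_rv == -1 && poll_errno == EINTR) {
        if (quit_requested)
            shutdown_notices++;
            leave = 1;  /* misleading indent: runs on every EINTR */
    }
    return leave;
}

/* Fixed: braces keep both statements under the condition. */
static int leave_loop_fixed(int poll_rv, int poll_errno, int quit_requested)
{
    int leave = 0;

    if (poll_rv == -1 && poll_errno == EINTR) {
        if (quit_requested) {
            shutdown_notices++;
            leave = 1;
        }
    }
    return leave;
}
```

With no shutdown requested, leave_loop_buggy(-1, EINTR, 0) still asks
to leave the loop, while leave_loop_fixed(-1, EINTR, 0) keeps polling.
Recent gcc and clang versions can flag this pattern with
-Wmisleading-indentation.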

Could we get a new version released with the attached patch?  Fencing
in 4.0.4 does not work properly because of this issue.

-- 
Valentin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Fix-exit-on-fence.patch
Type: text/x-diff
Size: 558 bytes
Desc: not available
URL: <http://lists.clusterlabs.org/pipermail/users/attachments/20160426/bba7c1fe/attachment-0002.bin>

