[ClusterLabs] centos 7 drbd fubar

Dimitri Maziuk dmaziuk at bmrb.wisc.edu
Tue Dec 27 16:08:40 EST 2016


I ran the CentOS 7.3.1611 update over the holidays and my drbd + nfs + imap
active-passive pair locked up again. This has now happened consistently for
at least 3 kernel updates. This time I had enough consoles open to run
fuser & lsof, though.

The procedure:

1. pcs cluster standby <secondary>
2. yum up && reboot <secondary>
3. pcs cluster unstandby <secondary>

Fine so far.
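
For reference, steps 1-3 can be sketched as a small shell helper. This is
only an illustration under assumptions: the names run and update_secondary
are hypothetical, running the update over ssh is an assumption, and
"yum -y update" is one reading of "yum up" above. DRY_RUN defaults to 1, so
nothing is executed and the commands are only printed.

```shell
#!/bin/sh
# Sketch of steps 1-3 for the secondary node (hypothetical helper names).
# With DRY_RUN=1 (the default) commands are printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

update_secondary() {
    node="$1"
    # Step 1: put the node in standby so it holds no resources.
    run pcs cluster standby "$node"
    # Step 2: update and reboot the node (ssh is an assumption here).
    run ssh "$node" "yum -y update && reboot"
    # Step 3: bring the node back. A real run would first have to wait
    # for the node to finish rebooting; this sketch does not do that.
    run pcs cluster unstandby "$node"
}
```

Calling "update_secondary <secondary>" in dry-run mode prints the three
commands in order, which makes it easy to check the sequence before running
anything for real.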

4. pcs cluster standby <primary>
results in

> Filesystem(drbd_filesystem)[18277]:     2016/12/23_17:36:41 INFO: Running stop for /dev/drbd0 on /raid
> Filesystem(drbd_filesystem)[18277]:     2016/12/23_17:36:41 INFO: Trying to unmount /raid
> Filesystem(drbd_filesystem)[18277]:     2016/12/23_17:36:41 ERROR: Couldn't unmount /raid; trying cleanup with TERM
> Filesystem(drbd_filesystem)[18277]:     2016/12/23_17:36:41 INFO: No processes on /raid were signalled. force_unmount is set to 'yes'
> Filesystem(drbd_filesystem)[18277]:     2016/12/23_17:36:42 ERROR: Couldn't unmount /raid; trying cleanup with TERM
> Filesystem(drbd_filesystem)[18277]:     2016/12/23_17:36:42 INFO: No processes on /raid were signalled. force_unmount is set to 'yes'
> Filesystem(drbd_filesystem)[18277]:     2016/12/23_17:36:43 ERROR: Couldn't unmount /raid; trying cleanup with TERM
> Filesystem(drbd_filesystem)[18277]:     2016/12/23_17:36:43 INFO: No processes on /raid were signalled. force_unmount is set to 'yes'
> Filesystem(drbd_filesystem)[18277]:     2016/12/23_17:36:44 ERROR: Couldn't unmount /raid; trying cleanup with KILL
> Filesystem(drbd_filesystem)[18277]:     2016/12/23_17:36:44 INFO: No processes on /raid were signalled. force_unmount is set to 'yes'
> Filesystem(drbd_filesystem)[18277]:     2016/12/23_17:36:45 ERROR: Couldn't unmount /raid; trying cleanup with KILL
> Filesystem(drbd_filesystem)[18277]:     2016/12/23_17:36:46 INFO: No processes on /raid were signalled. force_unmount is set to 'yes'
> Filesystem(drbd_filesystem)[18277]:     2016/12/23_17:36:47 ERROR: Couldn't unmount /raid; trying cleanup with KILL
> Filesystem(drbd_filesystem)[18277]:     2016/12/23_17:36:47 INFO: No processes on /raid were signalled. force_unmount is set to 'yes'
> Filesystem(drbd_filesystem)[18277]:     2016/12/23_17:36:48 ERROR: Couldn't unmount /raid, giving up!
> Dec 23 17:36:48 [1138] zebrafish.bmrb.wisc.edu       lrmd:   notice: operation_finished:        drbd_filesystem_stop_0:18277:stderr [ umount: /raid: target is busy. ]

... and so on until the system is powered down. Before powering down I ran
lsof, which hung, and fuser:

> # fuser -vum /raid
>                      USER        PID ACCESS COMMAND
> /raid:               root     kernel mount (root)/raid
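
The only holder fuser lists is the kernel mount entry itself, which would be
consistent with the "No processes on /raid were signalled" lines in the log.
A minimal sketch for checking that mechanically: count_user_holders is a
hypothetical helper, and it assumes the column layout shown above (the first
row of a mount carries the mount name, later rows do not).

```shell
#!/bin/sh
# Reads "fuser -vum <mountpoint>" output on stdin and prints how many
# holders are ordinary processes, i.e. rows whose PID column is not "kernel".
# Hypothetical helper; assumes the column layout shown in the output above.
count_user_holders() {
    awk '
        NF == 0      { next }                    # skip blank lines
        $1 == "USER" { next }                    # skip the header row
        { pid = ($1 ~ /:$/) ? $3 : $2 }          # first row starts with "name:"
        pid != "kernel" { n++ }
        END { print n + 0 }
    '
}
```

Something like "fuser -vum /raid 2>&1 | count_user_holders" would then print
0 for the output above, meaning there is no user process to signal.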

After running yum up on the primary and rebooting it again,

5. pcs cluster unstandby <primary>
causes the same failure-to-unmount loop on the secondary, which then has to
be powered down until the primary recovers.

Hopefully I'm doing something wrong; someone please tell me what it is.
Anyone? Bueller?
-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
