[ClusterLabs] centos 7 drbd fubar

Fri Jan 6 20:41:34 CET 2017

On 12/27/2016 03:08 PM, Dimitri Maziuk wrote:
> I ran the CentOS 7.3.1611 update over the holidays and my drbd + nfs +
> imap active-passive pair locked up again. This has now been consistent
> for at least 3 kernel updates. This time I had enough consoles open to
> run fuser & lsof though.
> 
> The procedure:
> 
> 1. pcs cluster standby <secondary>
> 2. yum up && reboot <secondary>
> 3. pcs cluster unstandby <secondary>
> 
> Fine so far.
> 
> 4. pcs cluster standby <primary>
> results in
> 
>> Filesystem(drbd_filesystem)[18277]:     2016/12/23_17:36:41 INFO: Running stop for /dev/drbd0 on /raid
>> Filesystem(drbd_filesystem)[18277]:     2016/12/23_17:36:41 INFO: Trying to unmount /raid
>> Filesystem(drbd_filesystem)[18277]:     2016/12/23_17:36:41 ERROR: Couldn't unmount /raid; trying cleanup with TERM
>> Filesystem(drbd_filesystem)[18277]:     2016/12/23_17:36:41 INFO: No processes on /raid were signalled. force_unmount is set to 'yes'
>> Filesystem(drbd_filesystem)[18277]:     2016/12/23_17:36:42 ERROR: Couldn't unmount /raid; trying cleanup with TERM
>> Filesystem(drbd_filesystem)[18277]:     2016/12/23_17:36:42 INFO: No processes on /raid were signalled. force_unmount is set to 'yes'
>> Filesystem(drbd_filesystem)[18277]:     2016/12/23_17:36:43 ERROR: Couldn't unmount /raid; trying cleanup with TERM
>> Filesystem(drbd_filesystem)[18277]:     2016/12/23_17:36:43 INFO: No processes on /raid were signalled. force_unmount is set to 'yes'
>> Filesystem(drbd_filesystem)[18277]:     2016/12/23_17:36:44 ERROR: Couldn't unmount /raid; trying cleanup with KILL
>> Filesystem(drbd_filesystem)[18277]:     2016/12/23_17:36:44 INFO: No processes on /raid were signalled. force_unmount is set to 'yes'
>> Filesystem(drbd_filesystem)[18277]:     2016/12/23_17:36:45 ERROR: Couldn't unmount /raid; trying cleanup with KILL
>> Filesystem(drbd_filesystem)[18277]:     2016/12/23_17:36:46 INFO: No processes on /raid were signalled. force_unmount is set to 'yes'
>> Filesystem(drbd_filesystem)[18277]:     2016/12/23_17:36:47 ERROR: Couldn't unmount /raid; trying cleanup with KILL
>> Filesystem(drbd_filesystem)[18277]:     2016/12/23_17:36:47 INFO: No processes on /raid were signalled. force_unmount is set to 'yes'
>> Filesystem(drbd_filesystem)[18277]:     2016/12/23_17:36:48 ERROR: Couldn't unmount /raid, giving up!
>> Dec 23 17:36:48 [1138] zebrafish.bmrb.wisc.edu       lrmd:   notice: operation_finished:        drbd_filesystem_stop_0:18277:stderr [ umount: /raid: target is busy. ]
> 
> ... until the system is powered down. Before powering down I ran lsof,
> which hung, and fuser:
> 
>> # fuser -vum /raid
>>                      USER        PID ACCESS COMMAND
>> /raid:               root     kernel mount (root)/raid
> 
> After running yum up on the primary and rebooting it again,
> 
> 5. pcs cluster unstandby <primary>
> causes the same failed-unmount loop on the secondary, which then has to
> be powered down until the primary recovers.
> 
> Hopefully I'm doing something wrong; someone please tell me what it is.
> Anyone? Bueller?

That is disconcerting. Since no one here seems to know, have you tried
asking on the drbd list? It sounds like an issue with the drbd kernel
module.

http://lists.linbit.com/listinfo/drbd-user
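
In the meantime, since fuser shows only the kernel mount entry and no
userspace process, it may be worth ruling out kernel-side holders before the
next standby, for example filesystems mounted below /raid or NFS exports
that are still active. This is only a possibility, not a diagnosis. Below is
a minimal sketch of such a check. The helper names are made up for
illustration, and it assumes the usual whitespace-separated layouts of
/proc/mounts and /var/lib/nfs/etab, so treat it as a starting point rather
than a tested tool.

```python
#!/usr/bin/env python3
"""Sketch: list kernel-side candidates that could keep a mount point busy."""


def _base_and_prefix(mountpoint):
    # Normalise "/raid/" to "/raid" and build the "/raid/" prefix for children.
    base = mountpoint.rstrip("/") or "/"
    prefix = base if base.endswith("/") else base + "/"
    return base, prefix


def nested_mounts(mounts_text, mountpoint):
    """Return mount points strictly below mountpoint.

    mounts_text is assumed to look like /proc/mounts:
    device, mount point, fstype, options, ... separated by whitespace.
    """
    base, prefix = _base_and_prefix(mountpoint)
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        target = fields[1]
        if target != base and target.startswith(prefix):
            found.append(target)
    return found


def exported_paths(etab_text, mountpoint):
    """Return exported paths at or below mountpoint.

    etab_text is assumed to look like /var/lib/nfs/etab:
    the exported path first, then client(options).
    """
    base, prefix = _base_and_prefix(mountpoint)
    found = []
    for line in etab_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        path = fields[0]
        if path == base or path.startswith(prefix):
            found.append(path)
    return found


def read_text(path):
    # A missing or unreadable file simply yields no candidates.
    try:
        with open(path) as handle:
            return handle.read()
    except OSError:
        return ""


if __name__ == "__main__":
    mountpoint = "/raid"  # the mount point from the log above
    print("nested mounts:", nested_mounts(read_text("/proc/mounts"), mountpoint))
    print("NFS exports:", exported_paths(read_text("/var/lib/nfs/etab"), mountpoint))
```

If either list is non-empty on the primary just before step 4, that would be
the first thing to look at. If both are empty, the drbd list is probably the
better place to continue.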