[ClusterLabs] Antw: Re: Can't do anything right; how do I start over?
Ulrich Windl
Ulrich.Windl at rz.uni-regensburg.de
Mon Oct 17 09:12:47 CEST 2016
>>> Dimitri Maziuk <dmaziuk at bmrb.wisc.edu> wrote on 15.10.2016 at 23:13 in
message <750d030a-ae3b-2c91-4275-8695c1a4cdc0 at bmrb.wisc.edu>:
> On 10/15/2016 12:27 PM, Dmitri Maziuk wrote:
>> On 2016-10-15 01:56, Jay Scott wrote:
>>
>>> So, what's wrong? (I'm a newbie, of course.)
>>
>> Here's what worked for me on centos 7:
>> http://octopus.bmrb.wisc.edu/dokuwiki/doku.php?id=sysadmin:pacemaker
>> YMMV and all that.
>
> PS. I can't in all honesty recommend this setup for running NFS clusters
> at this point.
>
> About 1 in 3 times I do 'pcs standby <primary>' I get
Have you tried running a suitable variant of "lsof" beforehand? That might tell you which process is blocking the device. I also think that if you have LVM on top of DRBD, you must deactivate the VG before trying to unmount.
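
To make that concrete, here is a rough sketch of what I would run as root on the active node before the standby. The function names are made up for this example, and /raid is simply the mount point from your logs. Two parts are assumptions on my side: that the kernel NFS server can hold an exported filesystem without any process showing up in fuser or lsof, and that you might have LVM physical volumes on the DRBD device.

```shell
#!/bin/sh
# Sketch only: show what might be keeping a mount point busy.
# Function names are hypothetical; adjust the mount point to taste.

# Print mount points nested below $1, read from a mounts table
# ($2, default /proc/mounts). A nested mount keeps the parent busy.
submounts_of() {
    awk -v m="$1" 'index($2, m "/") == 1 { print $2 }' "${2:-/proc/mounts}"
}

busy_report() {
    mnt=${1:-/raid}

    echo "== userspace processes using $mnt =="
    if command -v fuser >/dev/null 2>&1; then
        fuser -vm "$mnt" 2>&1
    fi
    if command -v lsof >/dev/null 2>&1; then
        lsof +f -- "$mnt" 2>/dev/null
    fi

    echo "== mounts nested below $mnt =="
    submounts_of "$mnt"

    # Assumption: the kernel NFS server may hold an exported filesystem
    # without any process showing up in fuser or lsof.
    echo "== NFS exports mentioning $mnt =="
    if command -v exportfs >/dev/null 2>&1; then
        exportfs -v 2>/dev/null | grep -F "$mnt"
    fi

    # Assumption: LVM physical volumes on a DRBD device mean a VG
    # would need "vgchange -an <vg>" as well.
    echo "== LVM physical volumes on DRBD =="
    if command -v pvs >/dev/null 2>&1; then
        pvs 2>/dev/null | grep drbd
    fi
    return 0
}
```

After loading the functions into a root shell, "busy_report /raid" prints the four sections. If the first section stays empty while umount still reports "target is busy", the holder is probably not an ordinary process.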
>
>> Oct 15 15:31:52 lionfish crmd[1137]: notice: Initiating action 46: stop
> drbd_filesystem_stop_0 on lionfish (local)
>> Oct 15 15:31:52 lionfish Filesystem(drbd_filesystem)[32120]: INFO: Running
> stop for /dev/drbd0 on /raid
>> Oct 15 15:31:52 lionfish Filesystem(drbd_filesystem)[32120]: INFO: Trying to
> unmount /raid
>> Oct 15 15:31:52 lionfish Filesystem(drbd_filesystem)[32120]: ERROR: Couldn't
> unmount /raid; trying cleanup with TERM
>> Oct 15 15:31:52 lionfish Filesystem(drbd_filesystem)[32120]: INFO: No
> processes on /raid were signalled. force_unmount is set to 'yes'
>> Oct 15 15:31:53 lionfish Filesystem(drbd_filesystem)[32120]: ERROR: Couldn't
> unmount /raid; trying cleanup with TERM
>> Oct 15 15:31:53 lionfish Filesystem(drbd_filesystem)[32120]: INFO: No
> processes on /raid were signalled. force_unmount is set to 'yes'
>> Oct 15 15:31:54 lionfish Filesystem(drbd_filesystem)[32120]: ERROR: Couldn't
> unmount /raid; trying cleanup with TERM
>> Oct 15 15:31:54 lionfish Filesystem(drbd_filesystem)[32120]: INFO: No
> processes on /raid were signalled. force_unmount is set to 'yes'
>> Oct 15 15:31:56 lionfish Filesystem(drbd_filesystem)[32120]: ERROR: Couldn't
> unmount /raid; trying cleanup with KILL
>> Oct 15 15:31:56 lionfish Filesystem(drbd_filesystem)[32120]: INFO: No
> processes on /raid were signalled. force_unmount is set to 'yes'
>> Oct 15 15:31:57 lionfish Filesystem(drbd_filesystem)[32120]: ERROR: Couldn't
> unmount /raid; trying cleanup with KILL
>> Oct 15 15:31:57 lionfish Filesystem(drbd_filesystem)[32120]: INFO: No
> processes on /raid were signalled. force_unmount is set to 'yes'
>> Oct 15 15:31:58 lionfish Filesystem(drbd_filesystem)[32120]: ERROR: Couldn't
> unmount /raid; trying cleanup with KILL
>> Oct 15 15:31:58 lionfish Filesystem(drbd_filesystem)[32120]: INFO: No
> processes on /raid were signalled. force_unmount is set to 'yes'
>> Oct 15 15:31:59 lionfish Filesystem(drbd_filesystem)[32120]: ERROR: Couldn't
> unmount /raid, giving up!
>> Oct 15 15:32:00 lionfish lrmd[1134]: notice:
> drbd_filesystem_stop_0:32120:stderr [ umount: /raid: target is busy. ]
>> Oct 15 15:32:00 lionfish lrmd[1134]: notice:
> drbd_filesystem_stop_0:32120:stderr [ (In some cases useful info
> about processes that use ]
>> Oct 15 15:32:00 lionfish lrmd[1134]: notice:
> drbd_filesystem_stop_0:32120:stderr [ the device is found by lsof(8)
> or fuser(1)) ]
>> Oct 15 15:32:00 lionfish lrmd[1134]: notice:
> drbd_filesystem_stop_0:32120:stderr [ ocf-exit-reason:Couldn't unmount /raid;
> trying cleanup with TERM ]
>> Oct 15 15:32:00 lionfish lrmd[1134]: notice:
> drbd_filesystem_stop_0:32120:stderr [ umount: /raid: target is busy. ]
>> Oct 15 15:32:00 lionfish lrmd[1134]: notice:
> drbd_filesystem_stop_0:32120:stderr [ (In some cases useful info
> about processes that use ]
>> Oct 15 15:32:00 lionfish lrmd[1134]: notice:
> drbd_filesystem_stop_0:32120:stderr [ the device is found by lsof(8)
> or fuser(1)) ]
>> Oct 15 15:32:00 lionfish lrmd[1134]: notice:
> drbd_filesystem_stop_0:32120:stderr [ ocf-exit-reason:Couldn't unmount /raid;
> trying cleanup with TERM ]
>> Oct 15 15:32:00 lionfish lrmd[1134]: notice:
> drbd_filesystem_stop_0:32120:stderr [ umount: /raid: target is busy. ]
>> Oct 15 15:32:00 lionfish lrmd[1134]: notice:
> drbd_filesystem_stop_0:32120:stderr [ (In some cases useful info
> about processes that use ]
>> Oct 15 15:32:00 lionfish lrmd[1134]: notice:
> drbd_filesystem_stop_0:32120:stderr [ the device is found by lsof(8)
> or fuser(1)) ]
>> Oct 15 15:32:00 lionfish lrmd[1134]: notice:
> drbd_filesystem_stop_0:32120:stderr [ ocf-exit-reason:Couldn't unmount /raid;
> trying cleanup with TERM ]
>> Oct 15 15:32:00 lionfish lrmd[1134]: notice:
> drbd_filesystem_stop_0:32120:stderr [ umount: /raid: target is busy. ]
>> Oct 15 15:32:00 lionfish lrmd[1134]: notice:
> drbd_filesystem_stop_0:32120:stderr [ (In some cases useful info
> about processes that use ]
>> Oct 15 15:32:00 lionfish lrmd[1134]: notice:
> drbd_filesystem_stop_0:32120:stderr [ the device is found by lsof(8)
> or fuser(1)) ]
>> Oct 15 15:32:00 lionfish lrmd[1134]: notice:
> drbd_filesystem_stop_0:32120:stderr [ ocf-exit-reason:Couldn't unmount /raid;
> trying cleanup with KILL ]
>> Oct 15 15:32:00 lionfish lrmd[1134]: notice:
> drbd_filesystem_stop_0:32120:stderr [ umount: /raid: target is busy. ]
>> Oct 15 15:32:00 lionfish lrmd[1134]: notice:
> drbd_filesystem_stop_0:32120:stderr [ (In some cases useful info
> about processes that use ]
>> Oct 15 15:32:00 lionfish lrmd[1134]: notice:
> drbd_filesystem_stop_0:32120:stderr [ the device is found by lsof(8)
> or fuser(1)) ]
>> Oct 15 15:32:00 lionfish lrmd[1134]: notice:
> drbd_filesystem_stop_0:32120:stderr [ ocf-exit-reason:Couldn't unmount /raid;
> trying cleanup with KILL ]
>> Oct 15 15:32:00 lionfish lrmd[1134]: notice:
> drbd_filesystem_stop_0:32120:stderr [ umount: /raid: target is busy. ]
>> Oct 15 15:32:00 lionfish lrmd[1134]: notice:
> drbd_filesystem_stop_0:32120:stderr [ (In some cases useful info
> about processes that use ]
>> Oct 15 15:32:00 lionfish lrmd[1134]: notice:
> drbd_filesystem_stop_0:32120:stderr [ the device is found by lsof(8)
> or fuser(1)) ]
>> Oct 15 15:32:00 lionfish lrmd[1134]: notice:
> drbd_filesystem_stop_0:32120:stderr [ ocf-exit-reason:Couldn't unmount /raid;
> trying cleanup with KILL ]
>> Oct 15 15:32:00 lionfish lrmd[1134]: notice:
> drbd_filesystem_stop_0:32120:stderr [ ocf-exit-reason:Couldn't unmount /raid,
> giving up! ]
>> Oct 15 15:32:00 lionfish crmd[1137]: notice: Operation
> drbd_filesystem_stop_0: unknown error (node=lionfish, call=91, rc=1,
> cib-update=107, confirmed=true)
>> Oct 15 15:32:00 lionfish crmd[1137]: notice:
> lionfish-drbd_filesystem_stop_0:91 [ umount: /raid: target is busy.\n
> (In some cases useful info about processes that use\n the device is found
> by lsof(8) or fuser(1))\nocf-exit-reason:Couldn't unmount /raid; trying
> cleanup with TERM\numount: /raid: target is busy.\n (In some cases useful
> info about processes that use\n the device is found by lsof(8) or
> fuser(1))\nocf-exit-reason:Couldn't unmount /raid; trying cleanup with
> TERM\numount: /raid: target is busy.\n
>> Oct 15 15:32:00 lionfish crmd[1137]: warning: Action 46
> (drbd_filesystem_stop_0) on lionfish failed (target: 0 vs. rc: 1): Error
>> Oct 15 15:32:00 lionfish crmd[1137]: notice: Transition aborted by
> drbd_filesystem_stop_0 'modify' on lionfish: Event failed
> (magic=0:1;46:4:0:700f71e0-d565-496f-a2c6-6b97f0cfd940, cib=0.128.10,
> source=match_graph_event:381, 0)
>
> and I have to take a trip to the server room to power-cycle (aka stonith)
> the nodes.
>
> I haven't tried digging into it yet. For all I know, the problem may be
> between the centos kernel and the tainted elrepo drbd module. Of course,
> "no processes were signalled" while "target is busy" may also be a bug in
> the RA...
>
> --
> Dimitri Maziuk
> Programmer/sysadmin
> BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu