[Pacemaker] Filesystem resource killing innocent processes on stop

Nikola Ciprich nikola.ciprich at linuxbox.cz
Mon May 18 11:14:14 EDT 2015


Hi Dejan,

> 
> The list below seems too extensive.  Which version of
> resource-agents do you run?
> 
> $ grep 'Build version:' /usr/lib/ocf/lib/heartbeat/ocf-shellfuncs

Yes, the list is definitely wrong.

Here's the info you requested:

# Build version: 5434e9646462d2c3c8f7aad2609d0ef1875839c7

rpm version: resource-agents-3.9.5-12.el6_6.5.x86_64

I can already see the problem: this version simply uses
fuser -m $MOUNTPOINT, which seems to return badly wrong results:

[root@denovav1b ~]# fuser -m /home/cluster/virt/
/home/cluster/virt/:     1m  3295m  3314m  4817m  4846m  4847m  4890m  4891m  4916m  4944m  4952m  4999m  5007m  5037m  5069m  5137m  5162m  5164m  5166m  5168m  5170m  5172m  5575m  8055m  9604m  9605m 10984m 11186m 11370m 11813m 11871m 11887m 11946m 12020m 12026m 12027m 12028m 12029m 12030m 12031m 14218m 15294m 15374m 15396m 15399m 17479m 17693m 17694m 20705m 20718m 20948m 20982m 23902m 24572m 24580m 26300m 29790m 29792m 30785m

(note that even PID 1 is listed!), while lsof returns:

lsof | grep "cluster.*virt"
qemu-syst  8055      root   21r      REG                0,0  232783872 1099511634304 /home/cluster/virt/images/debian-7.8.0-amd64-netinst.iso

which seems much saner to me.
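
For a cross-check that matches by path instead, a rough sketch follows. According to its man page, fuser -m lists every process accessing files on the same filesystem as the named file, so it does not filter by path at all. The sketch walks /proc and reports only PIDs whose cwd, root, exe or open file descriptors resolve to something under the given directory. The names path_is_under and pids_using_path are made up for illustration and are not part of resource-agents. It assumes a Linux /proc and readlink, and it ignores memory-mapped files, so treat it as a diagnostic aid rather than a replacement for fuser.

```shell
#!/bin/sh
# Hypothetical helpers for illustration, not part of resource-agents.

# Succeed if path $2 is the directory $1 itself or lies below it.
path_is_under() {
    dir=${1%/}                      # drop one trailing slash, if any
    case "$2" in
        "$dir"|"$dir"/*) return 0 ;;
        *)               return 1 ;;
    esac
}

# Print PIDs whose cwd, root, exe or open fds point at or below directory $1.
# The calling shell ($$) is skipped so a script does not report itself.
pids_using_path() {
    for p in /proc/[0-9]*; do
        pid=${p#/proc/}
        [ "$pid" = "$$" ] && continue
        for link in "$p/cwd" "$p/root" "$p/exe" "$p"/fd/*; do
            # Unreadable or vanished entries make readlink fail; skip them.
            target=$(readlink "$link" 2>/dev/null) || continue
            if path_is_under "$1" "$target"; then
                echo "$pid"
                break
            fi
        done
    done
}
```

Run as root, pids_using_path /home/cluster/virt can then be compared against the fuser -m list above.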

BR

nik


> 
> > here's an example of the log:
> > 
> > Filesystem(virt-fs)[4803]:      2015/05/17_21:59:48 INFO: sending signal TERM to: root      3606     1  0 Feb12 ?        S<s    0:01 /sbin/udevd -d
> > Filesystem(virt-fs)[4803]:      2015/05/17_21:59:48 INFO: sending signal TERM to: root      4249     1  0 Feb12 ttyS2    Ss+    0:00 agetty ttyS2 115200 vt100
> > Filesystem(virt-fs)[4803]:      2015/05/17_21:59:48 INFO: sending signal TERM to: root      4271  4395  0 21:58 ?        Ss     0:00 sshd: root@pts/12
> > Filesystem(virt-fs)[4803]:      2015/05/17_21:59:48 INFO: sending signal TERM to: root      4273     1  0 21:58 ?        Rs     0:00 [bash]
> > Filesystem(virt-fs)[4803]:      2015/05/17_21:59:48 INFO: sending signal TERM to: root      4395     1  0 Feb24 ?        Ss     0:03 /usr/sbin/sshd
> > Filesystem(virt-fs)[4803]:      2015/05/17_21:59:48 INFO: sending signal TERM to: root      4677     1  0 Feb12 ?        Ss     0:00 /sbin/portreserve
> > Filesystem(virt-fs)[4803]:      2015/05/17_21:59:48 INFO: sending signal TERM to: root      4690     1  0 Feb12 ?        S      0:00 supervising syslog-ng
> > Filesystem(virt-fs)[4803]:      2015/05/17_21:59:48 INFO: sending signal TERM to: root      4691     1  0 Feb12 ?        Ss     0:46 syslog-ng -p /var/run/syslog-ng.pid
> > Filesystem(virt-fs)[4803]:      2015/05/17_21:59:48 INFO: sending signal TERM to: rpc       4746     1  0 Feb12 ?        Ss     0:05 rpcbind
> > Filesystem(virt-fs)[4803]:      2015/05/17_21:59:48 INFO: sending signal TERM to: rpcuser   4764     1  0 Feb12 ?        Ss     0:00 rpc.statd
> > Filesystem(virt-fs)[4803]:      2015/05/17_21:59:48 INFO: sending signal TERM to: root      4797     1  0 Feb12 ?        Ss     0:00 rpc.idmapd
> > Filesystem(virt-fs)[4803]:      2015/05/17_21:59:48 INFO: sending signal TERM to: root      4803 12028  0 21:59 ?        S      0:00 /bin/sh /usr/lib/ocf/resource.d/heartbeat/Filesystem stop
> > 
> > while unmounting the /home/cluster/virt directory. What is quite curious is that the last process killed seems to be
> > the Filesystem resource itself.
> 
> Hmm, that's quite strange. That implies that the RA script itself
> had /home/cluster/virt as its WD.
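
Whatever the reason the script ends up in the list, a guard along these lines would keep a stop action from signalling PID 1 or itself. This is only a sketch: filter_kill_candidates is a hypothetical name, not something the Filesystem agent provides, and it assumes the candidate PIDs arrive one per line on stdin.

```shell
#!/bin/sh
# Hypothetical filter, not part of resource-agents.
# Reads PIDs (one per line) on stdin and drops PID 1 and the PID
# given as $1 (for example the agent's own $$).
filter_kill_candidates() {
    self=$1
    while read -r pid; do
        [ "$pid" = 1 ] && continue
        [ "$pid" = "$self" ] && continue
        echo "$pid"
    done
}
```

For example, printf '1\n4803\n8055\n' | filter_kill_candidates 4803 leaves only 8055.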
> 
> > before I dig deeper into this, did anyone else notice this problem? Is this a known
> > (and possibly already fixed) issue?
> 
> Never heard of this.
> 
> Thanks,
> 
> Dejan
> 
> > thanks a lot in advance
> > 
> > nik
> > 
> > 
> > -- 
> > -------------------------------------
> > Ing. Nikola CIPRICH
> > LinuxBox.cz, s.r.o.
> > 28.rijna 168, 709 00 Ostrava
> > 
> > tel.:   +420 591 166 214
> > fax:    +420 596 621 273
> > mobil:  +420 777 093 799
> > www.linuxbox.cz
> > 
> > mobil servis: +420 737 238 656
> > email servis: servis at linuxbox.cz
> > -------------------------------------
> 
> 
> 
> > _______________________________________________
> > Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> > 
> > Project Home: http://www.clusterlabs.org
> > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> > Bugs: http://bugs.clusterlabs.org

-- 
-------------------------------------
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28.rijna 168, 709 00 Ostrava

tel.:   +420 591 166 214
fax:    +420 596 621 273
mobil:  +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: servis at linuxbox.cz
-------------------------------------