[Pacemaker] Filesystem resource killing innocent processes on stop

emmanuel segura emi2fast at gmail.com
Mon May 18 08:26:22 EDT 2015


Are you sure your processes don't have their working directory in /home/cluster/virt?

I'm using SUSE 11 SP2 and I don't know whether the agent is the same on
Red Hat 6, but I think so. Anyway, to unmount the filesystem the script
goes through the following functions: Filesystem_stop -> fs_stop -> signal_processes

In the fs_stop function, the agent tries to kill the processes that are
using the filesystem, first with the TERM signal and then with KILL:

fs_stop() {
        local SUB=$1 timeout=$2 sig cnt
        for sig in TERM KILL; do
                cnt=$((timeout/2)) # try half time with TERM
                while [ $cnt -gt 0 ]; do
                        try_umount $SUB &&
                                return $OCF_SUCCESS
                        ocf_log err "Couldn't unmount $SUB; trying cleanup with $sig"
                        signal_processes $SUB $sig
                        cnt=$((cnt-1))
                        sleep 1
                done
        done
        return $OCF_ERR_GENERIC
}
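
To make the retry schedule of the loop above easier to see, here is a minimal sketch that prints the signals fs_stop would send, assuming every unmount attempt fails. fs_stop_schedule is a hypothetical helper for illustration only and is not part of the agent.

```shell
#!/bin/sh
# Hypothetical helper (not part of the Filesystem agent): print, one per
# line, the signals fs_stop would send for a given timeout if every
# try_umount call failed. The real loop also sleeps 1s per iteration.
fs_stop_schedule() {
    timeout=$1
    for sig in TERM KILL; do
        # Same integer division as in fs_stop: half the budget per signal.
        cnt=$((timeout / 2))
        while [ "$cnt" -gt 0 ]; do
            echo "$sig"
            cnt=$((cnt - 1))
        done
    done
}
```

For example, "fs_stop_schedule 4" prints TERM twice and then KILL twice. Because of the integer division, a timeout of 1 gives zero iterations, so in that case the function would fall straight through to the error return without ever attempting an unmount.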

In the signal_processes function, the agent uses fuser to signal the processes:

signal_processes() {
        local dir=$1
        local sig=$2
        # fuser returns a non-zero return code if none of the
        # specified files is accessed or in case of a fatal
        # error.
        if [ "X${HOSTOS}" = "XOpenBSD" ];then
                PIDS=`fstat | grep $dir | awk '{print $3}'`
                for PID in ${PIDS};do
                        kill -s $sig ${PID}
                        ocf_log info "Sent signal $sig to ${PID}"
                done
        else
                if $FUSER -$sig -m -k $dir ; then
                        ocf_log info "Some processes on $dir were signalled"
                else
                        ocf_log info "No processes on $dir were signalled"
                fi
        fi
}
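
One thing that may be worth checking by hand: according to the fuser man page, -m NAME selects every process using any file on the filesystem that NAME lives on. So if $dir is not (or is no longer) a mount point when fuser runs, the match would be against the enclosing filesystem instead. I can't say whether that is what happens here. The following is only a sketch for a manual check; is_mount_point and list_fs_users are hypothetical helpers, not part of the agent, and I am assuming a Linux system with /proc/mounts and the psmisc fuser.

```shell
#!/bin/sh
# Hypothetical helper: succeed if directory $1 is listed as a mount point
# in the mounts table $2 (defaults to /proc/mounts).
is_mount_point() {
    dir=$1
    mounts=${2:-/proc/mounts}
    # Field 2 of each line in the mounts table is the mount point.
    awk -v d="$dir" '$2 == d { found = 1 } END { exit !found }' "$mounts"
}

# Hypothetical helper: dry run that only lists (-v) what fuser -m would
# match for $1. No -k is passed, so nothing is signalled.
list_fs_users() {
    dir=$1
    if ! is_mount_point "$dir"; then
        echo "warning: $dir is not a mount point;" \
             "fuser -m would match the enclosing filesystem" >&2
    fi
    fuser -m -v "$dir"
}
```

Running "list_fs_users /home/cluster/virt" before stopping the resource should show which processes the agent would hit, without killing anything.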

2015-05-18 12:20 GMT+02:00 Nikola Ciprich <nikola.ciprich at linuxbox.cz>:
> Hi,
>
> I noticed a very annoying bug (or so I think): in resource-agents-3.9.5
> on RHEL / CentOS 6, the Filesystem OCF resource seems to kill completely
> unrelated processes on shutdown, although they're not using anything on the mounted filesystem...
>
> unfortunately, one of the processes killed very often is sshd :-(
>
> here's an example of the log:
>
> Filesystem(virt-fs)[4803]:      2015/05/17_21:59:48 INFO: sending signal TERM to: root      3606     1  0 Feb12 ?        S<s    0:01 /sbin/udevd -d
> Filesystem(virt-fs)[4803]:      2015/05/17_21:59:48 INFO: sending signal TERM to: root      4249     1  0 Feb12 ttyS2    Ss+    0:00 agetty ttyS2 115200 vt100
> Filesystem(virt-fs)[4803]:      2015/05/17_21:59:48 INFO: sending signal TERM to: root      4271  4395  0 21:58 ?        Ss     0:00 sshd: root@pts/12
> Filesystem(virt-fs)[4803]:      2015/05/17_21:59:48 INFO: sending signal TERM to: root      4273     1  0 21:58 ?        Rs     0:00 [bash]
> Filesystem(virt-fs)[4803]:      2015/05/17_21:59:48 INFO: sending signal TERM to: root      4395     1  0 Feb24 ?        Ss     0:03 /usr/sbin/sshd
> Filesystem(virt-fs)[4803]:      2015/05/17_21:59:48 INFO: sending signal TERM to: root      4677     1  0 Feb12 ?        Ss     0:00 /sbin/portreserve
> Filesystem(virt-fs)[4803]:      2015/05/17_21:59:48 INFO: sending signal TERM to: root      4690     1  0 Feb12 ?        S      0:00 supervising syslog-ng
> Filesystem(virt-fs)[4803]:      2015/05/17_21:59:48 INFO: sending signal TERM to: root      4691     1  0 Feb12 ?        Ss     0:46 syslog-ng -p /var/run/syslog-ng.pid
> Filesystem(virt-fs)[4803]:      2015/05/17_21:59:48 INFO: sending signal TERM to: rpc       4746     1  0 Feb12 ?        Ss     0:05 rpcbind
> Filesystem(virt-fs)[4803]:      2015/05/17_21:59:48 INFO: sending signal TERM to: rpcuser   4764     1  0 Feb12 ?        Ss     0:00 rpc.statd
> Filesystem(virt-fs)[4803]:      2015/05/17_21:59:48 INFO: sending signal TERM to: root      4797     1  0 Feb12 ?        Ss     0:00 rpc.idmapd
> Filesystem(virt-fs)[4803]:      2015/05/17_21:59:48 INFO: sending signal TERM to: root      4803 12028  0 21:59 ?        S      0:00 /bin/sh /usr/lib/ocf/resource.d/heartbeat/Filesystem stop
>
> while unmounting the /home/cluster/virt directory. What is quite curious is that the last killed process seems to be
> the Filesystem resource itself..
>
> before I dig deeper into this, has anyone else noticed this problem? Is this some known
> (and possibly already fixed) issue?
>
> thanks a lot in advance
>
> nik
>
>
> --
> -------------------------------------
> Ing. Nikola CIPRICH
> LinuxBox.cz, s.r.o.
> 28.rijna 168, 709 00 Ostrava
>
> tel.:   +420 591 166 214
> fax:    +420 596 621 273
> mobil:  +420 777 093 799
> www.linuxbox.cz
>
> mobil servis: +420 737 238 656
> email servis: servis at linuxbox.cz
> -------------------------------------
>



-- 
  .~.
  /V\
 //  \\
/(   )\
^`~'^



