[ClusterLabs] proftpd resource agent - fix for a start/monitor race condition
Dejan Muhamedagic
dejanmm at fastmail.fm
Wed Mar 25 13:26:04 UTC 2015
Hi,
On Wed, Mar 25, 2015 at 11:40:32AM +0100, Matthias Ferdinand wrote:
> Hello,
>
> the proftpd resource agent sometimes shows a race condition:
>
> if startup of the proftpd binary is slow, the pacemaker monitor
> operation immediately following the start operation may not yet find
> the pid-file from proftpd, and then it will signal failure. Subsequent
> retries of the start operation then keep failing because the tcp sockets
> are already used by the initial proftpd (which was never stopped).
Yes, that's a common issue with all servers that run as daemons.
> Fix (copied from the apache resource agent): after invoking the proftpd
> binary, do not return to caller until the monitor operation (called
> from within the RA itself) shows "success". Handling startup timeouts is
> left to the cluster manager.
Very good. More below.
> Regards
> Matthias Ferdinand
> --
> one4vision GmbH Fon +49 681 96727 - 60
> Residenz am Schlossgarten Fax +49 681 96727 - 69
> Talstraße 34-42 info at one4vision.de
> D-66119 Saarbrücken http://www.one4vision.de
> HRB 11751 verantwortl. Geschäftsführer:
> Amtsgericht Saarbrücken Christof Allmann, Christoph Harth
> --- 20150226_usr_lib_ocf_resource.d_heartbeat_proftpd 2015-02-26 17:39:19.956590821 +0100
> +++ patched_proftpd 2015-02-26 17:51:06.027695989 +0100
> @@ -163,7 +163,25 @@
> exit $OCF_ERR_GENERIC
> fi
>
> - exit $OCF_SUCCESS
> + tries=0
> + while : # wait until the user set timeout
> + do
> + proftpd_monitor
> + ec=$?
Limit scope of ec (add "local ec", somewhere above).
> + if [ $ec -eq $OCF_NOT_RUNNING ]
> + then
> + tries=`expr $tries + 1`
You can drop the tries variable; it is incremented but never read.
> + ocf_log info "waiting for proftpd ${OCF_RESKEY_conffile} to come up"
> + sleep 1
> + else
> + break
> + fi
> + done
> +
> + if [ $ec -ne 0 ]; then
> + proftpd_stop
I'd remove this. If a start operation fails, it is the cluster
manager's job to stop the resource.
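Taken together, the three comments above would leave the end of
proftpd_start looking roughly like the sketch below. It is untested
against the real agent: the wait_for_proftpd name and the stubbed
proftpd_monitor and ocf_log are made up here so that the snippet runs on
its own, whereas the real RA gets ocf_log and the OCF_* codes from
ocf-shellfuncs.

```shell
# Stand-ins so the sketch runs outside a cluster (assumptions, not RA code).
OCF_SUCCESS=0
OCF_NOT_RUNNING=7
ocf_log() { shift; echo "$*" >&2; }

# Hypothetical monitor stub: reports "not running" twice, then success.
MONITOR_CALLS=0
proftpd_monitor() {
    MONITOR_CALLS=$((MONITOR_CALLS + 1))
    [ "$MONITOR_CALLS" -ge 3 ] && return $OCF_SUCCESS
    return $OCF_NOT_RUNNING
}

# Wait loop with the review comments applied: ec is local, the unused
# tries counter is gone, and no stop is attempted on failure.
wait_for_proftpd() {
    local ec
    while :    # no limit here; the cluster's start timeout ends the wait
    do
        proftpd_monitor
        ec=$?
        [ "$ec" -ne "$OCF_NOT_RUNNING" ] && break
        ocf_log info "waiting for proftpd to come up"
        sleep 1
    done
    return $ec
}
```

Any monitor result other than "not running" leaves the loop and is
returned as is, so a real error is reported to the cluster manager
rather than retried.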
Cheers,
Dejan
> + fi
> + return $ec
> }
>
>
> @@ -264,6 +282,7 @@
> case $1 in
> start) proftpd_validate_all
> proftpd_start
> + exit $?
> ;;
>
> stop) proftpd_stop
> @@ -298,4 +317,3 @@
> exit $OCF_ERR_UNIMPLEMENTED
> ;;
> esac
> -
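Since the patched proftpd_start ends in "return $ec" rather than "exit",
the added "exit $?" makes the start branch pass that status on
explicitly as the agent's exit code. A minimal illustration, in which
agent_main and both stub functions are invented for the example:

```shell
OCF_ERR_GENERIC=1
OCF_ERR_UNIMPLEMENTED=3

# Hypothetical stubs standing in for the real agent functions.
proftpd_validate_all() { return 0; }
proftpd_start() { return $OCF_ERR_GENERIC; }   # pretend the start failed

# Dispatcher in the style of the RA's case statement.
agent_main() {
    case $1 in
    start)  proftpd_validate_all
            proftpd_start
            exit $?          # hand the start result to the caller
            ;;
    *)      exit $OCF_ERR_UNIMPLEMENTED
            ;;
    esac
}
```

Run it in a subshell so that exit does not end the calling shell, e.g.
"( agent_main start ); echo $?" prints 1 with the stub above.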