[Pacemaker] Trouble with ocf:Squid resource agent

Jake Smith jsmith at argotec.com
Mon Jul 30 12:09:10 EDT 2012


----- Original Message -----
> From: "Julien Cornuwel" <cornuwel at gmail.com>
> To: pacemaker at oss.clusterlabs.org
> Sent: Wednesday, July 25, 2012 5:51:28 AM
> Subject: Re: [Pacemaker] Trouble with ocf:Squid resource agent
> 
> Oops! Spoke too fast. The fix below allows squid to start. But the
> script also has problems in the 'stop' part. It is stuck in an
> infinite loop; here are the logs (repeated every second):
> 
> Jul 25 11:38:47 corsen-a lrmd: [24099]: info: RA output:
> (Proxy:stop:stderr) /usr/lib/ocf/resource.d//heartbeat/Squid: line
> 320: kill: -: arguments must be process or job IDs
> Jul 25 11:38:47 corsen-a lrmd: [24099]: info: RA output:
> (Proxy:stop:stderr) /usr/lib/ocf/resource.d//heartbeat/Squid: line
> 320: kill: -: arguments must be process or job IDs
> Jul 25 11:38:48 corsen-a Squid(Proxy)[24659]: [25682]: INFO:
> squid:stop_squid:318:  try to stop by SIGKILL: -
> Jul 25 11:38:48 corsen-a Squid(Proxy)[24659]: [25682]: INFO:
> squid:stop_squid:318:  try to stop by SIGKILL: -
> 
> Being on a deadline, I'll use the lsb script for the moment. If
> someone figures out how to use this ocf script, I'm very interested.
> 

I took a quick look at the OCF agent. Here's the stop section, with my inline comments marked ###:

stop_squid()
{
	typeset lapse_sec

	if ocf_run $SQUID_EXE -f $SQUID_CONF -k shutdown; then
		lapse_sec=0
		while true; do
			get_pids
			if is_squid_dead; then
				rm -f $SQUID_PIDFILE
				return $OCF_SUCCESS
			fi
			(( lapse_sec = lapse_sec + 1 ))
			if (( lapse_sec > SQUID_STOP_TIMEOUT )); then

### It looks to me like you're hitting the line above, which then breaks out and drops down to the second "while true" loop below. I would time a manual stop of squid (I know it takes quite a while) and make sure your primitive's stop timeout (op stop interval="0" timeout="120s") is set high enough (definitely more than 120s, I would assume) that the time it takes to stop squid doesn't normally exceed it.

				break
			fi
			sleep 1
			ocf_log info "$SQUID_NAME:$FUNCNAME:$LINENO: " \
				"stop NORM $lapse_sec/$SQUID_STOP_TIMEOUT"
		done
	fi

	while true; do
		get_pids
		ocf_log info "$SQUID_NAME:$FUNCNAME:$LINENO: " \
			"try to stop by SIGKILL:${SQUID_PIDS[0]} ${SQUID_PIDS[2]}"
		kill -KILL ${SQUID_PIDS[0]} ${SQUID_PIDS[2]}

### Have you tried manually running the line above (substituting the correct PIDs, of course) to see what you get? Maybe the kill -KILL syntax is invalid for your flavor of Linux, and the OCF agent needs to be updated to take that into account when running the kill command. Even if you raise the timeout above to a normally reasonable value, you still want the agent to be able to kill squid if it is unresponsive!

		sleep 1
		if is_squid_dead; then
			rm -f $SQUID_PIDFILE
			return $OCF_SUCCESS
		fi
	done

	return $OCF_ERR_GENERIC
}
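To follow up on both inline comments, here is a small bash sketch. The function names seconds_until_exit and kill_numeric_pids are hypothetical and not part of the Squid agent. The first times how long a process takes to exit, with an upper limit much like SQUID_STOP_TIMEOUT above. The second passes only numeric PIDs to kill, so a placeholder such as the "-" in the log (bash answers "kill -KILL -" with exactly "kill: -: arguments must be process or job IDs") is skipped and reported instead of being retried forever.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical helpers, not part of the Squid resource agent.

# Poll once per second until process $1 has gone away, for at most $2 seconds.
# Prints the number of seconds waited; returns 1 if the process is still alive.
# Note: kill -0 needs permission to signal the process, so run this as root.
seconds_until_exit() {
  local pid=$1 max=$2 elapsed=0
  while kill -0 "$pid" 2>/dev/null; do
    if (( elapsed >= max )); then
      echo "$elapsed"
      return 1
    fi
    sleep 1
    (( elapsed += 1 ))
  done
  echo "$elapsed"
}

# Send SIGKILL only to arguments that look like numeric PIDs, skipping empty
# strings and placeholders such as the "-" seen in the lrmd log.
# Returns 1 when there is nothing valid to kill, so a caller can stop
# retrying instead of looping forever.
kill_numeric_pids() {
  local pid
  local -a valid=()
  for pid in "$@"; do
    if [[ $pid =~ ^[0-9]+$ ]]; then
      valid+=("$pid")
    fi
  done
  (( ${#valid[@]} > 0 )) || return 1
  kill -KILL "${valid[@]}"
}
```

For example, after running "squid -k shutdown" by hand, pass the PID from your Squid PID file to seconds_until_exit with a generous limit such as 600; the printed value gives a rough idea of how long the stop timeout needs to be. Where the PID file lives depends on your installation.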


> Regards
> 
> 
> 2012/7/24 Julien Cornuwel <cornuwel at gmail.com>:
> > Hi,
> >
> > Fixed! The problem comes from the squid ocf script
> > (/usr/lib/ocf/resource.d/heartbeat/Squid) that doesn't handle IPv6
> > addresses correctly.
> > All you have to do is modify line 198 as follows:
> > awk '/(tcp.*[0-9]+\.[0-9]+\.+[0-9]+\.[0-9]+:'$SQUID_PORT' |tcp.*:::'$SQUID_PORT' )/{
> >
> > Source:
> > http://www.n3oxid.fr/index.php?post/2012/04/07/Installation-et-configuration-d-un-cluster-Pacemaker/CoroSync-sous-GNU/Linux-Debian-6-%28Squeeze%29
> >

I'm not sure whether the above fully patches the Squid OCF agent for both IPv4 and IPv6, but I would recommend submitting a patch against the resource agent so that in the future it just works ;-)
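To check whether the patched pattern catches both address families, the bash sketch below runs it over a few made-up "netstat -apn" lines (they are illustrative, not output from the poster's machine). pid_on_port is a hypothetical name, and the action that prints the PID part of column 7 is an assumption, since the thread only quotes the pattern part of line 198. Note that the pattern also matches a non-listening connection on the same port, whose PID column is "-"; that might be one way a "-" ends up in the PID list, but it is an unconfirmed guess.

```shell
#!/usr/bin/env bash
# Sketch: apply the patched line-198 pattern to netstat-style text on stdin
# and print the PID (column 7, minus "/program") of the first matching line.
# pid_on_port is a hypothetical helper; the print action is an assumption.
pid_on_port() {
  local port=$1
  awk '/(tcp.*[0-9]+\.[0-9]+\.+[0-9]+\.[0-9]+:'"$port"' |tcp.*:::'"$port"' )/{
    sub(/\/.*/, "", $7); print $7; exit}'
}

# Made-up sample lines in netstat -apn layout.
ipv4_listen='tcp        0      0 0.0.0.0:3128            0.0.0.0:*               LISTEN      1234/squid3'
ipv6_listen='tcp6       0      0 :::3128                 :::*                    LISTEN      1234/squid3'
time_wait='tcp        0      0 192.168.1.10:3128       192.168.1.20:51234      TIME_WAIT   -'

echo "$ipv4_listen" | pid_on_port 3128   # prints 1234
echo "$ipv6_listen" | pid_on_port 3128   # prints 1234 (needs the ":::" branch)
echo "$time_wait"   | pid_on_port 3128   # prints -
```

Without the "|tcp.*:::" branch, the IPv6-only line matches nothing and the lookup returns an empty string, which is the case the patch is meant to fix.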

HTH
Jake
