[Pacemaker] Trouble with ocf:Squid resource agent
Jake Smith
jsmith at argotec.com
Mon Jul 30 12:09:10 EDT 2012
----- Original Message -----
> From: "Julien Cornuwel" <cornuwel at gmail.com>
> To: pacemaker at oss.clusterlabs.org
> Sent: Wednesday, July 25, 2012 5:51:28 AM
> Subject: Re: [Pacemaker] Trouble with ocf:Squid resource agent
>
> Oops! Spoke too fast. The fix below allows squid to start. But the
> script also has problems in the 'stop' part. It is stuck in an
> infinite loop and here are the logs (repeats every second) :
>
> Jul 25 11:38:47 corsen-a lrmd: [24099]: info: RA output:
> (Proxy:stop:stderr) /usr/lib/ocf/resource.d//heartbeat/Squid: line
> 320: kill: -: arguments must be process or job IDs
> Jul 25 11:38:47 corsen-a lrmd: [24099]: info: RA output:
> (Proxy:stop:stderr) /usr/lib/ocf/resource.d//heartbeat/Squid: line
> 320: kill: -: arguments must be process or job IDs
> Jul 25 11:38:48 corsen-a Squid(Proxy)[24659]: [25682]: INFO:
> squid:stop_squid:318: try to stop by SIGKILL: -
> Jul 25 11:38:48 corsen-a Squid(Proxy)[24659]: [25682]: INFO:
> squid:stop_squid:318: try to stop by SIGKILL: -
>
> Being on a deadline, I'll use the lsb script for the moment. If
> someone figures out how to use this ocf script, I'm very interested.
>
I took a quick look at the OCF agent. Here's the stop section with my inline comments (marked ###):
stop_squid()
{
    typeset lapse_sec

    if ocf_run $SQUID_EXE -f $SQUID_CONF -k shutdown; then
        lapse_sec=0
        while true; do
            get_pids
            if is_squid_dead; then
                rm -f $SQUID_PIDFILE
                return $OCF_SUCCESS
            fi
            (( lapse_sec = lapse_sec + 1 ))
            if (( lapse_sec > SQUID_STOP_TIMEOUT )); then
                ### It looks to me like you're hitting the condition above, which
                ### breaks out of this loop and drops down to the "while true"
                ### loop below. I would time a manual stop of squid (I know it
                ### can take quite a while) and make sure your primitive's stop
                ### operation (op stop interval="0" timeout="120s") has a timeout
                ### high enough (definitely more than 120s, I would assume) that
                ### a normal squid shutdown doesn't exceed it.
                break
            fi
            sleep 1
            ocf_log info "$SQUID_NAME:$FUNCNAME:$LINENO: " \
                "stop NORM $lapse_sec/$SQUID_STOP_TIMEOUT"
        done
    fi

    while true; do
        get_pids
        ocf_log info "$SQUID_NAME:$FUNCNAME:$LINENO: " \
            "try to stop by SIGKILL:${SQUID_PIDS[0]} ${SQUID_PIDS[2]}"
        kill -KILL ${SQUID_PIDS[0]} ${SQUID_PIDS[2]}
        ### Have you tried running the kill line above manually (substituting
        ### the correct PIDs, of course) to see what you get? Maybe the
        ### kill -KILL syntax is invalid for your flavor of Linux and the OCF
        ### agent needs to be updated to take that into account. Even if you
        ### increase the timeout above to a normally reasonable value, you
        ### still want the agent to be able to kill squid if it is unresponsive!
        sleep 1
        if is_squid_dead; then
            rm -f $SQUID_PIDFILE
            return $OCF_SUCCESS
        fi
    done

    return $OCF_ERR_GENERIC
}
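
One more observation from the log you quoted: the message reads "try to stop by SIGKILL: -" and kill complains about the argument "-", so at least one of the SQUID_PIDS entries appears to have held "-" rather than a number when the kill line ran. Here is a minimal sketch of a guard that only passes numeric PIDs to kill. The helper names filter_pids and kill_valid_pids are hypothetical and not part of the shipped resource agent, and I haven't checked how get_pids fills the array, so treat this as an illustration only:

```shell
# Hypothetical helpers, NOT part of the shipped Squid resource agent.

# Print only the arguments that look like numeric PIDs, one per line.
filter_pids() {
    local pid
    for pid in "$@"; do
        case "$pid" in
            ''|*[!0-9]*) ;;                 # skip empty strings and placeholders such as "-"
            *) printf '%s\n' "$pid" ;;
        esac
    done
}

# Send SIGKILL only when at least one valid PID remains.
kill_valid_pids() {
    local pids
    pids=$(filter_pids "$@")
    if [ -z "$pids" ]; then
        echo "no valid PIDs to kill" >&2
        return 1
    fi
    # Word splitting is intended: each line of $pids becomes one argument.
    kill -KILL $pids
}
```

In the loop above, the kill line could then become something like kill_valid_pids "${SQUID_PIDS[0]}" "${SQUID_PIDS[2]}", which would at least stop the "arguments must be process or job IDs" error from repeating every second.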
> Regards
>
>
> 2012/7/24 Julien Cornuwel <cornuwel at gmail.com>:
> > Hi,
> >
> > Fixed! The problem comes from the squid ocf script
> > (/usr/lib/ocf/resource.d/heartbeat/Squid) that doesn't handle IPv6
> > addresses correctly.
> > All you have to do is modify line 198 as follows:
> > awk '/(tcp.*[0-9]+\.[0-9]+\.+[0-9]+\.[0-9]+:'$SQUID_PORT'
> > |tcp.*:::'$SQUID_PORT' )/{
> >
> > Source:
> > http://www.n3oxid.fr/index.php?post/2012/04/07/Installation-et-configuration-d-un-cluster-Pacemaker/CoroSync-sous-GNU/Linux-Debian-6-%28Squeeze%29
> >
I'm not sure whether the change above fully patches the squid OCF agent for both IPv4 and IPv6, but I would recommend submitting a patch against the resource agent so that in the future it just works ;-)
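
If it helps when preparing a patch, here is a rough sketch of a port check that doesn't depend on the address format at all. It is a different approach from the regex on line 198, not what the agent currently does: it takes the local-address column and compares whatever follows the last colon with the port. The function name listening_on_port is hypothetical, and the sketch assumes the usual "netstat -tln" column layout (local address in the 4th column, LISTEN in the last):

```shell
# Hypothetical helper, NOT taken from the Squid resource agent.
# Reads `netstat -tln`-style lines on stdin and succeeds if any TCP socket
# (IPv4 or IPv6) is listening on the port given as the first argument.
listening_on_port() {
    awk -v port="$1" '
        $1 ~ /^tcp/ && $NF == "LISTEN" {
            # Local address is the 4th column; the port follows the last colon,
            # which works for both "0.0.0.0:3128" and ":::3128".
            n = split($4, parts, ":")
            if (parts[n] == port) found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}
```

It would be used along the lines of: netstat -tln | listening_on_port "$SQUID_PORT"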
HTH
Jake