[Pacemaker] Resource monitoring actions when a resource dies uncleanly

Michael Hittesdorf michael.hittesdorf at chicagotrading.com
Thu Jan 6 12:54:10 EST 2011

Your init script needs to be LSB compliant; see this link for details:
ned/ap-lsb.html. Basically, "stop" always needs to return 0, even if
the service is already stopped. You can either change the squid init
script or write an LSB-compliant wrapper init script that calls the
squid init script. Hope this helps.
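The wrapper approach might look something like the sketch below. The path /etc/init.d/squid, the pid-file location, and the SQUID_INIT/SQUID_PIDFILE variables are assumptions for illustration; adjust them for your distribution. The key point is that "stop" always exits 0 and clears the stale pid file that makes the stock script complain "squid is dead but pid file exists":

```shell
#!/bin/sh
# Sketch of an LSB-compliant wrapper around the distribution's squid init
# script. SQUID_INIT and SQUID_PIDFILE are hypothetical names; the default
# paths below are assumptions -- adjust for your system.
SQUID_INIT="${SQUID_INIT:-/etc/init.d/squid}"
SQUID_PIDFILE="${SQUID_PIDFILE:-/var/run/squid.pid}"

wrapper() {
    case "$1" in
        stop)
            # Run the real stop, but ignore its exit code: per LSB,
            # stopping a service that is already dead must still succeed.
            "$SQUID_INIT" stop >/dev/null 2>&1
            # Remove a stale pid file left behind by an unclean death, so
            # a later "status" doesn't report "dead but pid file exists".
            rm -f "$SQUID_PIDFILE"
            return 0
            ;;
        *)
            # Delegate start, restart, status, etc. unchanged.
            "$SQUID_INIT" "$@"
            ;;
    esac
}

# Dispatch only when an action was given, so the file can also be sourced.
if [ $# -gt 0 ]; then
    wrapper "$@"
fi
```

With this in place, Pacemaker's repeated "stop" attempts succeed on the first try and it is free to start squid on the other node.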

From: Andrew Lacey [mailto:alacey at brynmawr.edu] 
Sent: Thursday, January 06, 2011 11:41 AM
To: pacemaker at oss.clusterlabs.org
Subject: [Pacemaker] Resource monitoring actions when a resource dies uncleanly



First off, I'm new to Pacemaker and there's a tremendous amount of
information to sift through, so my apologies if this has been answered
before.
I'm trying to set up a simple 2-node active/passive cluster that runs
squid (reverse proxy for web services) on a service IP address. I'm not
using STONITH because there's no shared data, so nothing horrible would
happen if squid somehow ends up running on both boxes. So, there are
just two resources, squid itself and the IP address, configured as a
resource group because they must be on the same machine.
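(For reference, a two-resource group like the one described might be configured in the crm shell along these lines; the resource names, IP address, netmask, and intervals here are placeholders, not the actual configuration:

```
primitive proxy-ip ocf:heartbeat:IPaddr2 \
        params ip=192.168.1.100 cidr_netmask=24 \
        op monitor interval=30s
primitive proxy-squid lsb:squid \
        op monitor interval=15s on-fail=standby
group web-proxy proxy-ip proxy-squid
```

The on-fail=standby setting on the monitor operation is the behavior discussed below.)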

I've done some investigation on setting up resource monitoring for
squid. Ideally, if squid dies for any reason on the currently-active
node, I would like to fail both resources (squid and IP) over to the
other node. For resource monitoring, there is an on-fail action called
"standby", which is described as: "Move all resources away from the node
on which the resource failed." That sounded to me like what I want, so I
tested it. Unfortunately, I found that if squid dies uncleanly
(simulated by issuing a kill -9 to its process), Pacemaker gets into an
infinite loop of repeatedly trying to use the init script to "stop"
squid. The init script is returning some error value because, in its
words, "squid is dead but pid file exists". squid is never started on
the other node because Pacemaker is never satisfied that it has truly
stopped on the original node.

Since a typical unexpected software failure would be an unclean failure
(seg fault or whatever), this monitoring doesn't seem very useful if it
always gets stuck trying to "stop" the crashed service before taking any
further action. Is there a generally-accepted way around this? Should
the init script (LSB) be rewritten to respond differently to this
situation, or is there some way to get Pacemaker to respond differently?


-Andrew L

