[Pacemaker] Resource monitoring actions when a resource dies uncleanly

Andrew Lacey alacey at brynmawr.edu
Thu Jan 6 13:56:12 EST 2011


Thanks, that is very helpful information. Looks like I need to modify the init script. 

-Andrew L 


From: "Michael Hittesdorf" <michael.hittesdorf at chicagotrading.com> 
To: "The Pacemaker cluster resource manager" <pacemaker at oss.clusterlabs.org> 
Sent: Thursday, January 6, 2011 12:54:10 PM 
Subject: Re: [Pacemaker] Resource monitoring actions when a resource dies uncleanly 




Your init script needs to be LSB compliant. See this link for details: http://www.clusterlabs.org/doc/en-US/Pacemaker/1.0/html/Pacemaker_Explained/ap-lsb.html . Basically, stop needs to return 0 whenever the service ends up stopped, including when it was already dead before stop was called. You can either change the squid init script or write an LSB-compliant wrapper init script that calls the squid init script. Hope this helps. 
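
A minimal sketch of such a wrapper, assuming the real script lives at /etc/init.d/squid (an assumed path, overridable through the hypothetical REAL_INIT variable) and that its status action follows the LSB exit codes (1 or 2 for "dead but pid/lock file exists", 3 for "not running"). The function names are made up for illustration. Only stop is intercepted; every other action is passed straight through:

```shell
#!/bin/sh
# Hypothetical LSB wrapper around the distribution's squid init script.
# REAL_INIT is an assumed path; adjust it for your system.
REAL_INIT="${REAL_INIT:-/etc/init.d/squid}"

# Stop the service, treating "already dead" as a successful stop.
wrapper_stop() {
    "$REAL_INIT" stop >/dev/null 2>&1 && return 0

    # The real stop failed, so ask the real script for its status.
    rc=0
    "$REAL_INIT" status >/dev/null 2>&1 || rc=$?
    case "$rc" in
        1|2|3) return 0 ;;   # nothing is running: report the stop as done
        *)     return 1 ;;   # still running or unknown: a genuine failure
    esac
}

# Route stop through the wrapper logic; pass everything else through.
wrapper_main() {
    case "$1" in
        stop) wrapper_stop ;;
        *)    "$REAL_INIT" "$@" ;;
    esac
}

# Dispatch only when an action was given, so the file can also be sourced.
if [ "$#" -gt 0 ]; then
    wrapper_main "$@"
    exit $?
fi
```

Installed under its own name and used as the lsb: resource in place of the original script, something like this should let a stop after kill -9 report success, so the resources can move to the other node. It does not remove the stale pid file; whether squid's start action needs that is worth checking separately.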



Mick 






From: Andrew Lacey [mailto:alacey at brynmawr.edu] 
Sent: Thursday, January 06, 2011 11:41 AM 
To: pacemaker at oss.clusterlabs.org 
Subject: [Pacemaker] Resource monitoring actions when a resource dies uncleanly 




Hi- 

First off, I'm new to Pacemaker and there's a tremendous amount of information to sift through, so my apologies if this has been answered already. 

I'm trying to set up a simple 2-node active/passive cluster that runs squid (reverse proxy for web services) on a service IP address. I'm not using STONITH because there's no shared data, so nothing horrible would happen if squid somehow ends up running on both boxes. So, there are just two resources, squid itself and the IP address, configured as a resource group because they must be on the same machine. 

I've done some investigation on setting up resource monitoring for squid. Ideally, if squid dies for any reason on the currently-active node, I would like to fail both resources (squid and IP) over to the other node. For resource monitoring, there is an on-fail action called "standby", which is described as: "Move all resources away from the node on which the resource failed." That sounded to me like what I want, so I tested it. Unfortunately, I found that if squid dies uncleanly (simulated by issuing a kill -9 to its process), Pacemaker gets stuck in a loop, repeatedly trying to use the init script to "stop" squid. The init script returns an error value because, in its words, "squid is dead but pid file exists". Squid is never started on the other node because Pacemaker is never satisfied that it has truly stopped on the original node. 

Since a typical unexpected software failure would be an unclean failure (seg fault or whatever), this monitoring doesn't seem very useful if it always gets stuck trying to "stop" the crashed service before taking any further action. Is there a generally-accepted way around this? Should the init script (LSB) be rewritten to respond differently to this situation, or is there some way to get Pacemaker to respond differently? 

Thanks, 

-Andrew L 


