[Pacemaker] failure handling on a cloned resource

Thu May 2 21:04:54 EDT 2013

On 02/05/2013, at 5:45 PM, Johan Huysmans <johan.huysmans at inuits.be> wrote:

> 
> On 2013-05-01 05:48, Andrew Beekhof wrote:
>> On 17/04/2013, at 9:54 PM, Johan Huysmans <johan.huysmans at inuits.be> wrote:
>> 
>>> Hi All,
>>> 
>>> I'm trying to set up a specific configuration in our cluster, but I'm struggling with it.
>>> 
>>> This is what I'm trying to achieve:
>>> On both nodes of the cluster a daemon must be running (tomcat).
>>> Some failover addresses are configured and must be running on the node with a correctly running tomcat.
>>> 
>>> I have achieved this with a cloned tomcat resource and a colocation between the cloned tomcat and the failover addresses.
>>> When I cause a failure in the tomcat on the node running the failover addresses, the failover addresses fail over to the other node as expected.
>>> crm_mon shows that this tomcat has a failure.
>>> When I configure the tomcat resource with failure-timeout=0, the failure alarm in crm_mon isn't cleared when the tomcat failure is fixed.
>> All sounds right so far.
> If my broken tomcat is automatically fixed, I expect pacemaker to notice this and that node to be able to run my failover addresses again.
> However, I don't see this happening.

This is very hard to discuss without seeing logs.

So you created a tomcat error, waited for pacemaker to notice, fixed the error and observed that pacemaker did not re-notice?
How long did you wait? More than the 15s repeat interval, I assume? Did at least the resource agent notice?
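For reference, a minimal sketch in crm shell terms of the knobs involved, assuming the resource names from the configuration quoted below. A failure-timeout only lets a recorded failure expire; the expiry is acted on when the cluster re-evaluates its state, which happens on cluster events or every cluster-recheck-interval (15 minutes by default in Pacemaker 1.1, as described on the page linked further down). A failure can also be cleared by hand. The 1-minute interval is an illustrative assumption, not a value recommended anywhere in this thread.

```
# Re-evaluate cluster state at least once a minute so that an expired
# failure-timeout is acted on promptly. "1min" is an illustrative
# assumption; the Pacemaker 1.1 default is 15min.
crm configure property cluster-recheck-interval="1min"

# One-shot status including fail counts, to see whether the tomcat
# monitor failure is still recorded.
crm_mon -1 -f

# Clear the recorded failure by hand once tomcat is healthy again;
# this also makes the cluster re-check the resource's current state.
crm resource cleanup cl_tomcat
```

Whether the recheck interval explains the behaviour seen with failure-timeout=0 or failure-timeout=30 would still have to be confirmed from the logs.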

>> 
>>> When I configure the tomcat resource with failure-timeout=30, the failure alarm in crm_mon is cleared after 30 seconds, even though the tomcat is still having a failure.
>> Can you define "still having a failure"?
>> You mean it still shows up in crm_mon?
>> Have you read this link?
>>    http://clusterlabs.org/doc/en-US/Pacemaker/1.1-crmsh/html/Pacemaker_Explained/s-rules-recheck.html
> "Still having a failure" means that the tomcat is still broken and my OCF script reports it as a failure.
>> 
>>> What I expect is that pacemaker reports the failure for as long as it exists, and reports that everything is OK once everything is back to normal.
>>> 
>>> Am I doing something wrong in my configuration?
>>> Or how can I achieve the setup I want?
>>> 
>>> Here is my configuration:
>>> 
>>> node CSE-1
>>> node CSE-2
>>> primitive d_tomcat ocf:custom:tomcat \
>>>    op monitor interval="15s" timeout="510s" on-fail="block" \
>>>    op start interval="0" timeout="510s" \
>>>    params instance_name="NMS" monitor_use_ssl="no" monitor_urls="/cse/health" monitor_timeout="120" \
>>>    meta migration-threshold="1" failure-timeout="0"
>>> primitive ip_1 ocf:heartbeat:IPaddr2 \
>>>    op monitor interval="10s" \
>>>    params nic="bond0" broadcast="10.1.1.1" iflabel="ha" ip="10.1.1.1"
>>> primitive ip_2 ocf:heartbeat:IPaddr2 \
>>>    op monitor interval="10s" \
>>>    params nic="bond0" broadcast="10.1.1.2" iflabel="ha" ip="10.1.1.2"
>>> group svc-cse ip_1 ip_2
>>> clone cl_tomcat d_tomcat
>>> colocation colo_tomcat inf: svc-cse cl_tomcat
>>> order order_tomcat inf: cl_tomcat svc-cse
>>> property $id="cib-bootstrap-options" \
>>>    dc-version="1.1.8-7.el6-394e906" \
>>>    cluster-infrastructure="cman" \
>>>    no-quorum-policy="ignore" \
>>>    stonith-enabled="false"
>>> 
>>> Thanks!
>>> 
>>> Greetings,
>>> Johan Huysmans
>>> 
>>> _______________________________________________
>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>> 
>>> Project Home: http://www.clusterlabs.org
>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> Bugs: http://bugs.clusterlabs.org