[ClusterLabs] Resource failure-timeout does not reset when resource fails to connect to both nodes
SGardner at trustwave.com
Mon Mar 28 12:44:30 EDT 2016
I have a simple resource defined:
[root@ha-d1 ~]# pcs resource show dmz1
 Resource: dmz1 (class=ocf provider=internal type=ip-address)
  Attributes: address=172.16.10.192 monitor_link=true
  Meta Attrs: migration-threshold=3 failure-timeout=30s
  Operations: monitor interval=7s (dmz1-monitor-interval-7s)
This is a custom resource that adds an Ethernet alias to one of the interfaces on our system.
I can unplug the cable on either node and failover occurs as expected. Thirty seconds after re-plugging it, I can repeat the exercise on the opposite node and failover again happens as expected.
However, if I unplug the cable from both nodes, the failcount goes up and the 30s failure-timeout never resets the failcounts, so Pacemaker never tries to start the failed resource again.
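To make the stuck state concrete, here is a deliberately simplified shell model of the rule as I understand it: a node stays eligible for the resource only while its failcount is below migration-threshold. This is a sketch for illustration, not Pacemaker's actual placement logic. The function name eligible_nodes and the failcount values in the usage lines are hypothetical.

```shell
#!/bin/sh
# Simplified model (not Pacemaker code): a node is eligible to run the
# resource only while its failcount is below migration-threshold.
# Usage: eligible_nodes THRESHOLD node=failcount [node=failcount ...]
eligible_nodes() {
    threshold=$1
    shift
    result=""
    for pair in "$@"; do
        node=${pair%%=*}     # text before the first '='
        count=${pair#*=}     # text after the first '='
        if [ "$count" -lt "$threshold" ]; then
            result="$result $node"
        fi
    done
    # Print the eligible nodes, or "none" when every node is excluded.
    echo "${result:-none}" | sed 's/^ //'
}

# Hypothetical failcounts: one node failed out, the other is healthy.
eligible_nodes 3 ha-d1=3 ha-d2=0
# Hypothetical failcounts: both cables unplugged, both nodes at the threshold.
eligible_nodes 3 ha-d1=3 ha-d2=3
```

In the second call no node is left to run dmz1, which matches the Stopped state in the status output; the resource stays there until something lowers a failcount.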
Full list of resources:
 Resource Group: network
     inif (ocf::internal:ip.sh): Started ha-d1.dev.com
     outif (ocf::internal:ip.sh): Started ha-d2.dev.com
     dmz1 (ocf::internal:ip.sh): Stopped
 Master/Slave Set: DRBDMaster [DRBDSlave]
     Masters: [ ha-d1.dev.com ]
     Slaves: [ ha-d2.dev.com ]
 Resource Group: filesystem
     DRBDFS (ocf::heartbeat:Filesystem): Stopped
 Resource Group: application
     service_failover (ocf::internal:service_failover): Stopped
Failcounts for dmz1
Is there any way to automatically recover from this scenario, other than setting an obnoxiously high migration-threshold?
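For reference, the manual steps I know of look roughly like the sketch below. It assumes the pcs subcommands `resource failcount show`, `resource cleanup`, and `property set` behave as documented for this pcs version. The cluster-recheck-interval line rests on my understanding that Pacemaker is not guaranteed to evaluate failure-timeout more often than that interval; whether that explains this case is unconfirmed, and the 1min value is only an example. The run and recover_dmz1 helpers are hypothetical names: by default they only print the commands, and RUN=1 executes them on a cluster node.

```shell
#!/bin/sh
# Dry-run helper (hypothetical): prints each command by default;
# set RUN=1 in the environment to actually execute it on a cluster node.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

recover_dmz1() {
    # Show the current failcounts recorded for the resource.
    run pcs resource failcount show dmz1
    # Clear the failure history so the cluster may try to start it again.
    run pcs resource cleanup dmz1
    # Assumption: a shorter recheck interval lets expired failures be
    # noticed sooner. 1min is an example value, not a recommendation.
    run pcs property set cluster-recheck-interval=1min
}

recover_dmz1
```

The cleanup step is still a manual intervention, so it does not answer the question by itself; the property change is the only part that could make recovery automatic, if the assumption holds.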