[ClusterLabs] Set "start-failure-is-fatal=false" on only one resource?

emmanuel segura emi2fast at gmail.com
Fri Mar 25 11:00:28 EDT 2016

If you don't want the fail count to stay at INFINITY after the node has
been rebooted, you can use a failure-timeout. Since you are using DRBD
with an ms resource, you also need to configure DRBD to use the
Pacemaker fence handler.
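The DRBD side of that fence-handler setup might look roughly like this in the resource file (a sketch against DRBD 8.4; the resource name wwwdata is taken from the configuration quoted below, and the handler paths assume a standard drbd-utils install):

```
resource wwwdata {
  disk {
    # Fence the peer's resource role via Pacemaker when replication is lost
    fencing resource-only;
  }
  handlers {
    # Fence/unfence scripts shipped with drbd-utils for Pacemaker clusters
    fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
    after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
  }
}
```

With that in place, a lost replication link creates a constraint that blocks promotion of the outdated side, rather than leaving the cluster to act on stale data.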

2016-03-25 15:27 GMT+01:00 Sam Gardner <SGardner at trustwave.com>:
> on-fail=restart doesn't appear to do anything - the DRBDSlave resource
> failcount is still at INFINITY after the secondary node is rebooted.
> Is there anything else that I've screwed up in the config somehow?
> Migration threshold doesn't seem to have much meaning for a Slave
> resource; it does not seem appropriate to try to swap the roles of a
> DRBD resource pair just because the slave doesn't come up.
> [root at ha-d1 ~]# pcs resource failcount show DRBDSlave
> Failcounts for DRBDSlave
>  ha-d2.dev.com: INFINITY
> [root at ha-d1 ~]# pcs property list --all
> Cluster Properties:
> ...
> start-failure-is-fatal: true
>  startup-fencing: true
> ...
> [root at ha-d1 ~]# pcs resource show --full
> Master: DRBDMaster
>  Meta Attrs: master-max=1 master-node-max=1 clone-max=2 clone-node-max=1
> notify=true failure-timeout=20s
>  Resource: DRBDSlave (class=ocf provider=linbit type=drbd)
>   Attributes: drbd_resource=wwwdata
>   Meta Attrs: failure-timeout=33s
>   Operations: monitor interval=11s (DRBDSlave-monitor-interval-11s)
>               start interval=0s on-fail=restart (DRBDSlave-start-on-fail-restart)
>               monitor interval=13s role=Master (DRBDSlave-monitor-interval-13s)
> --
> Sam Gardner
> On 3/25/16, 2:46 AM, "emmanuel segura" <emi2fast at gmail.com> wrote:
>>Try using on-fail on just that resource.
>>2016-03-25 0:22 GMT+01:00 Adam Spiers <aspiers at suse.com>:
>>> Sam Gardner <SGardner at trustwave.com> wrote:
>>>> I'm having some trouble on a few of my clusters in which the DRBD
>>>>Slave resource does not want to come up after a reboot until I manually
>>>>run resource cleanup.
>>>> Setting 'start-failure-is-fatal=false' as a global cluster property
>>>>and a failure-timeout works to resolve the issue, but I don't really
>>>>want the start failure set everywhere.
>>>> While I work on figuring out why the slave resource isn't coming up,
>>>>is it possible to set 'start-failure-is-fatal=false'  only on the
>>>>DRBDSlave resource, or does this need a patch?
>>> No, start-failure-is-fatal is a cluster-wide setting.  But IIUC you
>>> could also set migration-threshold=1 cluster-wide (i.e. in
>>> rsc_defaults), and then override it to either 0 or something higher
>>> just for this resource.  You may find this interesting reading:
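Adam's rsc_defaults-plus-override suggestion might look like this with pcs (a sketch only; these commands need a running cluster, and the exact syntax should be checked against your pcs version):

```
# Cluster-wide default: move a resource off a node after one failure
pcs resource defaults migration-threshold=1

# Override for the DRBD slave only (0 disables the threshold)
pcs resource meta DRBDSlave migration-threshold=0
```

This keeps start-failure-is-fatal untouched globally while changing how failures are counted for the one resource that misbehaves.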
>>  .~.
>>  /V\
>> //  \\
>>/(   )\
> _______________________________________________
> Users mailing list: Users at clusterlabs.org
> http://clusterlabs.org/mailman/listinfo/users
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
