[ClusterLabs] Restarting a failed resource on same node

Ken Gaillot kgaillot at redhat.com
Wed Oct 4 17:01:43 EDT 2017


On Wed, 2017-10-04 at 10:59 -0700, Paolo Zarpellon wrote:
> Hi Ken,
> Indeed the migration-threshold was the problem :-(
> 
> BTW, for a master-slave resource, is it possible to have different
> migration-thresholds for the two roles?
> I.e., I'd like the slave to be restarted where it failed, but the
> master to be migrated to the other node right away (by promoting
> the slave there).

No, that's not possible currently. There's a planned overhaul of the
failure-handling options that would open up that possibility, though.
There's no time frame for when it might get done.
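
For what it's worth, migration-threshold is a single resource
meta-attribute, so whatever value is in effect applies to both roles.
As a rough sketch, using the test-ha name from your config below
(standard pcs commands; exact syntax and output can vary by version):

    # one threshold for the whole master/slave resource; there is
    # no per-role variant today
    pcs resource meta test-ha migration-threshold=1

    # verify what is actually configured
    pcs resource show test-ha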

> I've tried configuring something like this:
> 
> [root@test-236 ~]# pcs resource show test-ha
>  Master: test-ha
>   Meta Attrs: master-node-max=1 clone-max=2 notify=true master-max=1 clone-node-max=1 requires=nothing migration-threshold=1
>   Resource: test (class=ocf provider=heartbeat type=test)
>    Meta Attrs: migration-threshold=INFINITY
>    Operations: start interval=0s on-fail=restart timeout=120s (test-start-interval-0s)
>                monitor interval=10s on-fail=restart timeout=60s (test-monitor-interval-10s)
>                monitor interval=11s on-fail=restart role=Master timeout=60s (test-monitor-interval-11s)
>                promote interval=0s on-fail=restart timeout=60s (test-promote-interval-0s)
>                demote interval=0s on-fail=stop timeout=60s (test-demote-interval-0s)
>                stop interval=0s on-fail=block timeout=60s (test-stop-interval-0s)
>                notify interval=0s timeout=60s (test-notify-interval-0s)
> [root@test-236 ~]#
> 
> but it does not seem to help: both master and slave are always
> restarted on the same node, because the test resource's
> migration-threshold is set to INFINITY.
> 
> Thank you in advance.
> Regards,
> Paolo
> 
> On Tue, Oct 3, 2017 at 7:12 AM, Ken Gaillot <kgaillot at redhat.com>
> wrote:
> > On Mon, 2017-10-02 at 12:32 -0700, Paolo Zarpellon wrote:
> > > Hi,
> > > on a basic 2-node cluster, I have a master-slave resource where
> > > the master runs on one node and the slave on the other. If I
> > > kill the slave resource, its status goes to "stopped".
> > > Similarly, if I kill the master resource, the slave is promoted
> > > to master, but the failed one does not restart as slave.
> > > Is there a way to restart failed resources on the same node they
> > > were running on?
> > > Thank you in advance.
> > > Regards,
> > > Paolo
> > 
> > Restarting on the same node is the default behavior -- something
> > must be blocking it. For example, check your migration-threshold:
> > once a resource has failed that many times on a node, it can no
> > longer run there, so if no other node can take it, it will stop.
> > 
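
A quick way to check whether an accumulated failure count is what's
blocking a restart (again, standard pcs commands; substitute your own
resource name):

    # show the per-node failure count for the resource
    pcs resource failcount show test

    # reset it so the resource is allowed to start there again
    pcs resource cleanup test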



