[ClusterLabs] Informing RAs about recovery: failed resource recovery, or any start-stop cycle?

Mon Jun 6 22:45:49 UTC 2016

Adam Spiers <aspiers at suse.com> wrote:
> Andrew Beekhof <abeekhof at redhat.com> wrote:
> > On Tue, Jun 7, 2016 at 8:29 AM, Adam Spiers <aspiers at suse.com> wrote:
> > > Ken Gaillot <kgaillot at redhat.com> wrote:
> > >> My main question is how useful would it actually be in the proposed use
> > >> cases. Considering the possibility that the expected start might never
> > >> happen (or fail), can an RA really do anything different if
> > >> start_expected=true?
> > >
> > > That's the wrong question :-)
> > >
> > >> If the use case is there, I have no problem with
> > >> adding it, but I want to make sure it's worthwhile.
> > >
> > > The use case which started this whole thread is for
> > > start_expected=false, not start_expected=true.
> > 
> > Isn't this just two sides of the same coin?
> > If you're not doing the same thing for both cases, then you're just
> > reversing the order of the clauses.
> 
> No, because the stated concern about unreliable expectations
> ("Considering the possibility that the expected start might never
> happen (or fail)") was regarding start_expected=true, and that's the
> side of the coin we don't care about, so it doesn't matter if it's
> unreliable.

BTW, if the expected start happens but fails, then Pacemaker will just
keep retrying until migration-threshold is hit, at which point it
will finally call the RA 'stop' action with start_expected=false.
So that's of no concern.
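
To make that sequence concrete, here is a rough Python sketch of the
calls an RA would see under the proposal.  It is a toy model, not
Pacemaker code: simulate_recovery() and should_disable_service() are
hypothetical names, start_expected is the proposed (not yet existing)
hint, and the model assumes each failed start is followed by a 'stop'
and counts one failure towards migration-threshold.

```python
def simulate_recovery(migration_threshold, start_outcomes):
    """Toy model: list the (action, start_expected) calls an RA would see.

    start_outcomes is an iterable of booleans, one per start attempt
    (True = the start succeeded).  start_expected is only meaningful
    for 'stop' calls, so 'start' calls carry None.
    """
    calls = []
    failures = 0
    for succeeded in start_outcomes:
        calls.append(("start", None))
        if succeeded:
            return calls
        failures += 1
        # Assumption: a failed start is cleaned up with a stop.  While
        # retries remain, another start is expected afterwards; once
        # the threshold is hit, the final stop has start_expected=False.
        giving_up = failures >= migration_threshold
        calls.append(("stop", not giving_up))
        if giving_up:
            break
    return calls


def should_disable_service(action, start_expected):
    """Hypothetical RA decision: only react to a stop with no start expected."""
    return action == "stop" and start_expected is False


if __name__ == "__main__":
    for action, start_expected in simulate_recovery(3, [False, False, False]):
        print(action, start_expected,
              should_disable_service(action, start_expected))
```

In this model only the last 'stop' triggers the service-disable path;
the intermediate stops, and any run where a retry succeeds, never do.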

Maybe your point was that if the expected start never happens (so
never even gets a chance to fail), we still want to do a nova
service-disable?

Yes, that would be nice, but this proposal was never intended to
address that.  I guess we'd need an entirely different mechanism in
Pacemaker for that.  But let's not allow perfection to become the
enemy of the good ;-)