[ClusterLabs] Informing RAs about recovery: failed resource recovery, or any start-stop cycle?

Adam Spiers aspiers at suse.com
Mon Jun 6 18:39:37 EDT 2016


Andrew Beekhof <abeekhof at redhat.com> wrote:
> On Tue, Jun 7, 2016 at 8:29 AM, Adam Spiers <aspiers at suse.com> wrote:
> > Ken Gaillot <kgaillot at redhat.com> wrote:
> >> On 06/02/2016 08:01 PM, Andrew Beekhof wrote:
> >> > On Fri, May 20, 2016 at 1:53 AM, Ken Gaillot <kgaillot at redhat.com> wrote:
> >> >> A recent thread discussed a proposed new feature, a new environment
> >> >> variable that would be passed to resource agents, indicating whether a
> >> >> stop action was part of a recovery.
> >> >>
> >> >> Since that thread was long and covered a lot of topics, I'm starting a
> >> >> new one to focus on the core issue remaining:
> >> >>
> >> >> The original idea was to pass the number of restarts remaining before
> >> >> the resource will no longer be started on the same node. This
> >> >> involves calculating (migration-threshold - fail-count), and that
> >> >> implies certain limitations: (1) it will only be set when the cluster
> >> >> checks migration-threshold; (2) it will only be set for the failed
> >> >> resource itself, not for other resources that may be recovered due to
> >> >> dependencies on it.
> >> >>
> >> >> Ulrich Windl proposed an alternative: setting a boolean value instead. I
> >> >> forgot to cc the list on my reply, so I'll summarize now: We would set a
> >> >> new variable like OCF_RESKEY_CRM_recovery=true
> >> >
> >> > This concept worries me, especially when what we've implemented is
> >> > called OCF_RESKEY_CRM_restarting.
> >>
> >> Agreed; I plan to rename it yet again, to OCF_RESKEY_CRM_start_expected.
> >
> > [snipped]
> >
> >> My main question is how useful it would actually be in the proposed use
> >> cases. Considering the possibility that the expected start might never
> >> happen (or fail), can an RA really do anything different if
> >> start_expected=true?
> >
> > That's the wrong question :-)
> >
> >> If the use case is there, I have no problem with
> >> adding it, but I want to make sure it's worthwhile.
> >
> > The use case which started this whole thread is for
> > start_expected=false, not start_expected=true.
> 
> Isn't this just two sides of the same coin?
> If you're not doing the same thing for both cases, then you're just
> reversing the order of the clauses.

No, because the stated concern about unreliable expectations
("Considering the possibility that the expected start might never
happen (or fail)") was regarding start_expected=true, and that's the
side of the coin we don't care about, so it doesn't matter if it's
unreliable.
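
To make the asymmetry concrete, here is a minimal sketch of how an RA's
stop action might consume the variable. It assumes the proposed name
OCF_RESKEY_CRM_start_expected from this thread and assumes the value
arrives as a string such as "true" or "false"; the helper names and the
"extra cleanup" step are hypothetical placeholders, not taken from any
existing agent.

```python
import os


def start_expected(env=None):
    """Return False only when the cluster explicitly reports that no start will follow.

    Assumption: the variable is named OCF_RESKEY_CRM_start_expected (the name
    proposed in this thread) and carries a boolean-like string. A missing or
    unrecognised value is treated as "start expected", so the RA falls back to
    its ordinary stop behaviour.
    """
    env = os.environ if env is None else env
    value = env.get("OCF_RESKEY_CRM_start_expected", "")
    return value.strip().lower() not in ("false", "0", "no", "off")


def stop_actions(env=None):
    """List the steps a stop action would take (hypothetical step names)."""
    actions = ["stop service"]
    if not start_expected(env):
        # Only the explicit "false" case triggers anything extra. If the
        # hint is "true" or absent, nothing depends on a start really
        # happening, so an unreliable "true" does no harm.
        actions.append("extra cleanup")
    return actions
```

Only the explicit false branch changes behaviour; the RA never relies on a
promised start actually taking place.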



