[ClusterLabs] Informing RAs about recovery: failed resource recovery, or any start-stop cycle?
aspiers at suse.com
Mon Jun 6 18:29:41 EDT 2016
Ken Gaillot <kgaillot at redhat.com> wrote:
> On 06/02/2016 08:01 PM, Andrew Beekhof wrote:
> > On Fri, May 20, 2016 at 1:53 AM, Ken Gaillot <kgaillot at redhat.com> wrote:
> >> A recent thread discussed a proposed new feature, a new environment
> >> variable that would be passed to resource agents, indicating whether a
> >> stop action was part of a recovery.
> >> Since that thread was long and covered a lot of topics, I'm starting a
> >> new one to focus on the core issue remaining:
> >> The original idea was to pass the number of restarts remaining before
> >> the resource will no longer be started on the same node. This
> >> involves calculating (migration-threshold - fail-count), and that
> >> implies certain limitations: (1) it will only be set when the cluster
> >> checks migration-threshold; (2) it will only be set for the failed
> >> resource itself, not for other resources that may be recovered due to
> >> dependencies on it.
> >> Ulrich Windl proposed an alternative: setting a boolean value instead. I
> >> forgot to cc the list on my reply, so I'll summarize now: We would set a
> >> new variable like OCF_RESKEY_CRM_recovery=true
> > This concept worries me, especially when what we've implemented is
> > called OCF_RESKEY_CRM_restarting.
> Agreed; I plan to rename it yet again, to OCF_RESKEY_CRM_start_expected.
> My main question is how useful it would actually be in the proposed use
> cases. Considering the possibility that the expected start might never
> happen (or fail), can an RA really do anything different if
That's the wrong question :-)
> If the use case is there, I have no problem with
> adding it, but I want to make sure it's worthwhile.
The use case which started this whole thread is for
start_expected=false, not start_expected=true. When it's false for
NovaCompute, we call nova service-disable to ensure that nova doesn't
attempt to schedule any more VMs on that host.
If start_expected=true, we don't *want* to do anything different. So
it doesn't matter even if the expected start never happens.
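
To make that concrete, here is a rough sketch in Python of how a stop action could branch on the variable. It is only an illustration under several assumptions: that the variable ends up being called OCF_RESKEY_CRM_start_expected as proposed above, that it carries a Pacemaker-style boolean string, and that an unset or unrecognised value is treated as "start expected" so that nothing changes by default. The helper names (start_expected, stop_actions, stop) and the hostname/binary arguments passed to nova service-disable are hypothetical and are not taken from the real NovaCompute agent.

```python
import os
import socket
import subprocess


def start_expected(env=None):
    """Return False only when the cluster explicitly says no start will follow."""
    env = os.environ if env is None else env
    # Assumption: unset or unrecognised values mean "start expected",
    # so the agent behaves exactly as it would without the variable.
    value = env.get("OCF_RESKEY_CRM_start_expected", "true")
    return value.strip().lower() not in ("false", "0", "no", "off")


def stop_actions(env=None, host=None):
    """Return the extra commands a stop should run (hypothetical helper)."""
    host = host or socket.gethostname()
    if start_expected(env):
        # start_expected=true: deliberately do nothing different.
        return []
    # start_expected=false: keep nova from scheduling more VMs on this host.
    # Assumption: the CLI is invoked as "nova service-disable <host> <binary>".
    return [["nova", "service-disable", host, "nova-compute"]]


def stop(env=None, host=None, run=subprocess.check_call):
    """Run the extra commands, then fall through to the normal stop logic."""
    for cmd in stop_actions(env, host):
        run(cmd)
    # ... the agent's usual stop logic would go here ...
    return 0  # OCF_SUCCESS
```

Only the false branch does any extra work, which is why a start that was expected but never happens causes no harm in this sketch.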