[ClusterLabs] Informing RAs about recovery: failed resource recovery, or any start-stop cycle?

Ken Gaillot kgaillot at redhat.com
Mon Jun 6 19:07:24 EDT 2016


On 06/06/2016 05:45 PM, Adam Spiers wrote:
> Adam Spiers <aspiers at suse.com> wrote:
>> Andrew Beekhof <abeekhof at redhat.com> wrote:
>>> On Tue, Jun 7, 2016 at 8:29 AM, Adam Spiers <aspiers at suse.com> wrote:
>>>> Ken Gaillot <kgaillot at redhat.com> wrote:
>>>>> My main question is how useful would it actually be in the proposed use
>>>>> cases. Considering the possibility that the expected start might never
>>>>> happen (or fail), can an RA really do anything different if
>>>>> start_expected=true?
>>>>
>>>> That's the wrong question :-)
>>>>
>>>>> If the use case is there, I have no problem with
>>>>> adding it, but I want to make sure it's worthwhile.
>>>>
>>>> The use case which started this whole thread is for
>>>> start_expected=false, not start_expected=true.
>>>
>>> Isn't this just two sides of the same coin?
>>> If you're not doing the same thing for both cases, then you're just
>>> reversing the order of the clauses.
>>
>> No, because the stated concern about unreliable expectations
>> ("Considering the possibility that the expected start might never
>> happen (or fail)") was regarding start_expected=true, and that's the
>> side of the coin we don't care about, so it doesn't matter if it's
>> unreliable.
> 
> BTW, if the expected start happens but fails, then Pacemaker will just
> keep repeating until migration-threshold is hit, at which point it
> will call the RA 'stop' action finally with start_expected=false.
> So that's of no concern.

To clarify, that's configurable, via start-failure-is-fatal and on-fail.

> Maybe your point was that if the expected start never happens (so
> never even gets a chance to fail), we still want to do a nova
> service-disable?

That is a good question, which might mean it should be done on every
stop -- or could that cause problems (besides delays)?

Another aspect of this is that the proposed feature could only look at a
single transition. What if stop is called with start_expected=false, but
then Pacemaker is able to start the service on the same node in the next
transition immediately afterward? Would having called service-disable
cause problems for that start?
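
To make the concern concrete, here is a rough sketch of what an RA's stop
action might look like if the hint existed. Everything here is an
assumption: start_expected is only a proposal, the environment variable name
OCF_RESKEY_CRM_meta_start_expected is made up for illustration, and
stop_service/disable_service are placeholders for the real work (e.g. nova
service-disable).

```shell
#!/bin/sh
# Hypothetical sketch only: start_expected is a proposed hint, not an
# existing Pacemaker feature, and the variable name below is assumed.

OCF_SUCCESS=0
OCF_ERR_GENERIC=1

# Placeholders for the real actions an RA would perform.
stop_service()    { echo "stopped"; }
disable_service() { echo "service-disable"; }

ra_stop() {
    # The service must always be fully stopped, whatever the hint says.
    stop_service || return $OCF_ERR_GENERIC

    # Treat a missing hint as "start expected", so the extra cleanup
    # only runs when the cluster explicitly says no start is planned.
    if [ "${OCF_RESKEY_CRM_meta_start_expected:-true}" = "false" ]; then
        disable_service || return $OCF_ERR_GENERIC
    fi

    return $OCF_SUCCESS
}
```

Note that the hint only reflects the transition in which stop was called. If
the next transition starts the service on the same node, the start action
would have to undo whatever disable_service did.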

> Yes that would be nice, but this proposal was never intended to
> address that.  I guess we'd need an entirely different mechanism in
> Pacemaker for that.  But let's not allow perfection to become the
> enemy of the good ;-)

The ultimate concern is that this will encourage people to write RAs
that leave services in a dangerous state after stop is called.

With proper naming and documentation, I'm fine with providing the
option, though I'm still on the fence. Beekhof needs a little more
convincing :-)



