[ClusterLabs] Antw: Re: FR: send failcount to OCF RA start/stop actions

Ulrich Windl Ulrich.Windl at rz.uni-regensburg.de
Tue May 10 03:29:04 EDT 2016


>>> Ken Gaillot <kgaillot at redhat.com> wrote on 10.05.2016 at 00:40 in message
<573111D3.7060102 at redhat.com>:
> On 05/04/2016 11:47 AM, Adam Spiers wrote:
>> Ken Gaillot <kgaillot at redhat.com> wrote:
>>> On 05/04/2016 08:49 AM, Klaus Wenninger wrote:
>>>> On 05/04/2016 02:09 PM, Adam Spiers wrote:
>>>>> Hi all,
>>>>>
>>>>> As discussed with Ken and Andrew at the OpenStack summit last week, we
>>>>> would like Pacemaker to be extended to export the current failcount as
>>>>> an environment variable to OCF RA scripts when they are invoked with
>>>>> 'start' or 'stop' actions.  This would mean that if you have
>>>>> start-failure-is-fatal=false and migration-threshold=3 (say), you
>>>>> could implement different behaviour for the third and final 'stop'
>>>>> of a service on a node than for the previous 'stop' actions executed
>>>>> just prior to attempting a restart of the service.  (In the non-clone
>>>>> case, this would happen just before migrating the service to another
>>>>> node.)
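
For illustration, the scenario Adam describes might be configured roughly like
this with the crm shell (the resource name is made up and the syntax is a
sketch, not a tested configuration):

    crm configure property start-failure-is-fatal=false
    crm configure primitive my-service ocf:heartbeat:Dummy \
        meta migration-threshold=3 \
        op monitor interval=10s

With such a setup the cluster restarts my-service in place after the first two
failures and only moves it off the node after the third.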
>>>> So what you actually want to know is how much headroom
>>>> is still left before the resource would be migrated.
>>>> Wouldn't it then be much more intuitive if we passed
>>>> not the failcount but rather the headroom?
>>>
>>> Yes, that's the plan: pass a new environment variable with
>>> (migration-threshold - fail-count) when recovering a resource. I haven't
>>> worked out the exact behavior yet, but that's the idea. I do hope to get
>>> this in 1.1.15 since it's a small change.
>>>
>>> The advantage over using crm_failcount is that it will be limited to the
>>> current recovery attempt, and it will calculate the headroom as you say,
>>> rather than the raw failcount.
>> 
>> Headroom sounds more usable, but if it's not significant extra work,
>> why not pass both?  It could come in handy, even if only for more
>> informative logging from the RA.
>> 
>> Thanks a lot!
> 
> Here is what I'm testing currently:
> 
> - When the cluster recovers a resource, the resource agent's stop action
> will get a new variable, OCF_RESKEY_CRM_meta_recovery_left =
> migration-threshold - fail-count on the local node.

With that mechanism, RA testing will be more complicated than it is now, and I cannot see the benefit yet.
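
To make the proposal concrete, a stop handler consuming the proposed variable
might look roughly like this (a sketch only: OCF_RESKEY_CRM_meta_recovery_left
is the name Ken suggests above and does not exist in any released Pacemaker,
and the function and log messages are invented):

    my_service_stop() {
        # Proposed variable: present only when this stop is part of a failure
        # recovery; 0 or unset means the last stop on this node (or no recovery).
        local left="${OCF_RESKEY_CRM_meta_recovery_left:-0}"

        if [ "$left" -gt 0 ]; then
            ocf_log info "recovery stop: $left local restart attempt(s) remaining"
            # plain stop; the cluster will try to start the service here again
        else
            ocf_log info "final stop on this node (or not a failure recovery)"
            # e.g. extra cleanup before the service is started elsewhere
        fi

        # ... the agent's normal stop logic ...
        return $OCF_SUCCESS
    }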

> 
> - The variable is not added for any action other than stop.
> 
> - I'm preferring simplicity over flexibility by providing only a single
> variable. The RA theoretically can already get the migration-threshold
> from the CIB and fail-count from attrd -- what we're adding is the
> knowledge that the stop is part of a recovery.
> 
> - If the stop is final (the cluster does not plan to start the resource
> anywhere), the variable may be set to 0, or unset. The RA should treat 0
> and unset as equivalent.
> 
> - So, the variable will be 1 for the stop before the last time the
> cluster will try to start the resource on the same node, and 0 or unset
> for the last stop on this node before trying to start on another node.

Be aware that the node could be fenced (for reasons outside of your RA) even before all these attempts are carried out.
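
For comparison, the existing route Ken mentions above (the RA querying the
fail-count and migration-threshold itself) would look roughly like this; the
resource name is a placeholder and the exact flag spellings vary between
Pacemaker versions:

    # current fail-count of this resource on the local node
    crm_failcount -r my-service -G

    # the resource's migration-threshold meta attribute from the CIB
    crm_resource --resource my-service --meta --get-parameter migration-threshold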

> 
> - The variable will be set only in situations when the cluster will
> consider migration-threshold. This makes sense, but some situations may
> be unintuitive:
> 
> -- If a resource is being recovered, but the fail-count is being cleared
> in the same transition, the cluster will ignore migration-threshold (and
> the variable will not be set). The RA might see recovery_left=5, 4, 3,
> then someone clears the fail-count, and it won't see recovery_left even
> though there is a stop and start being attempted.
> 
> -- Migration-threshold will be considered (and the variable will be set)
> only if the resource is being recovered due to failure, not if the
> resource is being restarted or moved for some other reason (constraints,
> node standby, etc.).
> 
> -- The previous point is true even if the resource is restarting/moving
> because it is part of a group with another member being recovered due to
> failure. Only the failed resource will get the variable. I can see this
> might be problematic for interested RAs, because the resource may be
> restarted several times on the local node then forced away, without the
> variable ever being present -- but the resource will be forced away
> because it is part of a group that is moving, not because it is being
> recovered (its own fail-count stays 0).
> 
> Let me know if you see any problems or have any suggestions.

Can you summarize in one sentence what problem your proposal will solve?

Regards,
Ulrich




