[ClusterLabs] FR: send failcount to OCF RA start/stop actions
Ken Gaillot
kgaillot at redhat.com
Mon May 9 22:40:19 UTC 2016
On 05/04/2016 11:47 AM, Adam Spiers wrote:
> Ken Gaillot <kgaillot at redhat.com> wrote:
>> On 05/04/2016 08:49 AM, Klaus Wenninger wrote:
>>> On 05/04/2016 02:09 PM, Adam Spiers wrote:
>>>> Hi all,
>>>>
>>>> As discussed with Ken and Andrew at the OpenStack summit last week, we
>>>> would like Pacemaker to be extended to export the current failcount as
>>>> an environment variable to OCF RA scripts when they are invoked with
>>>> 'start' or 'stop' actions. This would mean that if you have
>>>> start-failure-is-fatal=false and migration-threshold=3 (say), then you
>>>> would be able to implement a different behaviour for the third and
>>>> final 'stop' of a service executed on a node, which is different to
>>>> the previous 'stop' actions executed just prior to attempting a
>>>> restart of the service. (In the non-clone case, this would happen
>>>> just before migrating the service to another node.)
>>> So what you actually want to know is how much headroom
>>> there still is till the resource would be migrated.
>>> So wouldn't it then be much more catchy if we don't pass
>>> the failcount but rather the headroom?
>>
>> Yes, that's the plan: pass a new environment variable with
>> (migration-threshold - fail-count) when recovering a resource. I haven't
>> worked out the exact behavior yet, but that's the idea. I do hope to get
>> this in 1.1.15 since it's a small change.
>>
>> The advantage over using crm_failcount is that it will be limited to the
>> current recovery attempt, and it will calculate the headroom as you say,
>> rather than the raw failcount.
>
> Headroom sounds more usable, but if it's not significant extra work,
> why not pass both? It could come in handy, even if only for more
> informative logging from the RA.
>
> Thanks a lot!
Here is what I'm testing currently:
- When the cluster recovers a resource, the resource agent's stop action
will get a new variable, OCF_RESKEY_CRM_meta_recovery_left =
migration-threshold - fail-count on the local node.
- The variable is not added for any action other than stop.
- I'm favoring simplicity over flexibility by providing only a single
variable. In theory, the RA can already get the migration-threshold
from the CIB and the fail-count from attrd -- what we're adding is the
knowledge that the stop is part of a recovery.
- If the stop is final (the cluster does not plan to start the resource
anywhere), the variable may be set to 0, or unset. The RA should treat 0
and unset as equivalent.
- So, the variable will be 1 for the stop before the last time the
cluster will try to start the resource on the same node, and 0 or unset
for the last stop on this node before trying to start on another node.
- The variable will be set only in situations when the cluster will
consider migration-threshold. This makes sense, but some situations may
be unintuitive:
-- If a resource is being recovered, but the fail-count is being cleared
in the same transition, the cluster will ignore migration-threshold (and
the variable will not be set). For example, the RA might see
recovery_left=5, 4, 3 on successive stops; if someone then clears the
fail-count, the next stop won't have recovery_left at all, even though
a stop and start are being attempted.
-- Migration-threshold will be considered (and the variable will be set)
only if the resource is being recovered due to failure, not if the
resource is being restarted or moved for some other reason (constraints,
node standby, etc.).
-- The previous point is true even if the resource is restarting/moving
because it is part of a group with another member being recovered due to
failure. Only the failed resource will get the variable. I can see this
might be problematic for interested RAs, because the resource may be
restarted several times on the local node then forced away, without the
variable ever being present -- but the resource will be forced away
because it is part of a group that is moving, not because it is being
recovered (its own fail-count stays 0).
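
For RA authors, reading the variable as described above could look
roughly like this minimal Python sketch. Only the variable name comes
from the proposal; the helper names are hypothetical, and clamping
negative or unparsable values to 0 is an assumption rather than
something the cluster requires.

```python
import os

# Variable name as proposed above; everything else here is illustrative.
RECOVERY_VAR = "OCF_RESKEY_CRM_meta_recovery_left"


def recovery_left(environ=None):
    """Return the remaining local recovery attempts seen by a stop action.

    Unset and 0 are treated as equivalent, as the proposal asks.
    Assumption: unparsable or negative values are also treated as 0.
    """
    env = os.environ if environ is None else environ
    raw = env.get(RECOVERY_VAR, "")
    try:
        value = int(raw)
    except ValueError:
        return 0
    return max(value, 0)


def restart_expected_here(environ=None):
    """True if the cluster signalled at least one more local start attempt."""
    return recovery_left(environ) > 0
```

A False result only means that no further local restart was signalled.
As noted above, the variable is also absent when the stop is not part
of a failure recovery at all (constraints, node standby, a group member
failing), so the RA cannot tell those cases apart from a final stop.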
Let me know if you see any problems or have any suggestions.