[ClusterLabs] FR: send failcount to OCF RA start/stop actions

Wed May 4 10:12:15 EDT 2016

On 05/04/2016 08:49 AM, Klaus Wenninger wrote:
> On 05/04/2016 02:09 PM, Adam Spiers wrote:
>> Hi all,
>>
>> As discussed with Ken and Andrew at the OpenStack summit last week, we
>> would like Pacemaker to be extended to export the current failcount as
>> an environment variable to OCF RA scripts when they are invoked with
>> 'start' or 'stop' actions.  This would mean that if you have
>> start-failure-is-fatal=false and migration-threshold=3 (say), then you
>> would be able to implement a different behaviour for the third and
>> final 'stop' of a service executed on a node, which is different to
>> the previous 'stop' actions executed just prior to attempting a
>> restart of the service.  (In the non-clone case, this would happen
>> just before migrating the service to another node.)
> So what you actually want to know is how much headroom
> there still is till the resource would be migrated.
> So wouldn't it then be much more catchy if we don't pass
> the failcount but rather the headroom?

Yes, that's the plan: pass a new environment variable with
(migration-threshold - fail-count) when recovering a resource. I haven't
worked out the exact behavior yet, but that's the idea. I do hope to get
this in 1.1.15 since it's a small change.

The advantage over using crm_failcount is that it will be limited to the
current recovery attempt, and it will calculate the headroom as you say,
rather than the raw failcount.

>> One use case for this is to invoke "nova service-disable" if Pacemaker
>> fails to restart the nova-compute service on an OpenStack compute
>> node.
>>
>> Is it feasible to squeeze this in before the 1.1.15 release?
>>
>> Thanks a lot!
>> Adam