[ClusterLabs] FR: send failcount to OCF RA start/stop actions

Wed May 4 12:47:03 EDT 2016

Ken Gaillot <kgaillot at redhat.com> wrote:
> On 05/04/2016 08:49 AM, Klaus Wenninger wrote:
> > On 05/04/2016 02:09 PM, Adam Spiers wrote:
> >> Hi all,
> >>
> >> As discussed with Ken and Andrew at the OpenStack summit last week, we
> >> would like Pacemaker to be extended to export the current failcount as
> >> an environment variable to OCF RA scripts when they are invoked with
> >> 'start' or 'stop' actions.  This would mean that if you have
> >> start-failure-is-fatal=false and migration-threshold=3 (say), then you
> >> would be able to make the third and final 'stop' of a service on a
> >> node behave differently from the earlier 'stop' actions, which are
> >> executed just before attempting to restart the service.  (In the
> >> non-clone case, this final 'stop' would happen
> >> just before migrating the service to another node.)
> > So what you actually want to know is how much headroom
> > there still is before the resource would be migrated.
> > Wouldn't it then be more useful to pass the headroom
> > rather than the failcount?
> 
> Yes, that's the plan: pass a new environment variable with
> (migration-threshold - fail-count) when recovering a resource. I haven't
> worked out the exact behavior yet, but that's the idea. I do hope to get
> this in 1.1.15 since it's a small change.
> 
> The advantage over using crm_failcount is that it will be limited to the
> current recovery attempt, and it will calculate the headroom as you say,
> rather than the raw failcount.

Headroom sounds more usable, but if it's not significant extra work,
why not pass both?  It could come in handy, even if only for more
informative logging from the RA.
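
To illustrate what I mean, here is a rough sketch of how an RA might use
both values if they were available.  The variable names EXAMPLE_FAILCOUNT
and EXAMPLE_HEADROOM are made up for this example, since nothing has been
decided about what Pacemaker would actually export.  It also assumes the
failcount already includes the failure that triggered the recovery, so
that a headroom of 0 means "this is the last stop on this node".

```python
import os

# Hypothetical variable names: the thread does not settle what Pacemaker
# would actually export, so these are placeholders for illustration only.
FAILCOUNT_VAR = "EXAMPLE_FAILCOUNT"
HEADROOM_VAR = "EXAMPLE_HEADROOM"


def headroom(migration_threshold, fail_count):
    """Failures left before the resource is moved off this node."""
    return migration_threshold - fail_count


def describe_stop(env=None):
    """Build a log line for a 'stop' action from the two values.

    Returns None when either variable is missing or not an integer, so an
    RA running under a Pacemaker that does not export them is unaffected.
    """
    env = os.environ if env is None else env
    try:
        fail_count = int(env[FAILCOUNT_VAR])
        left = int(env[HEADROOM_VAR])
    except (KeyError, ValueError):
        return None
    if left <= 0:
        # Assumption: no headroom left means the resource will not be
        # restarted here, so this is where any special handling would go.
        return f"final stop on this node after {fail_count} failure(s)"
    return f"stop before restart: {fail_count} failure(s), {left} left"
```

With migration-threshold=3, headroom(3, 1) gives 2, and the RA would only
take the "final stop" branch once the headroom reaches 0.  The raw
failcount is used purely to make the log message more informative.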

Thanks a lot!