[ClusterLabs Developers] OCF actions support

Wed Feb 12 18:09:00 EST 2020

On Wed, 12 Feb 2020 15:33:08 -0600
Ken Gaillot <kgaillot at redhat.com> wrote:

> On Wed, 2020-02-12 at 16:17 +0100, Jehan-Guillaume de Rorthais wrote:
> > Last mail for today :)
> > 
> > At least three actions exist in the OCF specs and are not fully supported in
> > Pacemaker.
> > 
> > 1. recover
> > 
> >   Today, Pacemaker replaces this action with a stop/start or
> >   demote/stop/start/promote transition. I bet some RAs (at least PAF) would
> >   be able to deal with a recovery themselves faster in one action than with
> >   2 or 4 actions (not counting notify).
> 
> In principle it should be possible to support "recover" when the stop
> and start are scheduled on the same node. It would be similar to how
> pacemaker currently changes stop+start to live migration only when
> certain conditions are met.

Sounds good!
In fact, that's how we detect a recovery during notify actions in PAF: the
resource is stopped and started in the same transition.
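
To illustrate the idea, here is a minimal sketch in Python. It is not PAF's
actual code. It assumes the space-separated node lists Pacemaker passes to
notify actions in OCF_RESKEY_CRM_meta_notify_stop_uname and
OCF_RESKEY_CRM_meta_notify_start_uname; the function names are made up for the
example.

```python
import os


def nodes_in(env, action):
    """Return the set of node names listed for a notify action (e.g. 'stop')."""
    raw = env.get(f"OCF_RESKEY_CRM_meta_notify_{action}_uname", "")
    return set(raw.split())


def is_recovery(env, node):
    """True if `node` appears in both the stop and start lists of the transition."""
    return node in nodes_in(env, "stop") and node in nodes_in(env, "start")


if __name__ == "__main__":
    # Inside a real notify action, the variables come from the process environment.
    # "srv1" is a placeholder node name.
    print(is_recovery(os.environ, "srv1"))
```

The environment is passed in as a plain mapping so the check can be exercised
outside a cluster.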

> One question would be how to handle "recover" failures. My first
> instinct is that if recover fails, the cluster should switch to
> stop+start, similar to a failed live migration. An alternative would be
> to retry the recover action up to the migration-threshold then switch
> to stop+start.

If live migration already behaves like that, then the first instinct seems more
consistent. But I have no strong opinion.
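
For the record, both options fit in one small policy function. This is only a
sketch of the idea in Python, not how the Pacemaker scheduler works internally;
`recover_with_fallback` and `max_recover_attempts` are hypothetical names. With
`max_recover_attempts=1` it models the first instinct; with a larger value it
models retrying up to a threshold before switching to stop+start.

```python
def recover_with_fallback(recover, stop, start, max_recover_attempts=1):
    """Try `recover` a bounded number of times, then fall back to stop+start.

    Each callable returns True on success. The list of attempted action
    names is returned so the resulting sequence can be inspected.
    """
    history = []
    for _ in range(max_recover_attempts):
        history.append("recover")
        if recover():
            return history  # recovered in place, nothing else to do
    # Recover failed: fall back to the classic transition.
    history.append("stop")
    if not stop():
        return history  # handling a failed stop is out of scope for this sketch
    history.append("start")
    start()
    return history
```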

> > 2. migrate_to and migrate_from
> > 
> >   These two actions are only available for non-clone resources today.
> > 
> >   I would really appreciate having them for multi-state resources. Think
> >   of switching over roles between a primary and its secondaries.
> 
> I don't follow how using that to switch roles would be different from
> demote/promote with notifications.

When switching over roles between a primary and a secondary, there might be
some additional steps the resource needs to handle.

Today, PAF handles this during notify actions. But:

1. we need to detect the switchover by ourselves
2. as you know, the return code of a notify action is ignored. Should the
switchover fail, we have to set a flag so the next action fails.

This adds a lot of code that is not really welcome in an RA :)
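
To give an idea of the workaround in point 2, here is a minimal sketch in
Python. It is not PAF's actual code: the flag is a plain file at a made-up
path, and `notify`, `promote` and their callables are hypothetical. Only the
return codes OCF_SUCCESS (0) and OCF_ERR_GENERIC (1) come from the OCF spec.

```python
import os
import tempfile

OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1

# Hypothetical flag location, for illustration only.
FLAG_PATH = os.path.join(tempfile.gettempdir(), "ra_switchover_failed.flag")


def notify(switchover_step, flag_path=FLAG_PATH):
    """Run the switchover step; since the return code is ignored, record failure."""
    try:
        ok = switchover_step()
    except Exception:
        ok = False
    if not ok:
        with open(flag_path, "w") as flag:
            flag.write("switchover failed\n")
    return OCF_SUCCESS  # the cluster ignores this value either way


def promote(do_promote, flag_path=FLAG_PATH):
    """Fail on purpose if the previous notify recorded a switchover failure."""
    if os.path.exists(flag_path):
        os.remove(flag_path)  # consume the flag so later actions are not affected
        return OCF_ERR_GENERIC
    do_promote()
    return OCF_SUCCESS
```

A dedicated action with a real return code would remove the need for the flag
and for the extra check in the following action.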

> > How hard would it be to add these actions? Is it something that could  
> 
> Most changes in pacemaker are big projects; these certainly would be.
> Anything that touches the scheduler tends to involve a lot of work.
> There are 35K lines of scheduler-related code and it's difficult to
> predict how a change in one part affects another.

OK.

Thankfully, there are regression tests to at least check known behaviors.