[ClusterLabs Developers] OCF actions support
Jehan-Guillaume de Rorthais
jgdr at dalibo.com
Wed Feb 12 23:09:00 UTC 2020
On Wed, 12 Feb 2020 15:33:08 -0600
Ken Gaillot <kgaillot at redhat.com> wrote:
> On Wed, 2020-02-12 at 16:17 +0100, Jehan-Guillaume de Rorthais wrote:
> > Last mail for today :)
> >
> > At least three actions exist in the OCF specs and are not fully
> > supported in Pacemaker.
> >
> > 1. recover
> >
> > Today, Pacemaker replaces this action with a stop/start or
> > demote/stop/start/promote transition. I bet some RAs (at least PAF)
> > would be able to deal with a recovery themselves faster in one action
> > than with 2 or 4 actions (without counting notify).
>
> In principle it should be possible to support "recover" when the stop
> and start are scheduled on the same node. It would be similar to how
> pacemaker currently changes stop+start to live migration only when
> certain conditions are met.
Sounds good!
In fact, that's how we detect a recovery during notify actions in PAF: we check
whether the resource is stopped and started in the same transition.
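The check described above can be sketched roughly as follows. This is a minimal
Python illustration, not PAF's actual code (PAF is written in Perl). It assumes
the notify action sees the space-separated node lists in the
OCF_RESKEY_CRM_meta_notify_stop_uname and OCF_RESKEY_CRM_meta_notify_start_uname
environment variables; the function name and the node names are made up.

```python
import os


def nodes_being_recovered(env=None):
    """Return the sorted node names present in both the stop and start lists.

    A node that is scheduled for both a stop and a start in the same
    transition is treated here as a node where the resource is recovered.
    """
    env = os.environ if env is None else env
    stopping = set(env.get("OCF_RESKEY_CRM_meta_notify_stop_uname", "").split())
    starting = set(env.get("OCF_RESKEY_CRM_meta_notify_start_uname", "").split())
    return sorted(stopping & starting)


if __name__ == "__main__":
    # Hypothetical notify environment: srv2 is stopped and started again.
    example = {
        "OCF_RESKEY_CRM_meta_notify_stop_uname": "srv1 srv2",
        "OCF_RESKEY_CRM_meta_notify_start_uname": "srv2 srv3",
    }
    print(nodes_being_recovered(example))
```

Passing the environment as a plain dict keeps the sketch testable outside a
cluster; a real agent would read os.environ directly.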
> One question would be how to handle "recover" failures. My first
> instinct is that if recover fails, the cluster should switch to
> stop+start, similar to a failed live migration. An alternative would be
> to retry the recover action up to the migration-threshold then switch
> to stop+start.
If live migration already behaves like that, then the first instinct seems more
coherent. But I have no strong opinion.
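To compare the two options side by side, here is a small Python sketch of the
decision only. Neither behavior exists in Pacemaker today; the function name,
its parameters, and the way the threshold is compared are assumptions made for
illustration.

```python
def plan_after_recover_failure(failures, migration_threshold, retry=False):
    """Pick the next actions after `failures` failed recover attempts.

    retry=False models the first option: fall back to stop+start at once,
    as is done after a failed live migration.
    retry=True models the alternative: keep retrying recover until the
    failure count reaches migration_threshold, then fall back.
    """
    if retry and failures < migration_threshold:
        return ["recover"]
    return ["stop", "start"]


if __name__ == "__main__":
    for count in (1, 2, 3):
        print(count, plan_after_recover_failure(count, 3, retry=True))
```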
> > 2. migrate_to and migrate_from
> >
> > These two actions are only available for non-clone resources today.
> >
> > I would really appreciate having them for multi-state resources. Think
> > of a switchover of roles between a primary and its secondaries.
>
> I don't follow how using that to switch roles would be different from
> demote/promote with notifications.
When switching over roles between a primary and a secondary, there might be
some additional steps the resource needs to handle.
Today, PAF handles this during notify actions. But:
1. we need to detect the switchover by ourselves
2. as you know, the notify action's return code is ignored. Should the switchover
fail, we have to set a flag so that the next action fails.
This makes for a lot of code that isn't really welcome in an RA :)
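The flag workaround in point 2 looks roughly like the Python sketch below. It is
a simplified illustration under assumptions, not PAF's implementation: the
marker file, its name, the state directory, and clearing the flag once it has
been reported are all hypothetical choices. Only the OCF return codes (0 for
success, 1 for a generic error) come from the OCF spec.

```python
import tempfile
from pathlib import Path

OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1


def flag_path(state_dir, resource):
    """Location of the (hypothetical) marker file for a failed switchover."""
    return Path(state_dir) / f"{resource}.switchover_failed"


def notify(state_dir, resource, switchover_ok):
    """Notify handler: its return code is ignored, so persist the failure."""
    if not switchover_ok:
        flag_path(state_dir, resource).touch()
    return OCF_SUCCESS


def next_action(state_dir, resource):
    """Any later action: report an error if notify left a failure flag."""
    flag = flag_path(state_dir, resource)
    if flag.exists():
        flag.unlink()  # assumption: report the failure only once
        return OCF_ERR_GENERIC
    return OCF_SUCCESS


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as state_dir:
        notify(state_dir, "db", switchover_ok=False)
        print(next_action(state_dir, "db"))  # fails because of the flag
        print(next_action(state_dir, "db"))  # flag cleared, succeeds
```

With real migrate_to/migrate_from support for multi-state resources, the action
itself could return the error and none of this bookkeeping would be needed.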
> > How hard would it be to add these actions? Is it something that could
>
> Most changes in pacemaker are big projects; these certainly would be.
> Anything that touches the scheduler tends to involve a lot of work.
> There are 35K lines of scheduler-related code and it's difficult to
> predict how a change in one part affects another.
OK.
Thankfully, there are regression tests to at least cover known behaviors.