[ClusterLabs] Antw: Re: Continuous master monitor failure of a resource in case some other resource is being promoted

Wed Feb 27 09:48:42 EST 2019

On Wed, 2019-02-27 at 07:58 +0100, Ulrich Windl wrote:
> > > > Ken Gaillot <kgaillot at redhat.com> wrote on 26.02.2019 at
> > > > 16:27 in message
> 
> <a55b93efe2a218a7a41e090f1a292e9e066ae493.camel at redhat.com>:
> [...]
> > 
> > Actions that have been *scheduled* but not *initiated* can be
> > aborted.
> > But anytime a resource agent has been invoked, we wait for that
> > process
> > to complete.
> 
> I guess it's to receive the regular exit code.
> 
> > 
> 
> [...]
> > 
> > With the current design, the only time pacemaker kills an already
> > running process is if its timeout is reached. Scheduled actions can
> > be
> > cancelled, but not in-flight actions. That makes sense because
> > killing
> > a resource agent in the middle of a start/stop/promote/etc. could
> > leave
> > things in a problematic state that would require recovery.
> 
> Reading that I was wondering about two things:
> 1) Do RAs have to be reentrant? I.e., is it allowed to call a
> "monitor" while a "start" is still processing? AFAIK the docs don't
> say anything about it, and most users assume the calling sequence is
> strictly sequential.

That's never been addressed in the standard or, as far as I know, even
in serious discussion.

Pacemaker does try to ensure that only one operation is in flight at
any one time for any given resource. There is one exception, and it
requires explicit configuration: a stonith resource can run multiple
fencing actions in parallel if the concurrent-fencing cluster property
is enabled and pcmk_action_limit is raised for that resource.

However, that does not prevent multiple resources using the same agent
from having simultaneous actions in flight. So in practice, I think
that's the guideline: an agent should be re-entrant for distinct sets
of parameters, but not necessarily for any one set of parameters.
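
One way an agent can follow that guideline is to keep all of its state
under a name derived from the resource instance, so that two resources
using the same agent never touch each other's files. Below is a minimal
Python sketch of that idea. The agent name "demo-agent", the state
directory argument, and the function names are hypothetical, and
instance names are assumed to be safe to use in a filename. In a real
OCF agent the instance name would come from the OCF_RESOURCE_INSTANCE
environment variable.

```python
import os

OCF_SUCCESS = 0      # standard OCF exit code: action succeeded
OCF_NOT_RUNNING = 7  # standard OCF exit code: resource is cleanly stopped


def state_file(instance: str, state_dir: str) -> str:
    """Path of the state file owned by exactly one resource instance."""
    # Hypothetical naming scheme; assumes the instance name is filename-safe.
    return os.path.join(state_dir, f"demo-agent-{instance}.state")


def start(instance: str, state_dir: str) -> int:
    with open(state_file(instance, state_dir), "w") as f:
        f.write("started\n")
    return OCF_SUCCESS


def monitor(instance: str, state_dir: str) -> int:
    if os.path.exists(state_file(instance, state_dir)):
        return OCF_SUCCESS
    return OCF_NOT_RUNNING


def stop(instance: str, state_dir: str) -> int:
    try:
        os.remove(state_file(instance, state_dir))
    except FileNotFoundError:
        pass  # already stopped; stop should be idempotent
    return OCF_SUCCESS
```

Because nothing is shared between instances, a start for one resource
and a monitor for another can overlap safely. Overlapping actions on
the same instance are not protected here, which matches the guideline
above.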

> 2) Given 1), one could add an "asynchronous" "cancel" operation that
> tries to stop any current action with a state "as clean as possible".
> Of course a kill signal handler could try similar, but I guess very
> few RAs do that.

I think signals would have to be used for that. But the complexity
is probably not worth it.
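
To give an idea of what that would involve: the handler itself could
only record the request, and every long-running action would need safe
points where it checks the flag and rolls back what it has done so far.
Here is a rough Python sketch, assuming a hypothetical cancel mechanism
that delivers a catchable signal such as SIGTERM (this is not an
existing Pacemaker feature). The class, the helper, and the step
structure are all made up for illustration.

```python
import signal

OCF_SUCCESS = 0      # standard OCF exit code: action succeeded
OCF_ERR_GENERIC = 1  # standard OCF exit code: generic failure


class CancellableAction:
    def __init__(self):
        self.cancelled = False

    def request_cancel(self, signum, frame):
        # Signal handler: only set a flag; cleanup happens at a safe point.
        self.cancelled = True

    def run(self, steps):
        """Run (do, undo) pairs in order; undo completed steps if cancelled."""
        done = []
        for do, undo in steps:
            if self.cancelled:
                # Roll back in reverse order to leave a state as clean as possible.
                for u in reversed(done):
                    u()
                return OCF_ERR_GENERIC
            do()
            done.append(undo)
        return OCF_SUCCESS


def install(action, signum=signal.SIGTERM):
    """Route the given signal to the action's cancel flag (main thread only)."""
    signal.signal(signum, action.request_cancel)
```

Even in this toy form, every step needs a matching undo and the agent
can only react between steps, which shows where the complexity comes
from.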

> 
> An ocf-tester that does reentrant testing while producing readable
> logs is another challenge ;-)
> 
> > 
> > > I understand that operations *on the same resource* need
> > > serialization,
> > > but between completely independent resources?
> > 
> > Not within a single transition, but a new transition can't be done
> > (with the current model) until in-flight actions have completed.
> > 
> > Thinking about it some more, it would be easier to get around the
> > problem if we made record-pending permanently true (which is the
> > default in 2.0 but not 1.1). The scheduler could then be sure it
> > knew
> > about all in-flight actions, and calculate a new transition where
> > actions that depend on that one are properly ordered. We'd have to
> > add
> > the concept of waiting for an action that isn't scheduled in the
> > current transition.
> > 
> > This jogged my memory that we already have a BZ for this aspect of
> > the
> > issue:
> > 
> > https://bugs.clusterlabs.org/show_bug.cgi?id=5208 
-- 
Ken Gaillot <kgaillot at redhat.com>