[ClusterLabs] crm resource stop VirtualDomain - but VirtualDomain shutdown start some minutes later

Wed Feb 16 15:44:48 EST 2022

On Wed, 2022-02-16 at 21:47 +0300, Andrei Borzenkov wrote:
> On 16.02.2022 20:48, Andrei Borzenkov wrote:
> > I guess the real question here is why "Transition aborted" is logged
> > although transition apparently continues. Transition 128 started at
> > 20:54:30 and completed at 21:04:26, but there were multiple
> > "Transition 128 aborted" messages in between (unfortunately one needs
> > now to hunt for another mail to put them together).
> > 
> > It looks like "Transition aborted" is more "we try to abort this
> > transition if possible". My guess is that pacemaker must wait for
> > currently running action(s) which can take quite some time when
> > stopping virtual domain. Transition 128 was initiated when stopping
> > vm_pathway, but we have no idea when it was stopped.
> > 
> 
> Yes, when the code logs "Transition aborted", nothing is really aborted.
> It just tells pacemaker not to start any further actions that are part
> of this transition. But for all I can tell it does not affect currently
> running actions.

Exactly, any actions already initiated must finish before the next
transition can be calculated, because their results can affect what
needs to be done.

We don't kill actions in flight because it's perfectly reasonable for
actions to be split across multiple transitions. Often when some event
is happening, lots of micro-conditions (action results, node attribute
changes, etc.) change in a short time, and you'll see a new transition
after each one.
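
To make the distinction concrete, here is a toy model of the behaviour
described above. It is only a sketch, not Pacemaker code: the classes,
the tick-based clock, and the action names (stop_vm, stop_storage) are
all hypothetical. It assumes two ordered actions and shows that an
"abort" only prevents new actions from being initiated, while the
action already in flight runs to completion.

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    name: str
    duration: int            # ticks needed once the action has started
    state: str = "pending"   # pending -> running -> done, or pending -> skipped
    remaining: int = 0


@dataclass
class Transition:
    actions: list
    aborted: bool = False
    log: list = field(default_factory=list)

    def abort(self, tick, reason):
        # Aborting only sets a flag; nothing in flight is killed.
        self.aborted = True
        self.log.append(f"t={tick} transition aborted ({reason})")

    def step(self, tick):
        # Advance any running action by one tick.
        for a in [a for a in self.actions if a.state == "running"]:
            a.remaining -= 1
            if a.remaining == 0:
                a.state = "done"
                self.log.append(f"t={tick} {a.name} finished")
        # Actions are ordered, so wait while one is still running.
        if any(a.state == "running" for a in self.actions):
            return
        for a in self.actions:
            if a.state != "pending":
                continue
            if self.aborted:
                # Not initiated in this transition (toy assumption:
                # a later transition would reconsider it).
                a.state = "skipped"
                self.log.append(f"t={tick} {a.name} not started")
            else:
                a.state = "running"
                a.remaining = a.duration
                self.log.append(f"t={tick} {a.name} started")
                break

    def complete(self):
        return all(a.state in ("done", "skipped") for a in self.actions)


def simulate(abort_at):
    # Hypothetical scenario: a slow VM stop followed by a quick storage stop.
    t = Transition([Action("stop_vm", 5), Action("stop_storage", 1)])
    tick = 0
    while not t.complete():
        if tick == abort_at:
            t.abort(tick, "attribute change")
        t.step(tick)
        tick += 1
    return t, tick


if __name__ == "__main__":
    transition, _ = simulate(abort_at=2)
    print("\n".join(transition.log))
```

In this model the abort is logged at t=2, but the transition is not
complete until stop_vm finishes at t=5, which matches the pattern of
"Transition 128 aborted" messages appearing well before the transition
actually ends.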
-- 
Ken Gaillot <kgaillot at redhat.com>