[ClusterLabs] crm resource stop VirtualDomain - but VirtualDomain shutdown start some minutes later

Ken Gaillot kgaillot at redhat.com
Fri Feb 18 10:47:33 EST 2022


On Fri, 2022-02-18 at 16:00 +0100, Lentes, Bernd wrote:
> 
> ----- On Feb 17, 2022, at 4:25 PM, kgaillot kgaillot at redhat.com
> wrote:
> > > So for me the big question is:
> > > When a transition is happening, and there is a change in the
> > > cluster, is the transition "aborted" (delayed or interrupted
> > > would be a better word) or not?
> > > Is this behaviour consistent? If not, what does it depend on?
> > > 
> > > Bernd
> > 
> > Yes, anytime the DC sees a change that could affect resources, it
> > will abort the current transition and calculate a new one. Aborting
> > means not initiating any new actions from the transition -- but any
> > actions currently in flight must complete before the new transition
> > can be calculated.
> > 
> > Changes that abort a transition include configuration changes, a
> > node joining or leaving, an unexpected action result being received,
> > a node attribute changing, the cluster-recheck-interval passing since
> > the last transition, or a timer popping for a time-based event
> > (failure timeout, rule, etc.). I may be forgetting some, but you get
> > the idea.
> > --
> 
> Hi Ken,
> 
> thanks for your explanation.
> Let me summarize to check that I understood everything correctly:
> I started the shutdown of several VirtualDomains with "crm resource
> stop vm_xxxxxxx".
> Not concurrently, but one by one, about 30 seconds apart.
> But one VirtualDomain was already shutting down at that point.
> The cluster said this transition was aborted, but in reality it
> couldn't be aborted. How would one abort a running shutdown?

The "transition" (i.e. a plan of actions to take) is aborted, not
running actions. The wording is awkward and I hope to find the time to
change it at some point (references are scattered throughout the code,
and we have to think about people who may have scripts that parse logs
or whatnot).
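
As a rough illustration of that distinction, here is a toy Python model of
a transition as a plan of actions. It is not Pacemaker code: the class
names, the state names, and the resource names vm_a and vm_b are all
hypothetical, and real scheduling is far more involved. It only shows that
aborting discards actions that were not yet initiated, while an action
already in flight keeps running and blocks the calculation of the next
transition until it completes.

```python
from dataclasses import dataclass


@dataclass
class Action:
    name: str
    state: str = "pending"  # pending -> running -> done, or discarded


@dataclass
class Transition:
    """Toy model: a transition is just an ordered plan of actions."""
    actions: list
    aborted: bool = False

    def start_next(self):
        """Initiate the next pending action, unless the plan was aborted."""
        if self.aborted:
            return None
        for action in self.actions:
            if action.state == "pending":
                action.state = "running"
                return action
        return None

    def abort(self):
        """Discard actions not yet initiated; running ones are untouched."""
        self.aborted = True
        for action in self.actions:
            if action.state == "pending":
                action.state = "discarded"

    def in_flight(self):
        return [a for a in self.actions if a.state == "running"]

    def can_recalculate(self):
        """A new plan is possible only once nothing is in flight."""
        return self.aborted and not self.in_flight()


# Hypothetical scenario: vm_a is already stopping when a change arrives.
plan = Transition([Action("stop vm_a"), Action("stop vm_b")])
plan.start_next()                      # "stop vm_a" is now in flight
plan.abort()                           # a cluster change aborts the plan
states_after_abort = {a.name: a.state for a in plan.actions}
ready_while_running = plan.can_recalculate()
plan.actions[0].state = "done"         # the slow stop finally completes
ready_after_completion = plan.can_recalculate()
```

In this model, "stop vm_a" stays in the running state after the abort, and
a new plan cannot be computed until it finishes, however long that takes.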

There's no way from within the cluster to abort a running action.
However kill -9 on the agent works :) (the cluster will consider the
action failed)
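
To see why a killed agent shows up as a failure, the following Python
sketch kills a child process with SIGKILL and inspects the exit status
seen by the parent. The sleeping child is only a stand-in for a
long-running agent operation (an assumption for illustration), and the
sketch says nothing about how Pacemaker itself processes the result; it
shows only that, on a POSIX system, the parent sees a non-zero status
that identifies the signal.

```python
import signal
import subprocess
import sys


def kill_and_report():
    # Stand-in for a long-running agent operation (hypothetical).
    proc = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"]
    )
    # Same effect as running `kill -9 <pid>` from a shell.
    proc.send_signal(signal.SIGKILL)
    proc.wait()
    # On POSIX, Python reports death by signal N as returncode -N.
    return proc.returncode


rc = kill_and_report()
print("child exit status:", rc)
```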

> So we had to wait for the shutdown of that domain.
> It was switched off by libvirt with "virsh destroy" after 10
> minutes.
> After that, the shutdown of the other domains was initiated, and the
> domains shut down cleanly.
> 
> So, to conclude:
> I forgot that I already had one domain shutting down. I should have
> waited for it to finish before starting the stop of the other
> resources.
> The cluster tried to "abort" the shutdown, but a shutdown can't be
> aborted. And I had bad luck that the shutdown of this domain took so
> long.
> 
> Correct ?
> 
> Bernd
> 

Yes, except that the cluster isn't trying to abort the shutdown; it's
just discarding any actions that were planned after it in the same
transition.
-- 
Ken Gaillot <kgaillot at redhat.com>


