[ClusterLabs] crm resource stop VirtualDomain - but VirtualDomain shutdown starts some minutes later

Lentes, Bernd bernd.lentes at helmholtz-muenchen.de
Thu Feb 17 08:05:25 EST 2022

----- On Feb 16, 2022, at 6:48 PM, arvidjaar arvidjaar at gmail.com wrote:
> Splitting logs between different messages does not really help in interpreting
> them.

I agree.
Here is the complete excerpt from the relevant period:

> I guess the real question here is why "Transition aborted" is logged although
> the transition apparently continues. Transition 128 started at 20:54:30 and
> completed at 21:04:26, but there were multiple "Transition 128 aborted"
> messages in between.

That's correct. The shutdown timeout for the domain is set to 600 seconds in the CIB.
The RA says:
# The "shutdown_timeout" we use here is the operation
# timeout specified in the CIB, minus 5 seconds
And between 20:54:30 and 21:04:26 there are 596 seconds, very close to those 595 seconds.
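To double-check the arithmetic, here is a minimal Python sketch. It assumes the 600 s CIB timeout and the 5 s margin quoted from the RA above; the constant and function names are made up for illustration and are not part of the RA.

```python
from datetime import datetime

CIB_STOP_TIMEOUT_S = 600  # operation timeout configured in the CIB
RA_MARGIN_S = 5           # the RA subtracts 5 s from the CIB timeout

def shutdown_timeout(cib_timeout_s: int, margin_s: int = RA_MARGIN_S) -> int:
    """Graceful-shutdown budget as described in the RA comment."""
    return cib_timeout_s - margin_s

def elapsed_seconds(start: str, end: str) -> int:
    """Seconds between two HH:MM:SS timestamps on the same day."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())

budget = shutdown_timeout(CIB_STOP_TIMEOUT_S)
elapsed = elapsed_seconds("20:54:30", "21:04:26")
print(budget, elapsed)  # 595 596
```

The measured 596 s is one second over the 595 s budget, which fits the RA waiting out its graceful-shutdown period before the domain was destroyed.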

> It looks like "Transition aborted" is more "we try to abort this transition if
> possible". My guess is that pacemaker must wait for currently running action(s)
> which can take quite some time when stopping virtual domain. Transition 128
> was initiated when stopping vm_pathway, but we have no idea when it was stopped.

We have:
Feb 15 21:04:26 [15370] ha-idg-2       crmd:   notice: run_graph:       Transition 128 (Complete=1, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-3548.bz2): Complete
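For anyone scanning many of these lines, a small hypothetical Python helper can pull the counters out of a run_graph notice. The regular expression is an assumption derived only from the single line above, not from any documented Pacemaker log format.

```python
import re

LOG_LINE = (
    "Feb 15 21:04:26 [15370] ha-idg-2       crmd:   notice: run_graph:       "
    "Transition 128 (Complete=1, Pending=0, Fired=0, Skipped=0, Incomplete=0, "
    "Source=/var/lib/pacemaker/pengine/pe-input-3548.bz2): Complete"
)

# Matches "Transition <id> (<counters>): <status>" in a crmd run_graph notice.
RUN_GRAPH_RE = re.compile(r"Transition (\d+) \((.*?)\): (\w+)")

def parse_run_graph(line: str):
    """Return (transition id, counters dict, final status), or None."""
    m = RUN_GRAPH_RE.search(line)
    if m is None:
        return None
    counters = {}
    for field in m.group(2).split(", "):
        key, _, value = field.partition("=")
        # Numeric counters become ints; Source stays a path string.
        counters[key] = int(value) if value.isdigit() else value
    return int(m.group(1)), counters, m.group(3)

tid, counters, status = parse_run_graph(LOG_LINE)
print(tid, status, counters["Pending"], counters["Complete"])
```

Here Pending, Skipped and Incomplete are all 0, i.e. nothing in transition 128 was left outstanding when it was reported as Complete.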

The libvirt log confirms it:
2022-02-15T20:04:26.569471Z qemu-system-x86_64: terminating on signal 15 from pid 7368 (/usr/sbin/libvirtd)
2022-02-15 20:04:26.769+0000: shutting down, reason=destroyed

Timestamps in the libvirt logs are in UTC, while Munich is currently at UTC+1, so the times in the two logs differ by one hour.
So the domain was "switched off" via libvirt at exactly 21:04:26 local time.
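The conversion between the two clocks can be sketched in Python as follows. It assumes a fixed UTC+1 offset (as stated above for February in Munich) rather than a full time-zone database, so it would be off by an hour during summer time.

```python
from datetime import datetime, timedelta, timezone

# Assumption: Munich local time in February is CET, i.e. a fixed UTC+1.
MUNICH_WINTER = timezone(timedelta(hours=1))

def libvirt_utc_to_local(stamp: str) -> datetime:
    """Convert a libvirt ISO-8601 UTC timestamp ending in 'Z' to UTC+1."""
    # fromisoformat() only accepts a trailing 'Z' from Python 3.11 on,
    # so normalise it to an explicit +00:00 offset first.
    utc = datetime.fromisoformat(stamp.replace("Z", "+00:00"))
    return utc.astimezone(MUNICH_WINTER)

local = libvirt_utc_to_local("2022-02-15T20:04:26.569471Z")
print(local.strftime("%b %d %H:%M:%S"))  # Feb 15 21:04:26
```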

So for me the big question is:
When a transition is in progress and something changes in the cluster, is the transition "aborted"
("delayed" or "interrupted" would be a better word) or not?
Is this behaviour consistent? If not, what does it depend on?


