[ClusterLabs] crm resource stop VirtualDomain - but VirtualDomain shutdown start some minutes later

Lentes, Bernd bernd.lentes at helmholtz-muenchen.de
Thu Feb 17 08:05:25 EST 2022


----- On Feb 16, 2022, at 6:48 PM, arvidjaar arvidjaar at gmail.com wrote:
> 
> 
> Splitting logs between different messages does not really help in interpreting
> them.

I agree.
Here is the complete log excerpt for the period in question:
https://nc-mcd.helmholtz-muenchen.de/nextcloud/s/eY8SA8pe4HZBBc8

> 
> I guess the real question here is why "Transition aborted" is logged although
> transition apparently continues. Transition 128 started at 20:54:30 and completed
> at 21:04:26, but there were multiple "Transition 128 aborted" messages in between

That's correct. The shutdown_timeout for the domain is set to 600 sec. in the CIB.
The RA says:
# The "shutdown_timeout" we use here is the operation
# timeout specified in the CIB, minus 5 seconds
And between 20:54:30 and 21:04:26 we have 596 sec., which is very close to 595 sec.
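
As a quick check of that arithmetic, here is a minimal Python sketch. The variable names are my own, and the 600 sec. value is taken as the operation timeout that the RA comment refers to; this is only an illustration, not how the RA itself computes anything.

```python
from datetime import datetime

FMT = "%H:%M:%S"

# Start and end of transition 128, as reported in the pacemaker log
start = datetime.strptime("20:54:30", FMT)
end = datetime.strptime("21:04:26", FMT)
elapsed = int((end - start).total_seconds())

# Assumption: 600 sec. is the operation timeout configured in the CIB
op_timeout = 600
# Per the RA comment: operation timeout minus 5 seconds
shutdown_timeout = op_timeout - 5

print(f"elapsed={elapsed}s shutdown_timeout={shutdown_timeout}s "
      f"difference={elapsed - shutdown_timeout}s")
```

The transition therefore ended about one second after the RA's graceful shutdown window would have expired, which fits with the domain being destroyed rather than shutting down on its own.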

> It looks like "Transition aborted" is more "we try to abort this transition if
> possible". My guess is that pacemaker must wait for currently running action(s)
> which can take quite some time when stopping virtual domain. Transition 128
> was initiated when stopping vm_pathway, but we have no idea when it was stopped.

We do know when it was stopped:
Feb 15 21:04:26 [15370] ha-idg-2       crmd:   notice: run_graph:       Transition 128 (Complete=1, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-3548.bz2): Complete

and the log from libvirt confirms it:
/var/log/libvirtd/qemu/vm_pathway.log:
2022-02-15T20:04:26.569471Z qemu-system-x86_64: terminating on signal 15 from pid 7368 (/usr/sbin/libvirtd)
2022-02-15 20:04:26.769+0000: shutting down, reason=destroyed

The timestamps in the libvirt logs are in UTC, and Munich is currently at UTC+1, so they are one hour behind the pacemaker log.
We see that the domain is "switched off" via libvirt exactly at 21:04:26 local time.
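
To make the comparison between the two logs explicit, here is a small Python sketch that converts the libvirt timestamp to local time. It assumes a fixed UTC+1 offset, as stated above, rather than looking up a real time zone database; the variable names are my own.

```python
from datetime import datetime, timedelta, timezone

# Timestamp copied from vm_pathway.log (UTC, marked by the trailing "Z")
libvirt_ts = "2022-02-15T20:04:26.569471Z"

utc_time = datetime.strptime(libvirt_ts, "%Y-%m-%dT%H:%M:%S.%fZ")
utc_time = utc_time.replace(tzinfo=timezone.utc)

# Assumption: fixed UTC+1 offset (Munich in February, no DST)
local_tz = timezone(timedelta(hours=1))
local_time = utc_time.astimezone(local_tz)

print(local_time.strftime("%b %d %H:%M:%S"))
```

The converted time matches the "Feb 15 21:04:26" timestamp of the run_graph message in the pacemaker log.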

So for me the big question is:
When a transition is in progress and something changes in the cluster, is the transition really "aborted"
("delayed" or "interrupted" would be a better word), or not?
Is this behaviour consistent? If not, what does it depend on?

Bernd

