[ClusterLabs] VirtualDomain - monitor misses to report & plays up
Andrei Borzenkov
arvidjaar at gmail.com
Mon Apr 12 01:17:49 EDT 2021
On 11.04.2021 21:38, lejeczek wrote:
> Hi guys.
>
> I've been experiencing weird "handling" of VirtualDomain by the cluster. It
> seems that the cluster sometimes fails to report the real state of a VM,
> which sometimes results in trouble: for example, when the cluster thinks a
> VM is not running while it actually is, it starts the VM on another node,
> which corrupts the qcow image.
> Right now, for example, I'm looking at the cluster reporting the VM as up
> and okay while it is not running on any of the nodes (because the VM was
> powered off from within itself).
> So I:
>
> -> $ pcs resource refresh c8kubermaster1
> Cleaned up c8kubermaster1 on swir
> Cleaned up c8kubermaster1 on dzien
> Waiting for 2 replies from the controller
> ... got reply
> ... got reply (done)
>
> In the logs of the node where, according to the cluster, the VM is
> supposed to be running:
> ..
> notice: Requesting local execution of probe operation for
> c8kubermaster1 on swir
> notice: Result of probe operation for c8kubermaster1 on swir: ok
> notice: Requesting local execution of monitor operation for
> c8kubermaster1 on swir
> notice: Result of monitor operation for c8kubermaster1 on swir: ok
>
> and on the second node (2-node cluster), in the logs:
> ..
> notice: State transition S_IDLE -> S_POLICY_ENGINE
> notice: Ignoring expired c8kubernode1_migrate_to_0 failure on dzien
> notice: * Start c8kubermaster1 ( swir )
> notice: Calculated transition 42, saving inputs in
> /var/lib/pacemaker/pengine/pe-input-2655.bz2
> notice: Initiating monitor operation c8kubermaster1_monitor_0 on swir
> notice: Initiating monitor operation c8kubermaster1_monitor_0 locally
> on dzien
> notice: Requesting local execution of probe operation for
> c8kubermaster1 on dzien
> notice: Result of probe operation for c8kubermaster1 on dzien: not running
> notice: Transition 42 aborted by operation c8kubermaster1_monitor_0
> 'modify' on swir: Event failed
> notice: Transition 42 action 11 (c8kubermaster1_monitor_0 on swir):
> expected 'not running' but got 'ok'
>
You need to debug whether virsh returns correct information that is then
misinterpreted by the agent/pacemaker, or whether virsh itself returns
incorrect information. As far as I can tell, all the VirtualDomain monitor
action does is run "virsh domstate $DOMAIN".

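To separate those two cases, it helps to look at the raw domstate string on
each node and at how it would translate into the OCF result codes Pacemaker
logs ("ok" is 0, "not running" is 7). The function below is a simplified,
hypothetical sketch of that mapping, not the VirtualDomain agent's own code:
the name classify_domstate and the exact list of state strings are
assumptions, and unlike the real agent it does not retry on "no state".

```shell
#!/bin/sh
# Hypothetical helper, NOT taken from the VirtualDomain agent: a simplified
# approximation of how a "virsh domstate" string maps to OCF result codes.
OCF_SUCCESS=0        # logged by Pacemaker as "ok"
OCF_ERR_GENERIC=1    # logged as "error"
OCF_NOT_RUNNING=7    # logged as "not running"

classify_domstate() {
    # Lower-case the output so the comparison does not depend on casing.
    state=$(printf '%s' "$1" | tr 'A-Z' 'a-z')
    case "$state" in
        running|paused|idle|blocked|"in shutdown")
            # The domain exists and is (still) active on this host.
            return "$OCF_SUCCESS" ;;
        "shut off"|*"domain not found"*|*"failed to get domain"*)
            # Defined but stopped, or not defined on this host at all.
            return "$OCF_NOT_RUNNING" ;;
        *)
            # Empty output, "no state" or anything unexpected: this sketch
            # simply reports a generic error instead of retrying.
            return "$OCF_ERR_GENERIC" ;;
    esac
}
```

On each node you could then run something like
`classify_domstate "$(LANG=C virsh -c qemu:///system domstate c8kubermaster1 2>&1)"; echo "$?"`
and compare the result with what `crm_resource --resource c8kubermaster1 --locate`
reports. If virsh says "shut off" everywhere while the cluster still shows
the resource as started, the problem is on the cluster side; if virsh itself
says "running", it is on the libvirt side.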
> -> $ pcs resource config c8kubermaster1
> Resource: c8kubermaster1 (class=ocf provider=heartbeat type=VirtualDomain)
> Attributes: config=/var/lib/pacemaker/conf.d/c8kubermaster1.xml
> hypervisor=qemu:///system migration_transport=ssh
> Meta Attrs: allow-migrate=true failure-timeout=120s
> Operations: migrate_from interval=0s timeout=180s
> (c8kubermaster1-migrate_from-interval-0s)
> migrate_to interval=0s timeout=180s
> (c8kubermaster1-migrate_to-interval-0s)
> monitor interval=30s (c8kubermaster1-monitor-interval-30s)
> start interval=0s timeout=90s
> (c8kubermaster1-start-interval-0s)
> stop interval=0s timeout=90s
> (c8kubermaster1-stop-interval-0s)
>
> Disabling and re-enabling the resource 'fixes' the glitch but, naturally,
> the obvious question is: why is that even allowed to happen?
> Many thanks, L.
More information about the Users
mailing list