[ClusterLabs] Antw: Re: Antw: [EXT] VirtualDomain - started but... not really

Tue Dec 14 10:05:04 EST 2021

>>> lejeczek via Users <users at clusterlabs.org> wrote on 14.12.2021 at 15:48 in
message <653e9087-e648-9b8d-01de-341d2350f825 at yahoo.co.uk>:

>> Hi!
>>
>> My guess is that you checked the corresponding logs already; why not show
>> them here?
>> I can imagine that the VMs die rather early after start.
>>
>> Regards,
>> Ulrich
>>
>>>>> lejeczek via Users <users at clusterlabs.org> wrote on 10.12.2021 at 17:33 in
>> message <df8eac8f-a58e-28e5-53b5-73eb1fe432b2 at yahoo.co.uk>:
>>> Hi guys.
>>>
>>> Quite often, well, too frequently to my mind, I see a VM
>>> for which the cluster says:
>>> -> $ pcs resource status | grep -v disabled
>>> ...
>>>     * c8kubermaster2    (ocf::heartbeat:VirtualDomain):    Started dzien
>>> ..
>>>
>>> but that is false, and the cluster itself confirms it:
>>> -> $ pcs resource debug-monitor c8kubermaster2
>>> crm_resource: Error performing operation: Not running
>>> Operation force-check for c8kubermaster2
>>> (ocf:heartbeat:VirtualDomain) returned: 'not running' (7)
>>>
>>> What might the issue be here, and how best to troubleshoot it?
>>>
>>> -> $ pcs resource config c8kubermaster2
>>>    Resource: c8kubermaster2 (class=ocf provider=heartbeat type=VirtualDomain)
>>>     Attributes: config=/var/lib/pacemaker/conf.d/c8kubermaster2.xml hypervisor=qemu:///system migration_transport=ssh
>>>     Meta Attrs: allow-migrate=true failure-timeout=30s
>>>     Operations: migrate_from interval=0s timeout=180s (c8kubermaster2-migrate_from-interval-0s)
>>>                 migrate_to interval=0s timeout=180s (c8kubermaster2-migrate_to-interval-0s)
>>>                 monitor interval=30s (c8kubermaster2-monitor-interval-30s)
>>>                 start interval=0s timeout=90s (c8kubermaster2-start-interval-0s)
>>>                 stop interval=0s timeout=90s (c8kubermaster2-stop-interval-0s)
>>>
>>> many thanks, L.
>>> _______________________________________________
>>> Manage your subscription:
>>> https://lists.clusterlabs.org/mailman/listinfo/users 
>>>
>>> ClusterLabs home: https://www.clusterlabs.org/ 
> Not much there in the logs that I could see (which is probably
> why the cluster decides the resource is okay).
> Isn't that exactly what the resource's monitor is for, to check
> the state of the resource? Whether it dies early or late should
> not matter.
> What suffices to "fix" such a false-positive resource is a quick
> disable/enable of the resource or, as in this very instance, RPM
> updates which restarted the node.
> Again, how can the cluster think the resource is okay while
> debug-monitor shows it's not?
> I just do not know how to reproduce this in a controlled,
> orderly manner.

Maybe try "xentop" or "watch virsh list" on all nodes and watch what is
happening.
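
If watching interactively is not practical, something along these lines
could log the libvirt view of the domain over time, so it can later be
compared with what the cluster reported. This is only a rough sketch: the
helper name domain_state and the 10-second interval are my own invention,
and it assumes the usual three-column "Id Name State" layout of
"virsh list --all". The domain name and hypervisor URI are taken from the
configuration quoted above.

```shell
#!/bin/sh
# domain_state NAME (hypothetical helper): read `virsh list --all` output
# on stdin and print the State column for domain NAME, or "absent" if the
# domain is not listed at all.
domain_state() {
    awk -v dom="$1" '
        # Column 2 is the domain name; everything from column 3 on is the
        # state, which may be more than one word (e.g. "shut off").
        $2 == dom {
            state = $3
            for (i = 4; i <= NF; i++) state = state " " $i
            found = 1
        }
        END { print (found ? state : "absent") }
    '
}

# Example loop (commented out; needs virsh on the node): print a
# timestamped state line every 10 seconds.
# while sleep 10; do
#     printf '%s %s\n' "$(date '+%F %T')" \
#         "$(virsh -c qemu:///system list --all | domain_state c8kubermaster2)"
# done
```

Run on every node, the timestamps should show when the domain actually
went away, which can then be lined up against the cluster logs.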

> 
> thanks, L