[ClusterLabs] Antw: Re: Antw: [EXT] Re: Failed migration causing fencing loop
Ulrich Windl
Ulrich.Windl at rz.uni-regensburg.de
Mon Apr 4 03:45:46 EDT 2022
>>> Strahil Nikolov <hunter86_bg at yahoo.com> wrote on 04.04.2022 at 09:21 in
message <1983151291.781214.1649056896367 at mail.yahoo.com>:
> Do you have a resource for starting up libvirtd and virtlockd after the
> OCFS2 ?
Yes:
primitive prm_libvirtd systemd:libvirtd.service ...
primitive prm_lockspace_ocfs2 Filesystem ...
primitive prm_virtlockd systemd:virtlockd ...
clone cln_libvirtd prm_libvirtd ...
clone cln_lockspace_ocfs2 prm_lockspace_ocfs2 ...
clone cln_virtlockd prm_virtlockd ...
colocation col__libvirtd__virtlockd inf: cln_libvirtd cln_virtlockd
colocation col__virtlockd__lockspace_fs inf: cln_virtlockd cln_lockspace_ocfs2
colocation col__vm__libvirtd inf: ( prm_xen_v01 ... )
order ord__libvirtd__vm Mandatory: cln_libvirtd ( prm_xen_v01 ... )
order ord__lockspace_fs__virtlockd Mandatory: cln_lockspace_ocfs2 cln_virtlockd
order ord__virtlockd__libvirtd Mandatory: cln_virtlockd cln_libvirtd
(some resources were left out, but you get the idea)
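
A quick way to sanity-check that the chain is wired up as intended (a rough
sketch using the standard tools):

  crm configure verify                        # crmsh check of the configuration
  crm_verify -LV                              # pacemaker's own check of the live CIB
  crm configure show | grep -E 'ord__|col__'  # eyeball the order/colocation chain
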
Regards,
Ulrich
P.S.: Forgot to keep the list in the replies...
>
> Best Regards,Strahil Nikolov
>
>
> On Mon, Apr 4, 2022 at 10:14, Ulrich
> Windl<Ulrich.Windl at rz.uni-regensburg.de> wrote: >>> Strahil Nikolov
> <hunter86_bg at yahoo.com> wrote on 04.04.2022 at 08:42 in
> message <2011141682.770390.1649054549570 at mail.yahoo.com>:
>> So,
>> if you use OCFS2 for locking, why is the hypervisor not responding correctly
>> to the VirtualDomain RA?
>
> It seems the VirtualDomain RA requires libvirtd to be running, but at the
> time of startup probes _nothing_ is running.
> That's how I see it.
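>
> A rough way to see the same thing by hand while libvirtd is down (assuming
> the default Xen connection URI):
>
>   virsh -c xen:///system domstate v15
>   # error: failed to connect to the hypervisor
>   # error: failed to connect socket to '/var/run/libvirt/libvirt-sock':
>   # No such file or directory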
>
> pacemaker-controld[7029]: notice: Result of probe operation for
> prm_xen_rksapv15 on rksaph18: not running
> ### For whatever reason:
> pacemaker-execd[7021]: notice: executing - rsc:prm_xen_v15 action:stop
> call_id:197
> VirtualDomain(prm_xen_v15)[8768]: INFO: Virtual domain v15 currently has no
> state, retrying.
> VirtualDomain(prm_xen_v15)[8822]: ERROR: Virtual domain v15 has no state
> during stop operation, bailing out.
> VirtualDomain(prm_xen_v15)[8836]: INFO: Issuing forced shutdown (destroy)
> request for domain v15.
> VirtualDomain(prm_xen_v15)[8849]: ERROR: forced stop failed
> pacemaker-controld[7029]: notice: h18-prm_xen_v15_stop_0:197 [ error:
> failed to connect to the hypervisor error: failed to connect socket to
> '/var/run/libvirt/libvirt-sock': no such file or directory
>
> That caused a repeating fencing loop.
>
> Regards,
> Ulrich
>
>
>> Best Regards,Strahil Nikolov
>>
>>
>> On Mon, Apr 4, 2022 at 9:39, Ulrich Windl<Ulrich.Windl at rz.uni-regensburg.de>
>> wrote: >>> Strahil Nikolov <hunter86_bg at yahoo.com> wrote on 01.04.2022 at
>> 15:20 in
>> message <1379723177.395165.1648819249186 at mail.yahoo.com>:
>>> To be honest, I have never had to disable it and as far as I know it's
>>> clusterwide.
>>> As per my understanding, the cluster checks if the resources are running
>>> before proceeding further. Of course, I might be wrong and it might not help
>>> you.
>>> Why don't you set up a shared filesystem for libvirt's locking? After
>>> all, your VMs use shared storage.
>>
>> ??? There is a shared OCFS2 filesystem used for locking, but that's more of a
>> problem than a solution.
>> I wrote: "libvirtd uses locking (virtlockd), which in turn needs a
>> cluster-wide filesystem for locks across the nodes."
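>>
>> For reference, the locking setup boils down to roughly this in libvirt's
>> config (a sketch only; the lockspace directory here is made up, and the
>> Xen/libxl driver uses its own analogous config files):
>>
>>   # /etc/libvirt/qemu.conf: hand locking over to virtlockd
>>   lock_manager = "lockd"
>>   # /etc/libvirt/qemu-lockd.conf: the lockspace lives on the shared OCFS2 mount
>>   file_lockspace_dir = "/path/to/ocfs2/lockspace"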
>>
>> Regards,
>> Ulrich
>>
>>>
>>> Best Regards,Strahil Nikolov
>>>
>>>
>>> On Fri, Apr 1, 2022 at 15:01, Ulrich
>>> Windl<Ulrich.Windl at rz.uni-regensburg.de> wrote: >>> Strahil Nikolov
>>> <hunter86_bg at yahoo.com> wrote on 01.04.2022 at 00:45 in
>>> message <624795315.304814.1648766702482 at mail.yahoo.com>:
>>>
>>> Hi!
>>>
>>>> What if you disable enable-startup-probes at fencing (a custom
>>>> fencing agent that sets it to false and fails, so the next fencing device in the
>>>> topology kicks in)?
>>>
>>> Interesting idea, but I never heard of the property before.
>>> However it's cluster-wide, right?
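>>>
>>> If it really is cluster-wide, flipping it is just a property update; a rough,
>>> untested sketch of both halves (checking only the local uptime here, not all
>>> nodes as suggested below):
>>>
>>>   # what the custom fence agent would run before failing on purpose:
>>>   crm_attribute --type crm_config --name enable-startup-probes --update false
>>>   # later, e.g. from a systemd timer, once the node has been up ~20 minutes:
>>>   if test "$(cut -d. -f1 /proc/uptime)" -ge 1200; then
>>>       crm_attribute --type crm_config --name enable-startup-probes --update true
>>>   fi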
>>>
>>>> When the node joins, it will skip startup probes, and later a systemd
>>>> service or some script checks if all nodes were up for at least 15-20 min and
>>>> enables it back?
>>>
>>> Are there any expected disadvantages?
>>>
>>> Regards,
>>> Ulrich
>>>
>>>> Best Regards,Strahil Nikolov
>>>>
>>>>
>>>> On Thu, Mar 31, 2022 at 14:02, Ulrich
>>>> Windl<Ulrich.Windl at rz.uni-regensburg.de> wrote: >>> "Gao,Yan" <ygao at suse.com>
>>>> wrote on 31.03.2022 at 11:18 in message
>>>> <67785c2f-f875-cb16-608b-77d63d9b02c4 at suse.com>:
>>>>> On 2022/3/31 9:03, Ulrich Windl wrote:
>>>>>> Hi!
>>>>>>
>>>>>> I just wanted to point out one thing that hit us with SLES15 SP3:
>>>>>> A failed live VM migration that caused node fencing resulted in a fencing
>>>>> loop, for two reasons:
>>>>>>
>>>>>> 1) Pacemaker thinks that even _after_ fencing there is some migration to
>>>>> "clean up". Pacemaker treats the situation as if the VM is running on both
>>>>> nodes, thus (50% chance?) trying to stop the VM on the node that just booted
>>>>> after fencing. That's stupid but shouldn't be fatal IF there weren't...
>>>>>>
>>>>>> 2) The stop operation of the VM (that actually isn't running) fails,
>>>>>
>>>>> AFAICT it could not connect to the hypervisor, but the logic in the RA
>>>>> is kind of arguable: the probe (monitor) of the VM returned "not
>>>>> running", yet the stop right after that returned failure...
>>>>>
>>>>> OTOH, the point about pacemaker is that the stop of the resource on the
>>>>> fenced and rejoined node is not really necessary. There have been
>>>>> discussions about this here and we are trying to figure out a solution
>>>>> for it:
>>>>>
>>>>> https://github.com/ClusterLabs/pacemaker/pull/2146#discussion_r828204919
>>>>>
>>>>> For now it requires the administrator's intervention if the situation happens:
>>>>> 1) Fix the access to the hypervisor before the fenced node rejoins.
>>>>
>>>> Thanks for the explanation!
>>>>
>>>> Unfortunately this can be tricky if libvirtd is involved (as it is here):
>>>> libvirtd uses locking (virtlockd), which in turn needs a cluster-wide
>>>> filesystem for locks across the nodes.
>>>> When that filesystem is provided by the cluster, it's hard to delay node
>>>> joining until the filesystem, virtlockd and libvirtd are running.
>>>>
>>>> (The issue had been discussed before: It does not make sense to run some
>>>> probes when those probes need other resources to detect the status.
>>>> With just a Boolean status return, at best all those probes could say "not
>>>> running". Ideally a third status like "please try again some later time"
>>>> would be needed, or probes should follow the dependencies of their resources,
>>>> which may open another can of worms.)
>>>>
>>>> Regards,
>>>> Ulrich
>>>>
>>>>
>>>>> 2) Manually clean up the resource, which tells pacemaker it can safely
>>>>> forget the historical migrate_to failure.
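>>>>> (e.g. something like "crm resource cleanup prm_xen_v15", or
>>>>> "crm_resource --cleanup --resource prm_xen_v15" with the lower-level tool)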
>>>>>
>>>>> Regards,
>>>>> Yan
>>>>>
>>>>>> causing a node fence. So the loop is complete.
>>>>>>
>>>>>> Some details (many unrelated messages left out):
>>>>>>
>>>>>> Mar 30 16:06:14 h16 libvirtd[13637]: internal error: libxenlight failed to
>>>>> restore domain 'v15'
>>>>>>
>>>>>> Mar 30 16:06:15 h19 pacemaker-schedulerd[7350]: warning: Unexpected result
>>>>> (error: v15: live migration to h16 failed: 1) was recorded for migrate_to of
>>>>> prm_xen_v15 on h18 at Mar 30 16:06:13 2022
>>>>>>
>>>>>> Mar 30 16:13:37 h19 pacemaker-schedulerd[7350]: warning: Unexpected result
>>>>> (OCF_TIMEOUT) was recorded for stop of prm_libvirtd:0 on h18 at Mar 30
>>>>> 16:13:36 2022
>>>>>> Mar 30 16:13:37 h19 pacemaker-schedulerd[7350]: warning: Unexpected result
>>>>> (OCF_TIMEOUT) was recorded for stop of prm_libvirtd:0 on h18 at Mar 30
>>>>> 16:13:36 2022
>>>>>> Mar 30 16:13:37 h19 pacemaker-schedulerd[7350]: warning: Cluster node h18
>>>>> will be fenced: prm_libvirtd:0 failed there
>>>>>>
>>>>>> Mar 30 16:19:00 h19 pacemaker-schedulerd[7350]: warning: Unexpected result
>>>>> (error: v15: live migration to h18 failed: 1) was recorded for migrate_to of
>>>>> prm_xen_v15 on h16 at Mar 29 23:58:40 2022
>>>>>> Mar 30 16:19:00 h19 pacemaker-schedulerd[7350]: error: Resource prm_xen_v15
>>>>> is active on 2 nodes (attempting recovery)
>>>>>>
>>>>>> Mar 30 16:19:00 h19 pacemaker-schedulerd[7350]: notice: * Restart
>>>>> prm_xen_v15 ( h18 )
>>>>>>
>>>>>> Mar 30 16:19:04 h18 VirtualDomain(prm_xen_v15)[8768]: INFO: Virtual domain
>>>>> v15 currently has no state, retrying.
>>>>>> Mar 30 16:19:05 h18 VirtualDomain(prm_xen_v15)[8787]: INFO: Virtual domain
>>>>> v15 currently has no state, retrying.
>>>>>> Mar 30 16:19:07 h18 VirtualDomain(prm_xen_v15)[8822]: ERROR: Virtual domain
>>>>> v15 has no state during stop operation, bailing out.
>>>>>> Mar 30 16:19:07 h18 VirtualDomain(prm_xen_v15)[8836]: INFO: Issuing forced
>>>>> shutdown (destroy) request for domain v15.
>>>>>> Mar 30 16:19:07 h18 VirtualDomain(prm_xen_v15)[8860]: ERROR: forced stop
>>>>> failed
>>>>>>
>>>>>> Mar 30 16:19:07 h19 pacemaker-controld[7351]: notice: Transition 124 action
>>>>> 115 (prm_xen_v15_stop_0 on h18): expected 'ok' but got 'error'
>>>>>>
>>>>>> Note: Our cluster nodes start pacemaker during boot. Yesterday I was there
>>>>> when the problem happened. But as we had another boot loop some time ago I
>>>>> wrote a systemd service that counts boots, and if too many happen within a
>>>>> short time, pacemaker will be disabled on that node. As it is set now, the
>>>>> counter is reset if the node is up for at least 15 minutes; if it fails more
>>>>> than 4 times to do so, pacemaker will be disabled. If someone wants to try
>>>>> that or give feedback, drop me a line, so I could provide the RPM
>>>>> (boot-loop-handler-0.0.5-0.0.noarch)...
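>>>>>
>>>>> A very rough sketch of the idea (not the actual package; paths made up,
>>>>> thresholds as described above):
>>>>>
>>>>>   # run once per boot from a oneshot unit; the counter survives reboots
>>>>>   c=$(cat /var/lib/bootcount 2>/dev/null || echo 0)
>>>>>   c=$((c + 1)); echo "$c" > /var/lib/bootcount
>>>>>   [ "$c" -gt 4 ] && systemctl disable pacemaker
>>>>>   # a second unit/timer resets the counter once uptime exceeds 15 minutes:
>>>>>   # sleep 900 && echo 0 > /var/lib/bootcount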
>>>>>>
>>>>>> Regards,
>>>>>> Ulrich
>>>>>>
More information about the Users
mailing list