[ClusterLabs] Antw: Re: Antw: [EXT] Re: Failed migration causing fencing loop
Ulrich Windl
Ulrich.Windl at rz.uni-regensburg.de
Mon Apr 4 03:45:46 EDT 2022
>>> Strahil Nikolov <hunter86_bg at yahoo.com> wrote on 04.04.2022 at 09:21 in
message <1983151291.781214.1649056896367 at mail.yahoo.com>:
> Do you have a resource for starting up libvirtd and virtlockd after the
> OCFS2 ?
Yes:
primitive prm_libvirtd systemd:libvirtd.service ...
primitive prm_lockspace_ocfs2 Filesystem ...
primitive prm_virtlockd systemd:virtlockd ...
clone cln_libvirtd prm_libvirtd ...
clone cln_lockspace_ocfs2 prm_lockspace_ocfs2 ...
clone cln_virtlockd prm_virtlockd ...
colocation col__libvirtd__virtlockd inf: cln_libvirtd cln_virtlockd
colocation col__virtlockd__lockspace_fs inf: cln_virtlockd cln_lockspace_ocfs2
colocation col__vm__libvirtd inf: ( prm_xen_v01 ... )
order ord__libvirtd__vm Mandatory: cln_libvirtd ( prm_xen_v01 ... )
order ord__lockspace_fs__virtlockd Mandatory: cln_lockspace_ocfs2 cln_virtlockd
order ord__virtlockd__libvirtd Mandatory: cln_virtlockd cln_libvirtd
(some resources were left out, but you get the idea)
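
A quick way to sanity-check that the chain is wired up as intended (a rough
sketch using the standard tools):

  crm configure verify                        # crmsh check of the configuration
  crm_verify -LV                              # pacemaker's own check of the live CIB
  crm configure show | grep -E 'ord__|col__'  # eyeball the order/colocation chain
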
Regards,
Ulrich
P.S.: Forgot to keep the list in the replies...
>
> Best Regards,Strahil Nikolov
>
>
> On Mon, Apr 4, 2022 at 10:14, Ulrich
> Windl<Ulrich.Windl at rz.uni-regensburg.de> wrote: >>> Strahil Nikolov
> <hunter86_bg at yahoo.com> wrote on 04.04.2022 at 08:42 in
> message <2011141682.770390.1649054549570 at mail.yahoo.com>:
>> So,
>> if you use OCFS2 for locking, why is the hypervisor not responding correctly
>> to the VirtualDomain RA?
>
> It seems the VirtualDomain RA requires libvirtd to be running, but at the
> time of startup probes _nothing_ is running.
> That's how I see it.
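>
> A rough way to see the same thing by hand while libvirtd is down (assuming
> the default Xen connection URI):
>
>   virsh -c xen:///system domstate v15
>   # error: failed to connect to the hypervisor
>   # error: failed to connect socket to '/var/run/libvirt/libvirt-sock':
>   # No such file or directory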
>
> pacemaker-controld[7029]: notice: Result of probe operation for
> prm_xen_rksapv15 on rksaph18: not running
> ### For whatever reason:
> pacemaker-execd[7021]: notice: executing - rsc:prm_xen_v15 action:stop
> call_id:197
> VirtualDomain(prm_xen_v15)[8768]: INFO: Virtual domain v15 currently has no
> state, retrying.
> VirtualDomain(prm_xen_v15)[8822]: ERROR: Virtual domain v15 has no state
> during stop operation, bailing out.
> VirtualDomain(prm_xen_v15)[8836]: INFO: Issuing forced shutdown (destroy)
> request for domain v15.
> VirtualDomain(prm_xen_v15)[8849]: ERROR: forced stop failed
> pacemaker-controld[7029]: notice: h18-prm_xen_v15_stop_0:197 [ error:
> failed to connect to the hypervisor error: failed to connect socket to
> '/var/run/libvirt/libvirt-sock': no such file or directory
>
> That caused a repeating fencing loop.
>
> Regards,
> Ulrich
>
>
>> Best Regards,Strahil Nikolov
>>
>>
>> On Mon, Apr 4, 2022 at 9:39, Ulrich Windl<Ulrich.Windl at rz.uni-regensburg.de>
>> wrote: >>> Strahil Nikolov <hunter86_bg at yahoo.com> wrote on 01.04.2022 at
>> 15:20 in
>> message <1379723177.395165.1648819249186 at mail.yahoo.com>:
>>> To be honest, I have never had to disable it and as far as I know it's
>>> clusterwide.
>>> As per my understanding, the cluster checks if the resources are running
>>> before proceeding further. Of course, I might be wrong and it might not help
>>> you.
>>> Why don't you set up a shared filesystem for libvirt's locking? After
>>> all, your VMs use shared storage.
>>
>> ??? There is a shared OCFS2 filesystem used for locking, but that's more of a
>> problem than a solution.
>> I wrote: "libvirtd uses locking (virtlockd), which in turn needs a
>> cluster-wide filesystem for locks across the nodes."
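>>
>> For reference, the locking setup boils down to roughly this in libvirt's
>> config (a sketch only; the lockspace directory here is made up, and the
>> Xen/libxl driver uses its own analogous config files):
>>
>>   # /etc/libvirt/qemu.conf: hand locking over to virtlockd
>>   lock_manager = "lockd"
>>   # /etc/libvirt/qemu-lockd.conf: the lockspace lives on the shared OCFS2 mount
>>   file_lockspace_dir = "/path/to/ocfs2/lockspace"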
>>
>> Regards,
>> Ulrich
>>
>>>
>>> Best Regards,Strahil Nikolov
>>>
>>>
>>> On Fri, Apr 1, 2022 at 15:01, Ulrich
>>> Windl<Ulrich.Windl at rz.uni-regensburg.de> wrote: >>> Strahil Nikolov
>>> <hunter86_bg at yahoo.com> wrote on 01.04.2022 at 00:45 in
>>> message <624795315.304814.1648766702482 at mail.yahoo.com>:
>>>
>>> Hi!
>>>
>>>> What if you disable enable-startup-probes at fencing (a custom
>>>> fencing agent that sets it to false and fails, so the next fencing device in the
>>>> topology kicks in)?
>>>
>>> Interesting idea, but I never heard of the property before.
>>> However it's cluster-wide, right?
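>>>
>>> If it really is cluster-wide, flipping it is just a property update; a rough,
>>> untested sketch of both halves (checking only the local uptime here, not all
>>> nodes as suggested below):
>>>
>>>   # what the custom fence agent would run before failing on purpose:
>>>   crm_attribute --type crm_config --name enable-startup-probes --update false
>>>   # later, e.g. from a systemd timer, once the node has been up ~20 minutes:
>>>   if test "$(cut -d. -f1 /proc/uptime)" -ge 1200; then
>>>       crm_attribute --type crm_config --name enable-startup-probes --update true
>>>   fi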
>>>
>>>> When the node joins, it will skip startup probes, and later a systemd
>>>> service or some script checks if all nodes were up for at least 15-20 min and
>>>> enables it back?
>>>
>>> Are there any expected disadvantages?
>>>
>>> Regards,
>>> Ulrich
>>>
>>>> Best Regards,Strahil Nikolov
>>>>
>>>>
>>>> On Thu, Mar 31, 2022 at 14:02, Ulrich
>>>> Windl<Ulrich.Windl at rz.uni-regensburg.de> wrote: >>> "Gao,Yan" <ygao at suse.com>
>>>> wrote on 31.03.2022 at 11:18 in message
>>>> <67785c2f-f875-cb16-608b-77d63d9b02c4 at suse.com>:
>>>>> On 2022/3/31 9:03, Ulrich Windl wrote:
>>>>>> Hi!
>>>>>>
>>>>>> I just wanted to point out one thing that hit us with SLES15 SP3:
>>>>>> A failed live VM migration that caused node fencing resulted in a fencing
>>>>> loop, for two reasons:
>>>>>>
>>>>>> 1) Pacemaker thinks that even _after_ fencing there is some migration to
>>>>> "clean up". Pacemaker treats the situation as if the VM is running on both
>>>>> nodes, thus (50% chance?) trying to stop the VM on the node that just booted
>>>>> after fencing. That's stupid but shouldn't be fatal IF there weren't...
>>>>>>
>>>>>> 2) The stop operation of the VM (that actually isn't running) fails,
>>>>>
>>>>> AFAICT it could not connect to the hypervisor, but the logic in the RA
>>>>> is kind of arguable: the probe (monitor) of the VM returned "not
>>>>> running", yet the stop right after that returned failure...
>>>>>
>>>>> OTOH, the point about pacemaker is that the stop of the resource on the
>>>>> fenced and rejoined node is not really necessary. There have been
>>>>> discussions about this here and we are trying to figure out a solution
>>>>> for it:
>>>>>
>>>>> https://github.com/ClusterLabs/pacemaker/pull/2146#discussion_r828204919
>>>>>
>>>>> For now it requires the administrator's intervention if the situation happens:
>>>>> 1) Fix the access to the hypervisor before the fenced node rejoins.
>>>>
>>>> Thanks for the explanation!
>>>>
>>>> Unfortunately this can be tricky if libvirtd is involved (as it is here):
>>>> libvirtd uses locking (virtlockd), which in turn needs a cluster-wide
>>>> filesystem for locks across the nodes.
>>>> When that filesystem is provided by the cluster, it's hard to delay node
>>>> joining until the filesystem, virtlockd and libvirtd are running.
>>>>
>>>> (The issue had been discussed before: It does not make sense to run some
>>>> probes when those probes need other resources to detect the status.
>>>> With just a Boolean status return, at best all those probes could say "not
>>>> running". Ideally a third status like "please try again some later time"
>>>> would be needed, or probes should follow the dependencies of their resources,
>>>> which may open another can of worms.)
>>>>
>>>> Regards,
>>>> Ulrich
>>>>
>>>>
>>>>> 2) Manually clean up the resource, which tells pacemaker it can safely
>>>>> forget the historical migrate_to failure.
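>>>>> (e.g. something like "crm resource cleanup prm_xen_v15", or
>>>>> "crm_resource --cleanup --resource prm_xen_v15" with the lower-level tool)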
>>>>>
>>>>> Regards,
>>>>> Yan
>>>>>
>>>>>> causing a node fence. So the loop is complete.
>>>>>>
>>>>>> Some details (many unrelated messages left out):
>>>>>>
>>>>>> Mar 30 16:06:14 h16 libvirtd[13637]: internal error: libxenlight failed to
>>>>> restore domain 'v15'
>>>>>>
>>>>>> Mar 30 16:06:15 h19 pacemaker-schedulerd[7350]: warning: Unexpected result
>>>>> (error: v15: live migration to h16 failed: 1) was recorded for migrate_to of
>>>>> prm_xen_v15 on h18 at Mar 30 16:06:13 2022
>>>>>>
>>>>>> Mar 30 16:13:37 h19 pacemaker-schedulerd[7350]: warning: Unexpected result
>>>>> (OCF_TIMEOUT) was recorded for stop of prm_libvirtd:0 on h18 at Mar 30
>>>>> 16:13:36 2022
>>>>>> Mar 30 16:13:37 h19 pacemaker-schedulerd[7350]: warning: Unexpected result
>>>>> (OCF_TIMEOUT) was recorded for stop of prm_libvirtd:0 on h18 at Mar 30
>>>>> 16:13:36 2022
>>>>>> Mar 30 16:13:37 h19 pacemaker-schedulerd[7350]: warning: Cluster node h18
>>>>> will be fenced: prm_libvirtd:0 failed there
>>>>>>
>>>>>> Mar 30 16:19:00 h19 pacemaker-schedulerd[7350]: warning: Unexpected result
>>>>> (error: v15: live migration to h18 failed: 1) was recorded for migrate_to of
>>>>> prm_xen_v15 on h16 at Mar 29 23:58:40 2022
>>>>>> Mar 30 16:19:00 h19 pacemaker-schedulerd[7350]: error: Resource prm_xen_v15
>>>>> is active on 2 nodes (attempting recovery)
>>>>>>
>>>>>> Mar 30 16:19:00 h19 pacemaker-schedulerd[7350]: notice: * Restart
>>>>> prm_xen_v15 ( h18 )
>>>>>>
>>>>>> Mar 30 16:19:04 h18 VirtualDomain(prm_xen_v15)[8768]: INFO: Virtual domain
>>>>> v15 currently has no state, retrying.
>>>>>> Mar 30 16:19:05 h18 VirtualDomain(prm_xen_v15)[8787]: INFO: Virtual domain
>>>>> v15 currently has no state, retrying.
>>>>>> Mar 30 16:19:07 h18 VirtualDomain(prm_xen_v15)[8822]: ERROR: Virtual domain
>>>>> v15 has no state during stop operation, bailing out.
>>>>>> Mar 30 16:19:07 h18 VirtualDomain(prm_xen_v15)[8836]: INFO: Issuing forced
>>>>> shutdown (destroy) request for domain v15.
>>>>>> Mar 30 16:19:07 h18 VirtualDomain(prm_xen_v15)[8860]: ERROR: forced stop
>>>>> failed
>>>>>>
>>>>>> Mar 30 16:19:07 h19 pacemaker-controld[7351]: notice: Transition 124 action
>>>>> 115 (prm_xen_v15_stop_0 on h18): expected 'ok' but got 'error'
>>>>>>
>>>>>> Note: Our cluster nodes start pacemaker during boot. Yesterday I was there
>>>>> when the problem happened. But as we had another boot loop some time ago I
>>>>> wrote a systemd service that counts boots, and if too many happen within a
>>>>> short time, pacemaker will be disabled on that node. As it is set now, the
>>>>> counter is reset if the node is up for at least 15 minutes; if it fails more
>>>>> than 4 times to do so, pacemaker will be disabled. If someone wants to try
>>>>> that or give feedback, drop me a line, so I could provide the RPM
>>>>> (boot-loop-handler-0.0.5-0.0.noarch)...
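>>>>>
>>>>> A very rough sketch of the idea (not the actual package; paths made up,
>>>>> thresholds as described above):
>>>>>
>>>>>   # run once per boot from a oneshot unit; the counter survives reboots
>>>>>   c=$(cat /var/lib/bootcount 2>/dev/null || echo 0)
>>>>>   c=$((c + 1)); echo "$c" > /var/lib/bootcount
>>>>>   [ "$c" -gt 4 ] && systemctl disable pacemaker
>>>>>   # a second unit/timer resets the counter once uptime exceeds 15 minutes:
>>>>>   # sleep 900 && echo 0 > /var/lib/bootcount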
>>>>>>
>>>>>> Regards,
>>>>>> Ulrich
>>>>>>
More information about the Users
mailing list