[ClusterLabs] Antw: Re: Antw: [EXT] Re: Q: constrain or delay "probes"?

Ulrich Windl Ulrich.Windl at rz.uni-regensburg.de
Mon Mar 8 03:57:37 EST 2021


>>> Reid Wahl <nwahl at redhat.com> wrote on 08.03.2021 at 08:42 in message
<CAPiuu9_V0-3k9k-Z8+z5u5t8bMh3sL3PzzdOLH9g8XCdmfqDow at mail.gmail.com>:
> Did the "active on too many nodes" message happen right after a probe? If
> so, then it does sound like the probe returned code 0.

Events were like this (I greatly condensed the logs):
(DC h16 being stopped)
Mar 05 09:53:45 h16 pacemaker-schedulerd[7189]:  notice:  * Migrate    prm_xen_v09              ( h16 -> h18 )
Mar 05 09:54:23 h16 pacemaker-controld[7190]:  notice: Initiating migrate_to operation prm_xen_v09_migrate_to_0 locally on h16
Mar 05 09:54:24 h16 libvirtd[8531]: internal error: Failed to send migration data to destination host
Mar 05 09:54:24 h16 VirtualDomain(prm_xen_v09)[1834]: ERROR: v09: live migration to h18 failed: 1
Mar 05 09:54:24 h16 pacemaker-controld[7190]:  notice: Transition 1000 action 125 (prm_xen_v09_migrate_to_0 on h16): expected 'ok' but got 'error'
Mar 05 09:54:47 h16 pacemaker-schedulerd[7189]:  error: Resource prm_xen_v09 is active on 2 nodes (attempting recovery)
(it was not really active on two nodes; the DC recovers it on h18, where v09 probably isn't running, but it should stop on h16 first)
Mar 05 09:54:47 h16 pacemaker-schedulerd[7189]:  notice:  * Recover    prm_xen_v09              (             h18 )
Mar 05 09:54:47 h16 VirtualDomain(prm_xen_v09)[2068]: INFO: Issuing graceful shutdown request for domain v09.
Mar 05 09:55:12 h16 pacemaker-execd[7187]:  notice: prm_xen_v09 stop (call 297, PID 2035) exited with status 0 (execution time 25101ms, queue time 0ms)
Mar 05 09:55:12 h16 pacemaker-controld[7190]:  notice: Result of stop operation for prm_xen_v09 on h16: ok
Mar 05 09:55:14 h16 pacemaker-controld[7190]:  notice: Transition 1001 aborted by operation prm_xen_v09_start_0 'modify' on h18: Event failed
Mar 05 09:55:14 h16 pacemaker-controld[7190]:  notice: Transition 1001 action 117 (prm_xen_v09_start_0 on h18): expected 'ok' but got 'error'
Mar 05 09:55:15 h16 pacemaker-schedulerd[7189]:  warning: Unexpected result (error: v09: live migration to h18 failed: 1) was recorded for migrate_to of prm_xen_v09 on h16 at Mar  5 09:54:23 2021

Mar 05 09:55:15 h18 pacemaker-execd[7129]:  notice: prm_xen_v09 stop (call 262, PID 46737) exited with status 0 (execution time 309ms, queue time 0ms)

(DC shut down)
Mar 05 09:55:20 h16 pacemakerd[7183]:  notice: Shutdown complete
Mar 05 09:55:20 h16 systemd[1]: Stopped Corosync Cluster Engine.

(node starting after being stopped)
Mar 05 10:38:50 h16 systemd[1]: Starting Shared-storage based fencing daemon...
Mar 05 10:38:50 h16 systemd[1]: Starting Corosync Cluster Engine...
Mar 05 10:38:59 h16 pacemaker-controld[14022]:  notice: Quorum acquired
Mar 05 10:39:00 h16 pacemaker-controld[14022]:  notice: State transition S_PENDING -> S_NOT_DC
(this probe probably reported nonsense)
Mar 05 10:39:02 h16 pacemaker-controld[14022]:  notice: Result of probe operation for prm_xen_v09 on h16: ok
(DC noticed)
Mar 05 10:39:02 h18 pacemaker-controld[7132]:  notice: Transition 5 action 58 (prm_xen_v09_monitor_0 on h16): expected 'not running' but got 'ok'
(from now on probes should be more reliable)
Mar 05 10:39:07 h16 systemd[1]: Started Virtualization daemon.
(there is nothing to stop)
Mar 05 10:39:09 h16 pacemaker-execd[14019]:  notice: executing - rsc:prm_xen_v09 action:stop call_id:166
(obviously)
Mar 05 10:40:11 h16 libvirtd[15490]: internal error: Failed to shutdown domain '20' with libxenlight
(more nonsense)
Mar 05 10:44:04 h16 VirtualDomain(prm_xen_v09)[17306]: INFO: Issuing forced shutdown (destroy) request for domain v09.
(eventually)
Mar 05 10:44:07 h16 pacemaker-controld[14022]:  notice: Result of stop operation for prm_xen_v09 on h16: ok
Mar 05 10:44:07 h16 pacemaker-execd[14019]:  notice: executing - rsc:prm_xen_v09 action:start call_id:168

> 
> If a probe returned 0 and it **shouldn't** have done so, then either the
> monitor operation needs to be redesigned, or resource-discovery=never (or
> resource-discovery=exclusive) can be used to prevent the probe from
> happening where it should not.
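
For reference, as I understand it such a constraint might look like this in
crm shell (an untested sketch; prm_xen_v09 and node h16 are used only as
examples):

  # never probe (and never run) prm_xen_v09 on h16
  location l_v09_no_probe prm_xen_v09 resource-discovery=never -inf: h16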

Well, the situation here is using virtlockd with indirect locking in a cluster, where the cluster itself provides the shared filesystem used for locking.

Then the obvious ordering is:
1) Provide shared filesystem (mount it)
2) start virtlockd (to put the lock files in a shared place)
3) run libvirtd (using virtlockd)
4) Manage VMs using libvirt

Unfortunately, probes (which expect to use libvirt) are run even before step 1), and I don't know why they returned success in that case.
(Other VMs were probed as "not running".)
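
If those pieces were themselves managed as cluster resources, an untested
crm shell sketch of that ordering might look like the following (all names
except prm_xen_v09 are invented, the Filesystem parameters are only
placeholders, and in practice 1)-3) would probably be clones):

  # 1) shared filesystem holding the lock files
  primitive prm_lockspace_fs ocf:heartbeat:Filesystem \
      params device="..." directory="/var/lib/libvirt/lockd" fstype="ocfs2"
  # 2) virtlockd and 3) libvirtd as systemd services
  primitive prm_virtlockd systemd:virtlockd
  primitive prm_libvirtd systemd:libvirtd
  # chain: filesystem -> virtlockd -> libvirtd -> VM
  order o_fs_virtlockd Mandatory: prm_lockspace_fs prm_virtlockd
  order o_virtlockd_libvirtd Mandatory: prm_virtlockd prm_libvirtd
  order o_libvirtd_vm Mandatory: prm_libvirtd prm_xen_v09

But as said, probes do not obey such ordering constraints, so they may
still run before the filesystem is mounted.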

> 
> If a probe returned 0 and it **should** have done so, but the stop
> operation on the other node wasn't reflected in the CIB (so that the
> resource still appeared to be active there), then that's odd.

Well, reviewing the logs, the cluster may actually have had v09 still running on h16 even though the node had been stopped.
So the problem happened on stopping, not on starting; still, I doubt that the probe at that time was reliable.

> 
> A bug is certainly possible, though we can't say without more detail :)

I see what you mean.

Regards,
Ulrich

> 
> On Sun, Mar 7, 2021 at 11:10 PM Ulrich Windl <
> Ulrich.Windl at rz.uni-regensburg.de> wrote:
> 
>> >>> Reid Wahl <nwahl at redhat.com> wrote on 05.03.2021 at 21:22 in
>> message
>> <CAPiuu991O08DnaVkm9bc8N9BK-+NH9e0_F25o6DdiS5WZWGSsQ at mail.gmail.com>:
>> > On Fri, Mar 5, 2021 at 10:13 AM Ken Gaillot <kgaillot at redhat.com> wrote:
>> >
>> >> On Fri, 2021-03-05 at 11:39 +0100, Ulrich Windl wrote:
>> >> > Hi!
>> >> >
>> >> > I'm unsure what actually causes a problem I see (a resource was
>> >> > "detected running" when it actually was not), but I'm sure some probe
>> >> > started on cluster node start cannot provide a useful result until
>> >> > some other resource has been started. AFAIK there is no way to make a
>> >> > probe obey ordering or colocation constraints, so the only work-around
>> >> > seems to be a delay. However I'm unsure whether probes can actually
>> >> > be delayed.
>> >> >
>> >> > Ideas?
>> >>
>> >> Ordered probes are a thorny problem that we've never been able to come
>> >> up with a general solution for. We do order certain probes where we
>> >> have enough information to know it's safe. The problem is that it is
>> >> very easy to introduce ordering loops.
>> >>
>> >> I don't remember if there are any workarounds.
>> >>
>> >
>> > Maybe as a workaround:
>> >   - Add an ocf:pacemaker:attribute resource after-and-with rsc1
>> >   - Then configure a location rule for rsc2 with resource-discovery=never
>> > and score=-INFINITY with expression (in pseudocode) "attribute is not set
>> > to active value"
>> >
>> > I haven't tested but that might cause rsc2's probe to wait until rsc1 is
>> > active.
>> >
>> > And of course, use the usual constraints/rules to ensure rsc2's probe
>> > only runs on rsc1's node.
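
An untested crm shell sketch of that idea, with all names invented, might be:

  # sets node attribute "rsc1-active" to 1 wherever rsc1 is running
  primitive rsc1_flag ocf:pacemaker:attribute \
      params name="rsc1-active" active_value="1" inactive_value="0"
  order o_rsc1_flag Mandatory: rsc1 rsc1_flag
  colocation c_rsc1_flag inf: rsc1_flag rsc1
  # neither probe nor run rsc2 where the attribute is missing or not "1"
  location l_rsc2_gate rsc2 resource-discovery=never \
      rule -inf: not_defined rsc1-active or rsc1-active ne 1

Whether this really delays the probe until rsc1 is active is untested.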
>> >
>> >
>> >> > Despite that, I wonder whether some probe/monitor return code like
>> >> > OCF_NOT_READY would make sense if the operation detects that it
>> >> > cannot return a current status (so both "running" and "stopped" would
>> >> > be as inadequate as "starting" and "stopping" would be, despite the
>> >> > fact that the latter two do not exist).
>> >>
>> >
>> > This seems logically reasonable, independent of any implementation
>> > complexity and considerations of what we would do with that return code.
>>
>> Thanks for the proposal!
>> The actual problem I was facing was that the cluster claimed some resource
>> would be running on two nodes at the same time, when actually one node had
>> been stopped properly (with all the resources). The bad state in the CIB
>> was most likely due to a software bug in pacemaker, but probes on
>> re-starting the node seemed not to prevent pacemaker from doing a really
>> wrong "recovery action".
>> My hope was that probes might update the CIB before some stupid action is
>> being done. Maybe it's just another software bug...
>>
>> Regards,
>> Ulrich
>>
>> >
>> >
>> >> > Regards,
>> >> > Ulrich
>> >> --
>> >> Ken Gaillot <kgaillot at redhat.com>
>> >>
>> >
>>
> 
> -- 
> Regards,
> 
> Reid Wahl, RHCA
> Senior Software Maintenance Engineer, Red Hat
> CEE - Platform Support Delivery - ClusterHA




