[ClusterLabs] Antw: [EXT] Re: Disable all resources in a group if one or more of them fail and are unable to reactivate

Mon Feb 1 06:15:37 EST 2021

Hi guys, sorry for the late answer. Today I had time to test Igor's
solution and it works flawlessly.
Creating a colocation constraint that binds the first and the last
resources of the group with an INFINITY score gives the behaviour "If at
least one resource in the group fails the group will fail all resources."

Igor's explanation clarified everything for me.

Adding this line works for me:

pcs constraint colocation add lta-subscription-backend-ope-s1 with s3srvnotificationdispatcher INFINITY
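
As a rough illustration of why this constraint has that effect, here is a
toy Python model. It is only a sketch and not Pacemaker's scheduler: the
function name runnable and the resource middle-resource are made up for the
example, and ordering, recovery and migration-threshold are ignored. It
assumes each group member is mandatorily colocated with the member before
it, and that a resource cannot stay active if anything it is colocated with
is not active.

```python
def runnable(group, failed, extra_colocations=()):
    """Toy model: return the group members that may stay active.

    group             -- resource names in group order
    failed            -- names that cannot run on any node
    extra_colocations -- (dependent, target) pairs with an INFINITY score
    """
    # Inside a group, each member is colocated with the member before it.
    depends_on = {later: {earlier} for earlier, later in zip(group, group[1:])}
    depends_on.setdefault(group[0], set())
    for dependent, target in extra_colocations:
        depends_on[dependent].add(target)

    # Start from everything that has not failed, then repeatedly drop any
    # resource whose colocation targets are not all active.
    active = set(group) - set(failed)
    changed = True
    while changed:
        changed = False
        for res in list(active):
            if not depends_on[res] <= active:
                active.discard(res)
                changed = True
    return [res for res in group if res in active]


# "middle-resource" is a hypothetical placeholder for the other members.
GROUP = ["lta-subscription-backend-ope-s1", "middle-resource",
         "s3srvnotificationdispatcher"]
FIRST_WITH_LAST = [(GROUP[0], GROUP[-1])]

# Last member fails, plain group: the earlier members stay active.
print(runnable(GROUP, {GROUP[-1]}))
# Same failure with the extra colocation: nothing stays active.
print(runnable(GROUP, {GROUP[-1]}, FIRST_WITH_LAST))
```

In this model the extra colocation closes the dependency chain into a loop,
so a failure of any member ends up stopping every member.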

I would like to thank everyone who helped me and spent their time on this.

Have a good week!

Best

Damian

On Fri, 29 Jan 2021 at 11:22, Ulrich Windl <
Ulrich.Windl at rz.uni-regensburg.de> wrote:

> >>> Andrei Borzenkov <arvidjaar at gmail.com> wrote on 28.01.2021 at 18:30
> in
> message <db12df26-6cc4-bad2-8bf5-8ee3aad87533 at gmail.com>:
> > 27.01.2021 22:03, Ken Gaillot wrote:
> >>
> >> With a group, later members depend on earlier members. If an earlier
> >> member can't run, then no members after it can run.
> >>
> >> However we can't make the dependency go in both directions. If an
> >> earlier member can't run unless a later member is active, and vice
> >> versa, then how can anything be started?
> >>
> >> By default, Pacemaker tries to recover failed resources on the same
> >> node, up to its migration-threshold (which defaults to a million
> >> times). Once a group member reaches its migration-threshold, Pacemaker
> >> will move the entire group to another node if one is available. However
> >> if no node is available for the failed member, then it will just remain
> >> stopped (along with any later members in the group), and the earlier
> >> members will stay active where they are.
> >>
> >> I don't think there's any way to prevent earlier members from running
> >> if a later member has no available node.
> >>
> >
> > All other HA managers I am aware of have a collection of resources (often
> > called an "application") as the scheduling unit. All resources in one
> > collection are automatically activated on the same node (they may, of
> > course, have ordering dependencies). If any required resource in the
> > collection fails, the partially active collection is cleaned up: all
> > resources activated so far are deactivated. This is indeed virtually
> > impossible to express in Pacemaker. The only way I can think of is to
> > artificially restrict the management layer to top-level resources, but
> > this also won't work for stopping a group of resources (where "group" is
> > used generically, not in the narrow Pacemaker sense) for the reasons you
> > explained.
>
> I just wonder: what about adding op timeouts to a group?
> If the group fails to start or stop within the specified time, consider the
> whole group as failed...
> stop a failed start, and fence a failed stop...
>
> Regards,
> Ulrich
>
> >
> > _______________________________________________
> > Manage your subscription:
> > https://lists.clusterlabs.org/mailman/listinfo/users
> >
> > ClusterLabs home: https://www.clusterlabs.org/