[ClusterLabs] Antw: [EXT] Re: Disable all resources in a group if one or more of them fail and are unable to reactivate

Ken Gaillot kgaillot at redhat.com
Mon Feb 1 11:07:39 EST 2021


That's a new one to me. I'm shocked that works ... I'd expect it to be
detected as a colocation loop and ignored.

On Mon, 2021-02-01 at 12:15 +0100, damiano giuliani wrote:
> Hi guys, sorry for the late answer. Today I had the time to test
> Igor's solution, and it works flawlessly.
> Creating a colocation constraint binding the first and the last
> group resources with an INFINITY score makes it so that "if at
> least one resource in the group fails, the group will fail all
> resources."
> 
> Igor's explanation clarified everything for me.
> 
> Adding this line works for me:
> 
> pcs constraint colocation add lta-subscription-backend-ope-s1 with
> s3srvnotificationdispatcher INFINITY
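> 
> For anyone reproducing this, a minimal sketch with made-up names,
> where first-member and last-member are the first and last resources
> of the group:
> 
> pcs resource group add my-group first-member middle-member last-member
> pcs constraint colocation add first-member with last-member INFINITY
> 
> Since each group member already depends on the members before it,
> colocating the first member with the last seems to close the
> dependency chain, so one member's failure takes the whole group down.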
> 
> I would like to thank everyone who helped me and spent their time.
> 
> Have a good week!
> 
> Best
> 
> Damian
> 
> On Fri, 29 Jan 2021 at 11:22, Ulrich Windl <
> Ulrich.Windl at rz.uni-regensburg.de> wrote:
> > >>> Andrei Borzenkov <arvidjaar at gmail.com> wrote on 28.01.2021 at
> > 18:30 in message <db12df26-6cc4-bad2-8bf5-8ee3aad87533 at gmail.com>:
> > > On 27.01.2021 22:03, Ken Gaillot wrote:
> > >> 
> > >> With a group, later members depend on earlier members. If an
> > >> earlier member can't run, then no members after it can run.
> > >> 
> > >> However, we can't make the dependency go in both directions. If an
> > >> earlier member can't run unless a later member is active, and vice
> > >> versa, then how can anything be started?
> > >> 
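> > >> (For illustration, with made-up names: a group containing rsc-a
> > >> followed by rsc-b is roughly shorthand for
> > >> 
> > >>   pcs constraint order rsc-a then rsc-b
> > >>   pcs constraint colocation add rsc-b with rsc-a INFINITY
> > >> 
> > >> i.e. rsc-b depends on rsc-a, but not the other way around.)
> > >> 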
> > >> By default, Pacemaker tries to recover failed resources on the
> > >> same node, up to its migration-threshold (which defaults to a
> > >> million times). Once a group member reaches its migration-threshold,
> > >> Pacemaker will move the entire group to another node if one is
> > >> available. However, if no node is available for the failed member,
> > >> then it will just remain stopped (along with any later members in
> > >> the group), and the earlier members will stay active where they are.
> > >> 
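> > >> For example, to give up on local recovery after three failures
> > >> (made-up resource name):
> > >> 
> > >>   pcs resource meta my-resource migration-threshold=3
> > >> 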
> > >> I don't think there's any way to prevent earlier members from
> > >> running if a later member has no available node.
> > >> 
> > > 
> > > All other HA managers I am aware of have a collection of resources
> > > (often called an "application") as the scheduling unit. All
> > > resources in one collection are automatically activated on the same
> > > node (they may of course have ordering dependencies). If any
> > > required resource in a collection fails, the partially active
> > > collection is cleaned up: all resources activated so far are
> > > deactivated. This is indeed virtually impossible to express in
> > > Pacemaker. The only way I can think of is to artificially restrict
> > > the management layer to top-level resources, but this also won't
> > > work for stopping a group of resources (where "group" is used
> > > generically, not in the narrow Pacemaker sense), for the reasons you
> > > explained.
> > 
> > I just wonder: what about adding op timeouts to a group?
> > If the group fails to start or stop within the specified time,
> > consider the whole group as failed...
> > stop a failed start, and fence a failed stop...
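> > 
> > Something like this, per resource (made-up name; Pacemaker has no
> > group-level operation timeout, so it would have to be set on each
> > member):
> > 
> >   pcs resource update my-resource \
> >     op start timeout=60s on-fail=stop \
> >     op stop timeout=120s on-fail=fence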
> > 
> > Regards,
> > Ulrich
> > 
-- 
Ken Gaillot <kgaillot at redhat.com>


