[ClusterLabs] Disable all resources in a group if one or more of them fail and are unable to reactivate

Thu Jan 28 12:30:02 EST 2021

27.01.2021 22:03, Ken Gaillot wrote:
> 
> With a group, later members depend on earlier members. If an earlier
> member can't run, then no members after it can run.
> 
> However we can't make the dependency go in both directions. If an
> earlier member can't run unless a later member is active, and vice
> versa, then how can anything be started?
> 
> By default, Pacemaker tries to recover failed resources on the same
> node, up to its migration-threshold (which defaults to a million
> times). Once a group member reaches its migration-threshold, Pacemaker
> will move the entire group to another node if one is available. However
> if no node is available for the failed member, then it will just remain
> stopped (along with any later members in the group), and the earlier
> members will stay active where they are.
> 
> I don't think there's any way to prevent earlier members from running
> if a later member has no available node.
> 

All other HA managers I am aware of use a collection of resources (often
called an "application") as the scheduling unit. All resources in one
collection are automatically activated on the same node (they may, of
course, have ordering dependencies among themselves). If any required
resource in the collection fails, the partially active collection is
cleaned up: all resources activated so far are deactivated. This is
indeed virtually impossible to express in Pacemaker. The only way I can
think of is to artificially restrict the management layer to top-level
resources, but this also won't work for stopping a group of resources
(where "group" is used generically, not in the narrow Pacemaker sense),
for the reasons you explained.
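
To make the difference concrete, here is a minimal Python sketch of the
two behaviours. It is only a toy model, not Pacemaker code: the function
names, the resource names (ip, filesystem, database) and the can_run
callback are all made up for illustration, and it ignores node placement,
retries and migration-threshold entirely.

```python
from typing import Callable, List


def start_group_style(members: List[str],
                      can_run: Callable[[str], bool]) -> List[str]:
    """Group behaviour as described in the quoted text: members start in
    order; the first member that cannot run stays stopped, along with
    every later member, while earlier members stay active."""
    active: List[str] = []
    for name in members:
        if not can_run(name):
            break  # this member and all later ones remain stopped
        active.append(name)
    return active


def start_all_or_nothing(members: List[str],
                         can_run: Callable[[str], bool]) -> List[str]:
    """'Application' behaviour: if any required member cannot run, every
    member activated so far is deactivated again."""
    active: List[str] = []
    for name in members:
        if not can_run(name):
            while active:
                active.pop()  # deactivate in reverse start order
            return active     # empty: nothing is left running
        active.append(name)
    return active


if __name__ == "__main__":
    # Hypothetical resources; "database" has no node it can run on.
    members = ["ip", "filesystem", "database"]

    def can_run(name: str) -> bool:
        return name != "database"

    print(start_group_style(members, can_run))     # ['ip', 'filesystem']
    print(start_all_or_nothing(members, can_run))  # []
```

The first function leaves the earlier members running, which is what
Ken describes; the second one is the cleanup behaviour I mean above.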