[ClusterLabs] How to achieve next order behavior for start/stop action for failover

Mon Dec 4 14:51:30 EST 2023

On Mon, Dec 4, 2023 at 8:48 AM Novik Arthur <freishutz at gmail.com> wrote:
>
> Hello community!
> I'm not sure whether Pacemaker can do this with its current logic (maybe it could be a new feature), but it's worth asking before I start to "build my own Luna-park, with blackjack and ..."
>
> Right now I have something like:
> MGS -> MDT -> OST
> order mdt-after-mgs Optional: mgs:start mdt:start
> order ost-after-mgs Optional: mgs:start ost:start
> order ost-after-mdt0000 Optional: mdt0000:start ost:start
>
> We have 4 nodes (A,B,C,D).
> Nodes A and B carry MGS.
> Nodes A,B,C,D carry MDT000[0-3] - one per node.
> Nodes A,B,C,D carry OST000[0-3] - one per node.
> If we stop nodes A and B, MGS is stopped because there is no node left to place it on. MDT000[0-1] and OST000[0-1] can still fail over, and they try to, but they fail (and eventually end up blocked) because MGS is mandatory for us. I use Optional ordering only to avoid an unnecessary stop/start chain for MDT/OST.
>
> I want to avoid unnecessary STOP/START actions on each dependent resource during failover, while preserving the order and enforcing the MGS dependency for resources that are stopped (so a start must follow the chain, and a resource that is already started is left alone). Think of it as separate procedures for the first start and for failover during operation... like soft-mandatory or something like that.

You might try adding "symmetrical=false". With that option, the
constraints apply only to the listed actions (start, here) and not to
the implied reverse ordering on stop.
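
As a minimal sketch, assuming crmsh syntax like the constraints quoted
above (the attribute is spelled "symmetrical" in the CIB, and I haven't
tested this against your configuration), the three constraints would
become:

```
order mdt-after-mgs Optional: mgs:start mdt:start symmetrical=false
order ost-after-mgs Optional: mgs:start ost:start symmetrical=false
order ost-after-mdt0000 Optional: mdt0000:start ost:start symmetrical=false
```

Without that option, each constraint also implies the reverse order
for stops (for example, mdt stops before mgs).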

Otherwise, I'm struggling to understand the actual vs. desired
behavior here. Perhaps some example outputs or logs would be helpful
to illustrate it.

All of these ordering constraints are optional. This means they're
applied only when both actions would be scheduled. For example, if mgs
and mdt are both scheduled to start, then mgs must start first; but if
only mdt is scheduled to start, then it does not depend on mgs.
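
To illustrate the difference, here is a sketch in the same crmsh
syntax. The Mandatory line is a hypothetical variant shown only for
comparison, not something from your configuration, and the two lines
are alternatives (they share an ID), not meant to be loaded together:

```
# Optional: only orders the two starts when both are scheduled together.
order mdt-after-mgs Optional: mgs:start mdt:start

# Mandatory: always enforced; if mgs is stopped or cannot be started,
# mdt must be stopped as well.
order mdt-after-mgs Mandatory: mgs:start mdt:start
```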

Perhaps the fact that these are cloned resources is causing the
ordering constraints to behave differently from expectation... I'm not
sure.

> I think that if I tweak the OCF start/stop actions (make them dummies that always succeed) and move all the logic into monitors with deep checks, so that a monitor could mount/umount etc. and assign transient attributes that track whether a resource is ready to start, and then create location rules that honor those transient attributes, I could achieve the desired state. But it looks very complex and probably isn't worth it...

Sometimes for complex dependencies, it can be helpful to configure an
ocf:pacemaker:attribute resource (which usually depends on other
resources in turn). This resource agent sets a node attribute on the
node where it's running. The attribute can be useful in rules as part
of more complicated constraint schemes.
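
A rough sketch of that pattern in crmsh syntax follows. The names
mgs-ready, mgs_ready, and some-rsc are hypothetical placeholders, and
the constraints are only assumptions about how you might wire it up,
not a tested configuration:

```
# Attribute resource: sets node attribute mgs_ready=1 where it is
# active and 0 where it is stopped.
primitive mgs-ready ocf:pacemaker:attribute \
    params name=mgs_ready active_value=1 inactive_value=0

# Run the attribute resource with mgs, and only after mgs has started.
colocation mgs-ready-with-mgs inf: mgs-ready mgs
order mgs-ready-after-mgs Mandatory: mgs:start mgs-ready:start

# Keep a dependent resource off nodes where the attribute is unset or not 1.
location some-rsc-needs-mgs some-rsc \
    rule -inf: not_defined mgs_ready or mgs_ready ne 1
```

Keep in mind that the attribute is per node: a rule like this only
matches on the node where the attribute resource is running, so it
would need adapting for a dependency that spans nodes.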

>
> I would love to see any thoughts about it.
>
> Thanks,
> A

-- 
Regards,

Reid Wahl (He/Him)
Senior Software Engineer, Red Hat
RHEL High Availability - Pacemaker