[ClusterLabs] Antw: Re: How Pacemaker reacts to fast changes of the same parameter in configuration

Thu Nov 10 10:34:36 UTC 2016

Ulrich Windl,

>> You want your resources to move to their preferred location after some
>> problem.
It is not about that. I want to control when fail-back happens, and I want
to be sure that I have full control over it all the time.

Klaus Wenninger,

You are right. That is exactly what I want and what I am concerned about.
Your other example, with the "move" operation, is 100% correct.

I've been thinking about another possible approach here since yesterday and
I've got an idea which actually seems to satisfy my needs.
At least till a proper solution is available.
My set-up is a two node cluster.
I will modify my script to:

    1. issue a command to lower "resource-stickiness" on the local node;
    2. on the other node, trigger a script which waits for the cluster to
       finish all transitions (crm_resource --wait) and sets
       "resource-stickiness" back to its original value;
    3. on this node, wait for the cluster to finish all transitions
       (crm_resource --wait) and set "resource-stickiness" back to its
       original value.

This way I can be sure that "resource-stickiness" is back at its original
value immediately after fail-back.
Though, I am still thinking about the best way for a local script to
trigger the script on the other node and pass an argument to it.
If you have any thoughts, I would like to hear them =)
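
A minimal sketch of the three steps above, not a tested solution. It assumes
passwordless ssh to the other node as the trigger, which is only one possible
choice. The function names and the CRM, CRM_RESOURCE and REMOTE variables are
hypothetical and exist only for illustration. The remote part is detached
with nohup so that it can keep running if this node goes down.

```shell
#!/bin/sh
# Sketch only. Command names are overridable so the logic can be dry-run.
CRM="${CRM:-crm}"
CRM_RESOURCE="${CRM_RESOURCE:-crm_resource}"
REMOTE="${REMOTE:-ssh}"   # assumption: ssh is used to reach the peer

# Wait until the cluster has settled, then restore the original value.
restore_stickiness() {
    orig="$1"
    "$CRM_RESOURCE" --wait &&
        "$CRM" configure rsc_defaults resource-stickiness="$orig"
}

# usage: manual_failback PEER LOW ORIG
manual_failback() {
    peer="$1"; low="$2"; orig="$3"

    # Step 1: lower the stickiness from the local node.
    "$CRM" configure rsc_defaults resource-stickiness="$low" || return 1

    # Step 2: start a detached wait-and-restore on the peer, so that it does
    # not depend on this node. The original value is passed in the command.
    "$REMOTE" "$peer" "nohup sh -c 'crm_resource --wait && crm configure rsc_defaults resource-stickiness=$orig' </dev/null >/dev/null 2>&1 &"

    # Step 3: wait and restore locally as well.
    restore_stickiness "$orig"
}
```

It would be called as `manual_failback node2 50 150`, where node2 is a
placeholder for the peer's host name. Both nodes end up writing the same
original value. A short window still remains between step 1 and step 2 in
which a failure of this node leaves the lowered value in place.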

I was also thinking about a more general approach to it.
Maybe it is time for higher-level cluster configuration tools to evolve to
provide this robustness?
So that they can take a sequence of commands and guarantee that they will
be executed in a predictable order even if the node on which this sequence
was initiated goes down.

Or maybe pacemaker can expand its functionality to handle a command
sequence?

Or this special tagging which you mentioned. Could you please elaborate on
this one as I am curious how it should work?

>> some mechanism that makes the constraints somehow magically disappear or
>> disabled when they have achieved what they were intended to.
You mean something like time-based constraints, but event-based instead of
duration-based?

Thank you,
Kostia

On Thu, Nov 10, 2016 at 11:17 AM, Klaus Wenninger <kwenning at redhat.com>
wrote:

> On 11/10/2016 08:27 AM, Ulrich Windl wrote:
> >>>> Klaus Wenninger <kwenning at redhat.com> wrote on 09.11.2016 at 17:42
> > in message <80c65564-b299-e504-4c6c-afd0ff86e178 at redhat.com>:
> >> On 11/09/2016 05:30 PM, Kostiantyn Ponomarenko wrote:
> >>> When one problem seems to be solved, another one appears.
> >>> Now my script looks this way:
> >>>
> >>>     crm --wait configure rsc_defaults resource-stickiness=50
> >>>     crm configure rsc_defaults resource-stickiness=150
> >>>
> >>> While now I am sure that transactions caused by the first command
> >>> won't be aborted, I see another possible problem here.
> >>> With a minimum load in the cluster it took 22 sec for this script to
> >>> finish.
> >>> I see here a weakness.
> >>> If a node on which this script is called goes down for any reason,
> >>> then "resource-stickiness" is not set back to its original value,
> >>> which is very bad.
> > I don't quite understand: You want your resources to move to their
> > preferred location after some problem. When the node goes down with the
> > lower stickiness, there is no problem while the other node is down; when
> > it comes up, resources might be moved, but isn't that what you wanted?
>
> I guess this is about the general problem with features like e.g. 'move'
> as well, which go so much against how pacemaker works.
> They are implemented inside the high-level tooling.
> They temporarily modify the CIB, and if something happens that makes the
> controlling high-level tool go away, it stays as is - or the CIB even
> stays modified and the user has to know that he has to do a manual
> cleanup.
> So we could actually derive a general discussion from that about how to
> handle these issues in a way that makes it less likely for artefacts to
> persist after some administrative action.
> At the moment, e.g. special tagging for the constraints that are
> automatically created to trigger a move is one approach.
> But when would you issue an automated cleanup? Is there anything
> implemented in high-level tooling? pcsd I guess would be a candidate; for
> crmsh I don't know of a persistent instance that could take care of that
> ...
>
> If we say we won't implement these features in the core of pacemaker
> I definitely agree. But is there anything we could do to make it easier
> for high-level-tools?
> I'm thinking of some mechanism that makes the constraints somehow
> magically disappear or disabled when they have achieved what they
> were intended to, if the connection to some administrative-shell is
> lost, or ...
> I could imagine dependency on some token given to a shell, something
> like a suicide-timeout, ...
> Maybe the usual habit when configuring a switch/router can trigger
> some ideas: issue a reboot in x minutes; do a non persistent config-change;
> check if everything is fine afterwards; make it persistent; disable
> the timed reboot
>
> >
> >>> So, now I am thinking of how to solve this problem. I would appreciate
> >>> any thoughts about this.
> >>>
> >>> Is there a way to ask Pacemaker to do these commands sequentially so
> >>> there is no need to wait in the script?
> >>> If it is possible, then I think that my concern from above goes away.
> >>>
> >>> Another thing which comes to my mind is to use time-based rules.
> >>> This way, when I need to do a manual fail-back, I simply set (or
> >>> update) a time-based rule from the script.
> >>> And the rule will basically say - set "resource-stickiness" to 50
> >>> right now and expire in 10 min.
> >>> This looks good at first glance, but there is no reliable way to
> >>> pick a minimum sufficient time for it; at least none that I am aware
> >>> of.
> >>> And the thing is - it is important to me that "resource-stickiness" is
> >>> set back to its original value as soon as possible.
> >>>
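
For illustration, the time-based rule described above could look roughly
like the fragment printed by this sketch. It assumes the rule syntax for
rsc_defaults described in Pacemaker Explained (a meta_attributes set guarded
by a date_expression, with a higher score than the regular defaults). The
function name and all ids are hypothetical, and END is an ISO 8601 timestamp
chosen by the caller.

```shell
#!/bin/sh
# Sketch only: print a rule-guarded rsc_defaults set that applies
# resource-stickiness=LOW until END. All ids are made up for illustration.
# usage: stickiness_rule_xml LOW END
stickiness_rule_xml() {
    low="$1"; end="$2"
    cat <<EOF
<meta_attributes id="failback-window" score="2">
  <rule id="failback-window-rule" score="0">
    <date_expression id="failback-window-expr" operation="lt" end="$end"/>
  </rule>
  <nvpair id="failback-window-stickiness" name="resource-stickiness" value="$low"/>
</meta_attributes>
EOF
}
```

For example, `stickiness_rule_xml 50 2016-11-10T10:45:00` prints a set that
is meant to stop applying at 10:45. As far as I understand, the cluster only
notices the expiry at the next re-evaluation (cluster-recheck-interval),
which adds to the timing problem described above.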
> >>> Those are my thoughts. As I said, I appreciate any ideas here.
> >> I have never tried --wait with crmsh, but I would guess that the delay
> >> you are observing is really the time your resources take to stop and
> >> start somewhere else.
> >>
> >> Actually, you would need the reduced stickiness just during the stop
> >> phase, right?
> >>
> >> So, as there is no command like "wait till all stops are done", you
> >> could still run 'crm_simulate -Ls' and check that it doesn't want to
> >> stop anything anymore.
> >> That way you can save the time the starts would take.
> >> Unfortunately you have to repeat that and thus put additional load on
> >> pacemaker, possibly slowing things down if your poll cycle is too
> >> short.
> >>
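
A rough sketch of such a polling loop. It assumes that the transition
summary printed by 'crm_simulate -Ls' lists pending actions as lines that
start with " * " followed by an action word such as Stop or Move. The exact
output format differs between Pacemaker versions, so the pattern is a
placeholder to adjust, and the function names are hypothetical.

```shell
#!/bin/sh
# Sketch only. CRM_SIMULATE is overridable so the loop can be dry-run.
CRM_SIMULATE="${CRM_SIMULATE:-crm_simulate}"

# Succeeds if the simulation output on stdin still contains an action that
# stops a resource on its current node (pattern is an assumption).
has_pending_stops() {
    grep -Eq '^[[:space:]]*\*[[:space:]]+(Stop|Move|Migrate|Restart|Recover)[[:space:]]'
}

# Poll until no stop-like action is pending any more.
# usage: wait_for_stops [INTERVAL_SECONDS]
wait_for_stops() {
    interval="${1:-5}"
    while "$CRM_SIMULATE" -Ls | has_pending_stops; do
        sleep "$interval"
    done
}
```

The original stickiness could then be restored right after `wait_for_stops`
returns, without waiting for the starts. A longer interval reduces the extra
load on pacemaker at the cost of a later restore.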
> >>>
> >>> Thank you,
> >>> Kostia
> >>>
> >>> On Tue, Nov 8, 2016 at 10:19 PM, Dejan Muhamedagic
> >>> <dejanmm at fastmail.fm> wrote:
> >>>
> >>>     On Tue, Nov 08, 2016 at 12:54:10PM +0100, Klaus Wenninger wrote:
> >>>     > On 11/08/2016 11:40 AM, Kostiantyn Ponomarenko wrote:
> >>>     > > Hi,
> >>>     > >
> >>>     > > I need a way to do a manual fail-back on demand.
> >>>     > > To be clear, I don't want it to be ON/OFF; I want it to be
> >>>     > > more like "one shot".
> >>>     > > So far I have found that the most reasonable way to do it is
> >>>     > > to set "resource stickiness" to a different value, and then
> >>>     > > set it back to what it was.
> >>>     > > To do that I created a simple script with two lines:
> >>>     > >
> >>>     > >     crm configure rsc_defaults resource-stickiness=50
> >>>     > >     crm configure rsc_defaults resource-stickiness=150
> >>>     > >
> >>>     > > There are no timeouts before setting the original value back.
> >>>     > > If I call this script, I get what I want - Pacemaker moves
> >>>     > > resources to their preferred locations, and "resource
> >>>     > > stickiness" is set back to its original value.
> >>>     > >
> >>>     > > Although it works, I still have a few concerns about this
> >>>     > > approach.
> >>>     > > Will I get the same behavior under a big load with delays on
> >>>     > > systems in the cluster (which is truly possible and a normal
> >>>     > > case in my environment)?
> >>>     > > How does Pacemaker treat a fast change of this parameter?
> >>>     > > I am worried that if "resource stickiness" is set back to its
> >>>     > > original value too fast, then no fail-back will happen. Is
> >>>     > > that possible, or shouldn't I worry about it?
> >>>     >
> >>>     > AFAIK pengine is interrupted when calculating a more complicated
> >>>     > transition, and if the situation has changed, a transition that
> >>>     > is just being executed is aborted if the input from pengine
> >>>     > changed.
> >>>     > So I would definitely worry!
> >>>     > What you could do is to issue 'crm_simulate -Ls' in between and
> >>>     > grep for an empty transition.
> >>>     > There might be more elegant ways but that should be safe.
> >>>
> >>>     crmsh has an option (-w) to wait for the PE to settle after
> >>>     committing configuration changes.
> >>>
> >>>     Thanks,
> >>>
> >>>     Dejan
> >>>     >
> >>>     > > Thank you,
> >>>     > > Kostia
> >>>     > >
> >>>     > >
> >>>     > > _______________________________________________
> >>>     > > Users mailing list: Users at clusterlabs.org
> >>>     > > http://clusterlabs.org/mailman/listinfo/users
> >>>     > >
> >>>     > > Project Home: http://www.clusterlabs.org
> >>>     > > Getting started:
> >>>     > > http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> >>>     > > Bugs: http://bugs.clusterlabs.org