[ClusterLabs] Antw: Re: How Pacemaker reacts to fast changes of the same parameter in configuration
Klaus Wenninger
kwenning at redhat.com
Thu Nov 10 10:51:36 UTC 2016
On 11/10/2016 11:34 AM, Kostiantyn Ponomarenko wrote:
> Ulrich Windl,
>
> >> You want your resources to move to their preferred location after
> some problem.
> It is not about that. It is about wanting to control when fail-back
> happens, and being sure that I have full control over it at all
> times.
>
> Klaus Wenninger,
>
> You are right. That is exactly what I want and what I am concerned
> about. Your example with the "move" operation is also 100% correct.
>
> I've been thinking about another possible approach here since
> yesterday and I have an idea which actually seems to satisfy my needs,
> at least until a proper solution is available.
> My set-up is a two node cluster.
> I will modify my script to:
>
> 1. issue a command to lower "resource-stickiness" on the local
>    node;
> 2. trigger a script on the other node which waits for the cluster
>    to finish all transitions (crm_resource --wait) and sets
>    "resource-stickiness" back to its original value;
> 3. on this node, wait for the cluster to finish all transitions
>    (crm_resource --wait) and set "resource-stickiness" back to its
>    original value.
>
> This way I can be sure that the original value of
> "resource-stickiness" is restored immediately after fail-back.
> A rough sketch of the script is below.
> Though, I am still thinking about the best way for a local script to
> trigger the script on the other node and pass an argument to it.
> If you have any thoughts, I would like to hear them =)
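> Something like this (just an illustration: the peer hostname, the
> stickiness values and the helper path are placeholders, and error
> handling is left out):
>
>     #!/bin/sh
>     # failback.sh - run on the local node
>     PEER=node-b                               # the other cluster node (placeholder)
>
>     # 1. lower the default stickiness so resources may fail back
>     crm configure rsc_defaults resource-stickiness=50
>
>     # 2. ask the peer to independently restore the value once things settle
>     ssh "$PEER" /usr/local/bin/restore-stickiness.sh 150 &
>
>     # 3. locally wait for all pending transitions, then restore the value
>     crm_resource --wait
>     crm configure rsc_defaults resource-stickiness=150
>
>     # restore-stickiness.sh on the peer would do the same two steps:
>     #     crm_resource --wait
>     #     crm configure rsc_defaults resource-stickiness="$1"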
>
>
> I was also thinking about a more general approach to this.
> Maybe it is time for the higher-level cluster configuration tools to
> evolve and provide this robustness?
> So that they can take a sequence of commands and guarantee that it
> will be executed in a predictable order even if the node on which the
> sequence was initiated goes down.
Yep, either that or - especially for cases where the success of your
CIB modification is very specific to your cluster - you script it.
But in either case the high-level tool or your script can fail, the node
it is running on can be fenced, or whatever else you can think of ...
So I wanted to think about simple, not very invasive things that could
be done within the core of Pacemaker to enable a predictable fallback
in such cases.
>
> Or maybe pacemaker can expand its functionality to handle a command
> sequence?
>
> Or the special tagging which you mentioned - could you please
> elaborate on that one? I am curious how it would work.
That is what the high-level tools are doing at the moment. You can
recognize the constraints they have created by their name prefix.
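For example (resource and node names here are illustrative, and the
exact prefix can differ between crmsh and pcs versions):

    crm resource move dummy-rsc node-b     # creates a location constraint named cli-prefer-dummy-rsc
    crm configure show | grep cli-prefer   # the prefix marks it as tool-generated
    crm resource unmove dummy-rsc          # removes the temporary constraint again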
>
> >> some mechanism that makes the constraints somehow magically
> >> disappear or be disabled when they have achieved what they were
> >> intended to.
> You mean something like time-based constraints, but event-based
> instead of duration-based?
Something in that direction, yes ...
>
> Thank you,
> Kostia
>
> On Thu, Nov 10, 2016 at 11:17 AM, Klaus Wenninger <kwenning at redhat.com> wrote:
>
> On 11/10/2016 08:27 AM, Ulrich Windl wrote:
> >>>> Klaus Wenninger <kwenning at redhat.com> wrote on 09.11.2016 at 17:42 in
> > message <80c65564-b299-e504-4c6c-afd0ff86e178 at redhat.com>:
> >> On 11/09/2016 05:30 PM, Kostiantyn Ponomarenko wrote:
> >>> When one problem seems to be solved, another one appears.
> >>> Now my script looks this way:
> >>>
> >>> crm --wait configure rsc_defaults resource-stickiness=50
> >>> crm configure rsc_defaults resource-stickiness=150
> >>>
> >>> While now I am sure that transitions caused by the first command
> >>> won't be aborted, I see another possible problem here.
> >>> With minimal load in the cluster it took 22 seconds for this script
> >>> to finish.
> >>> I see a weakness here.
> >>> If the node on which this script is called goes down for any reason,
> >>> then "resource-stickiness" is not set back to its original value,
> >>> which is very bad.
> > I don't quite understand: You want your resources to move to their
> > preferred location after some problem. When the node goes down with
> > the lower stickiness, there is no problem while the other node is
> > down; when it comes up, resources might be moved, but isn't that
> > what you wanted?
>
> I guess this is about the general problem with features like e.g. 'move'
> that go so much against how Pacemaker works.
> They are implemented inside the high-level tooling.
> They temporarily modify the CIB, and if something happens that makes
> the controlling high-level tool go away, the modification stays as is -
> i.e. the CIB stays modified and the user has to know that a manual
> cleanup is needed.
> So we could actually derive a general discussion from that: how to
> handle these issues in a way that makes it less likely for artefacts
> to persist after some administrative action.
> At the moment, special tagging for the constraints that are
> automatically created to trigger a move is one approach.
> But when would you issue an automated cleanup? Is there anything
> implemented in the high-level tooling? pcsd I guess would be a
> candidate; for crmsh I don't know of a persistent instance that could
> take care of that ...
>
> If we say we won't implement these features in the core of Pacemaker,
> I definitely agree. But is there anything we could do to make it
> easier for the high-level tools?
> I'm thinking of some mechanism that makes the constraints somehow
> magically disappear or be disabled when they have achieved what they
> were intended to, when the connection to some administrative shell is
> lost, or ...
> I could imagine a dependency on some token given to a shell, something
> like a suicide-timeout, ...
> Maybe the usual habit when configuring a switch/router can trigger
> some ideas (sketched below): issue a reboot in x minutes; do a
> non-persistent config change; check that everything is fine
> afterwards; make it persistent; cancel the timed reboot.
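> Roughly, as a shell-level analogy (the change/check commands are
> hypothetical placeholders; the 'shutdown' timing is just an example):
>
>     shutdown -r +10 &                # safety net: reboot in 10 minutes unless cancelled
>                                      # (backgrounded in case the shutdown implementation blocks)
>     apply_nonpersistent_change       # hypothetical: modify only the running config
>     check_everything_is_fine && make_change_persistent   # hypothetical check and persist step
>     shutdown -c                      # all good, cancel the timed reboot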
>
> >
> >>> So, now I am thinking about how to solve this problem. I would
> >>> appreciate any thoughts about this.
> >>>
> >>> Is there a way to ask Pacemaker to execute these commands
> >>> sequentially, so there is no need to wait in the script?
> >>> If that is possible, then I think that my concern from above
> >>> goes away.
> >>>
> >>> Another thing which comes to my mind is to use time-based rules.
> >>> This way, when I need to do a manual fail-back, I simply set (or
> >>> update) a time-based rule from the script.
> >>> The rule will basically say: set "resource-stickiness" to 50
> >>> right now and expire in 10 min.
> >>> This looks good at first glance, but there is no reliable way to
> >>> choose a minimum sufficient time for it; at least none I am aware of.
> >>> And the thing is, it is important to me that "resource-stickiness"
> >>> is set back to its original value as soon as possible.
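> >>> To make the idea concrete, the CIB fragment I have in mind would
> >>> look roughly like this (ids, values and the expiry window are made
> >>> up, and I have not verified the exact syntax):
> >>>
> >>>     <rsc_defaults>
> >>>       <meta_attributes id="temp-low-stickiness" score="2">
> >>>         <rule id="temp-low-stickiness-rule" score="0">
> >>>           <date_expression id="temp-low-stickiness-window" operation="in_range"
> >>>                            start="2016-11-10T12:00:00" end="2016-11-10T12:10:00"/>
> >>>         </rule>
> >>>         <nvpair id="temp-low-stickiness-value" name="resource-stickiness" value="50"/>
> >>>       </meta_attributes>
> >>>       <meta_attributes id="default-stickiness" score="1">
> >>>         <nvpair id="default-stickiness-value" name="resource-stickiness" value="150"/>
> >>>       </meta_attributes>
> >>>     </rsc_defaults>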
> >>>
> >>> Those are my thoughts. As I said, I appreciate any ideas here.
> >> I have never tried --wait with crmsh, but I would guess that the
> >> delay you are observing is really the time your resources take to
> >> stop and start somewhere else.
> >>
> >> Actually you would need the reduced stickiness just during the stop
> >> phase - right?
> >>
> >> So, as there is no command like "wait until all stops are done", you
> >> could still run 'crm_simulate -Ls' and check that it doesn't want to
> >> stop anything anymore.
> >> That way you can save the time the starts would take.
> >> Unfortunately you have to repeat that and thus put additional load
> >> on pacemaker, possibly slowing things down if your poll cycle is
> >> too short.
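> >> Roughly something like this (untested; it assumes stop actions show
> >> up in the simulated transition summary as lines beginning with
> >> "Stop", possibly preceded by " * ", and that polling every couple of
> >> seconds is acceptable):
> >>
> >>     crm configure rsc_defaults resource-stickiness=50
> >>     # poll until the pending transition no longer wants to stop anything
> >>     while crm_simulate -Ls | grep -qE '^[[:space:]]*\*?[[:space:]]*Stop'; do
> >>         sleep 2
> >>     done
> >>     crm configure rsc_defaults resource-stickiness=150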
> >>
> >>>
> >>> Thank you,
> >>> Kostia
> >>>
> >>> On Tue, Nov 8, 2016 at 10:19 PM, Dejan Muhamedagic
> >>> <dejanmm at fastmail.fm> wrote:
> >>>
> >>> On Tue, Nov 08, 2016 at 12:54:10PM +0100, Klaus Wenninger wrote:
> >>> > On 11/08/2016 11:40 AM, Kostiantyn Ponomarenko wrote:
> >>> > > Hi,
> >>> > >
> >>> > > I need a way to do a manual fail-back on demand.
> >>> > > To be clear, I don't want it to be ON/OFF; I want it to be more
> >>> > > like "one shot".
> >>> > > So far I have found that the most reasonable way to do it is to
> >>> > > set "resource-stickiness" to a different value, and then set it
> >>> > > back to what it was.
> >>> > > To do that I created a simple script with two lines:
> >>> > >
> >>> > > crm configure rsc_defaults resource-stickiness=50
> >>> > > crm configure rsc_defaults resource-stickiness=150
> >>> > >
> >>> > > There are no timeouts before setting the original value back.
> >>> > > If I call this script, I get what I want - Pacemaker moves
> >>> > > resources to their preferred locations, and "resource-stickiness"
> >>> > > is set back to its original value.
> >>> > >
> >>> > > Although it works, I still have a few concerns about this approach.
> >>> > > Will I get the same behavior under heavy load, with delays on the
> >>> > > systems in the cluster (which is entirely possible and a normal
> >>> > > case in my environment)?
> >>> > > How does Pacemaker treat a fast change of this parameter?
> >>> > > I am worried that if "resource-stickiness" is set back to its
> >>> > > original value too fast, then no fail-back will happen. Is that
> >>> > > possible, or shouldn't I worry about it?
> >>> >
> >>> > AFAIK the pengine is interrupted when calculating a more
> >>> > complicated transition if the situation has changed, and a
> >>> > transition that is just being executed is aborted if the input
> >>> > from the pengine changed.
> >>> > So I would definitely worry!
> >>> > What you could do is to issue 'crm_simulate -Ls' in between and
> >>> > grep for an empty transition.
> >>> > There might be more elegant ways but that should be safe.
> >>>
> >>> crmsh has an option (-w) to wait for the PE to settle after
> >>> committing configuration changes.
> >>>
> >>> Thanks,
> >>>
> >>> Dejan
> >>> >
> >>> > > Thank you,
> >>> > > Kostia
> >>> > >
> >>> > >
>
>
>
> _______________________________________________
> Users mailing list: Users at clusterlabs.org
> http://clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>
>