[ClusterLabs] How Pacemaker reacts to fast changes of the same parameter in configuration

Wed Nov 9 18:57:37 CET 2016

>> Actually you would need the reduced stickiness just during the stop
>> phase, right?
Oh, that is good to know.

While I can shorten the wait by waiting only for the "stop" operations to
finish, I don't think it is worth it, because it doesn't fully address my
problem.
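
For reference, a rough sketch of that "wait only for the stops" variant
could look like the block below. It assumes that the "Transition Summary"
printed by 'crm_simulate -Ls' lists one pending action per line in the form
" * Stop ...", " * Move ..." and so on; I have not checked this against every
Pacemaker version. The function names and the poll interval are made up for
the example.

```shell
#!/bin/sh
# Sketch only. The action names matched below are an assumption about the
# "Transition Summary" section printed by `crm_simulate -Ls`.

# Reads crm_simulate output on stdin. Exit status 0 means that no action
# which involves stopping a resource is listed in the transition summary.
no_pending_stops() {
    awk '
        /^Transition Summary:/ { in_summary = 1; next }
        in_summary && /^[ \t]*\*[ \t]+(Stop|Move|Migrate|Restart|Recover)/ { found = 1 }
        END { exit (found ? 1 : 0) }
    '
}

# Hypothetical wrapper: lower stickiness, wait for the stops, restore it.
# Usage: failback_once LOW NORMAL [POLL_SECONDS]
failback_once() {
    low=$1
    normal=$2
    poll=${3:-5}
    # Put the normal value back if the shell exits early. This does not
    # help if the whole node goes down while we are waiting.
    trap 'crm configure rsc_defaults resource-stickiness="$normal"' EXIT
    crm configure rsc_defaults resource-stickiness="$low"
    # Poll until nothing is left to stop; starts may still be running.
    until crm_simulate -Ls | no_pending_stops; do
        sleep "$poll"
    done
    crm configure rsc_defaults resource-stickiness="$normal"
    trap - EXIT
}
```

Calling 'failback_once 50 150' would then replace the two-line script. The
weak spot stays the same: the restore still depends on the node that runs the
script staying alive, and every poll adds load on Pacemaker.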

Does that mean that reality is cruel, and there is no way to tell
Pacemaker "here are these two commands, execute them sequentially"?

It is all about usability for the end user.
As a last resort I was thinking about not giving the user this one-shot
"do a fail-back" button at all, and instead providing a "fail-back ON/OFF"
switch with some kind of "resources are placed optimally" indicator.

Anyway, maybe there are still some other ideas?
I really want a rock-solid "one-shot fail-back" solution, and
maybe I am missing something here =)
Or maybe it could be a feature request =)

Thank you,
Kostia

On Wed, Nov 9, 2016 at 6:42 PM, Klaus Wenninger <kwenning at redhat.com> wrote:

> On 11/09/2016 05:30 PM, Kostiantyn Ponomarenko wrote:
> > When one problem seems to be solved, another one appears.
> > Now my script looks this way:
> >
> >     crm --wait configure rsc_defaults resource-stickiness=50
> >     crm configure rsc_defaults resource-stickiness=150
> >
> > While now I am sure that the transitions caused by the first command
> > won't be aborted, I see another possible problem here.
> > With a minimum load in the cluster it took 22 sec for this script to
> > finish.
> > I see a weakness here.
> > If the node on which this script is called goes down for any reason,
> > then "resource-stickiness" is not set back to its original value,
> > which is very bad.
> >
> > So, now I am thinking of how to solve this problem. I would appreciate
> > any thoughts about this.
> >
> > Is there a way to ask Pacemaker to do these commands sequentially so
> > there is no need to wait in the script?
> > If it is possible, then I think that my concern from above goes away.
> >
> > Another thing which comes to my mind is to use time-based rules.
> > This way, when I need to do a manual fail-back, I simply set (or
> > update) a time-based rule from the script.
> > And the rule will basically say: set "resource-stickiness" to 50
> > right now and expire in 10 min.
> > This looks good at first glance, but there is no reliable way to
> > pick a minimum sufficient time for it; at least not one I am aware of.
> > And the thing is, it is important to me that "resource-stickiness" is
> > set back to its original value as soon as possible.
> >
> > Those are my thoughts. As I said, I appreciate any ideas here.
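
To make the time-based rule idea concrete, here is a rough sketch modelled
on the rsc_defaults rule example in Pacemaker Explained. It only prints the
XML; all id values and the function name are made up, and the end timestamp
has to be supplied by the caller. As far as I know, the expiry is only
noticed when the cluster re-evaluates its rules (cluster-recheck-interval),
so the original value would not necessarily come back right at the end time.

```shell
#!/bin/sh
# Sketch only: prints an rsc_defaults section in which resource-stickiness
# is LOW until END (an ISO 8601 timestamp) and NORMAL afterwards.
# Every id value below is hypothetical.
# Usage: failback_window_xml LOW NORMAL END
failback_window_xml() {
    low=$1
    normal=$2
    end=$3
    cat <<EOF
<rsc_defaults>
  <meta_attributes id="failback-window" score="2">
    <rule id="failback-window-rule" score="0">
      <date_expression id="failback-window-expr" operation="lt" end="$end"/>
    </rule>
    <nvpair id="failback-window-stickiness" name="resource-stickiness" value="$low"/>
  </meta_attributes>
  <meta_attributes id="normal-stickiness" score="1">
    <nvpair id="normal-stickiness-value" name="resource-stickiness" value="$normal"/>
  </meta_attributes>
</rsc_defaults>
EOF
}
```

For example, 'failback_window_xml 50 150 "2016-11-09T19:10:00"' prints a
section that keeps the value at 50 until 19:10 and falls back to 150 after
that. The advantage is that no script has to stay alive to restore the value;
the open question of how long the window must be remains.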
>
> I have never tried --wait with crmsh, but I would guess that the delay
> you are observing is really the time your resources take to stop and
> start somewhere else.
>
> Actually you would need the reduced stickiness just during the stop
> phase, right?
>
> So, as there is no command like "wait till all stops are done", you
> could still run 'crm_simulate -Ls' and check that it doesn't want to
> stop anything anymore.
> That way you save the time the starts would take.
> Unfortunately you have to repeat that check, which puts additional
> load on Pacemaker and possibly slows things down if your poll cycle is
> too short.
>
> >
> >
> > Thank you,
> > Kostia
> >
> > On Tue, Nov 8, 2016 at 10:19 PM, Dejan Muhamedagic
> > <dejanmm at fastmail.fm> wrote:
> >
> >     On Tue, Nov 08, 2016 at 12:54:10PM +0100, Klaus Wenninger wrote:
> >     > On 11/08/2016 11:40 AM, Kostiantyn Ponomarenko wrote:
> >     > > Hi,
> >     > >
> >     > > I need a way to do a manual fail-back on demand.
> >     > > To be clear, I don't want it to be ON/OFF; I want it to be more
> >     > > like "one shot".
> >     > > So far I have found that the most reasonable way to do it is to
> >     > > set "resource stickiness" to a different value, and then set it
> >     > > back to what it was.
> >     > > To do that I created a simple script with two lines:
> >     > >
> >     > >     crm configure rsc_defaults resource-stickiness=50
> >     > >     crm configure rsc_defaults resource-stickiness=150
> >     > >
> >     > > There are no timeouts before setting the original value back.
> >     > > If I call this script, I get what I want: Pacemaker moves
> >     > > resources to their preferred locations, and "resource
> >     > > stickiness" is set back to its original value.
> >     > >
> >     > > Although it works, I still have a few concerns about this
> >     > > approach.
> >     > > Will I get the same behavior under a big load with delays on the
> >     > > systems in the cluster (which is truly possible and a normal case
> >     > > in my environment)?
> >     > > How does Pacemaker treat a fast change of this parameter?
> >     > > I am worried that if "resource stickiness" is set back to its
> >     > > original value too fast, then no fail-back will happen. Is that
> >     > > possible, or shouldn't I worry about it?
> >     >
> >     > AFAIK pengine is interrupted when calculating a more complicated
> >     > transition, and if the situation has changed, a transition that is
> >     > just being executed is aborted if the input from pengine changed.
> >     > So I would definitely worry!
> >     > What you could do is to issue 'crm_simulate -Ls' in between and
> >     > grep for an empty transition.
> >     > There might be more elegant ways but that should be safe.
> >
> >     crmsh has an option (-w) to wait for the PE to settle after
> >     committing configuration changes.
> >
> >     Thanks,
> >
> >     Dejan
> >     >
> >     > > Thank you,
> >     > > Kostia
> >     > >
> >     > >
> >     > > _______________________________________________
> >     > > Users mailing list: Users at clusterlabs.org
> >     > > http://clusterlabs.org/mailman/listinfo/users
> >     > >
> >     > > Project Home: http://www.clusterlabs.org
> >     > > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> >     > > Bugs: http://bugs.clusterlabs.org
> >     >
> >
> >
> >
>
>