[ClusterLabs] sequencing multiple CIB changes (was: How Pacemaker reacts to fast changes of the same parameter in configuration)

Thu Nov 10 05:19:47 EST 2016

>>> Klaus Wenninger <kwenning at redhat.com> wrote on 10.11.2016 at 10:17 in
message <5c8e5bb6-67e5-8685-9b9c-b534d512b72e at redhat.com>:
> On 11/10/2016 08:27 AM, Ulrich Windl wrote:
>>>>> Klaus Wenninger <kwenning at redhat.com> wrote on 09.11.2016 at 17:42 in
>> message <80c65564-b299-e504-4c6c-afd0ff86e178 at redhat.com>:
>>> On 11/09/2016 05:30 PM, Kostiantyn Ponomarenko wrote:
>>>> When one problem seems to be solved, another one appears.
>>>> Now my script looks this way:
>>>>
>>>>     crm --wait configure rsc_defaults resource-stickiness=50
>>>>     crm configure rsc_defaults resource-stickiness=150
>>>>
>>>> While I am now sure that transactions caused by the first command
>>>> won't be aborted, I see another possible problem here.
>>>> With minimal load in the cluster it took 22 seconds for this script to
>>>> finish.
>>>> I see a weakness here.
>>>> If the node on which this script is called goes down for any reason,
>>>> then "resource-stickiness" is not set back to its original value,
>>>> which is very bad.
>> I don't quite understand: you want your resources to move to their preferred
>> location after some problem. When the node goes down with the lower
>> stickiness, there is no problem while the other node is down; when it comes
>> up, resources might be moved, but isn't that what you wanted?
> 
> I guess this is about the general problem with features like 'move' as well,
> which run so much against how pacemaker works.
> They are implemented inside the high-level tooling.
> They temporarily modify the CIB, and if something happens that makes the
> controlling high-level tool go away, the CIB simply stays modified and the
> user has to know that a manual cleanup is needed.

I can imagine that the following change could help:
* Add a new "agenda" section to the CIB that will contain things to be done (call them steps or transactions).
* When there is an agenda, crmd will execute the individual steps in it in sequence, and will mark each step as completed once its requested state is present in the configuration section of the CIB. Maybe it has to interact with the policy engine. However, either the agenda steps have to be idempotent (so that they can be applied multiple times in case the node fails during execution), or we need to mark the two events "CIB changed" and "changes executed" for each agenda step.

The question is what a transaction in the agenda would look like. Preferably it would look like some diff, or, more XML-sophisticated, some XPath thingy describing where to patch plus some data describing what to patch. XML gurus will know whether definitions for such things already exist; I don't.
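To make the "XPath plus data" idea concrete, here is a minimal Python sketch, assuming a transaction is nothing more than an XPath expression selecting the target elements plus a set of attributes to set on them. The CIB fragment is simplified and its ids are hypothetical; this is not Pacemaker's own patch format, and Python's `xml.etree.ElementTree` understands only a subset of XPath.

```python
import xml.etree.ElementTree as ET

# Simplified CIB fragment; the ids are hypothetical and only for illustration.
CIB_XML = """
<cib>
  <configuration>
    <rsc_defaults>
      <meta_attributes id="rsc-options">
        <nvpair id="rsc-options-resource-stickiness"
                name="resource-stickiness" value="150"/>
      </meta_attributes>
    </rsc_defaults>
  </configuration>
</cib>
"""


def apply_patch(root: ET.Element, path: str, attrs: dict) -> int:
    """Set `attrs` on every element matched by `path`; return the match count."""
    matches = root.findall(path)
    for element in matches:
        for name, value in attrs.items():
            element.set(name, value)  # fixed values: re-applying changes nothing
    return len(matches)


# One agenda transaction: where to patch (XPath) and what to patch (data).
patch = (
    ".//rsc_defaults/meta_attributes/nvpair[@name='resource-stickiness']",
    {"value": "50"},
)

root = ET.fromstring(CIB_XML.strip())
print(apply_patch(root, *patch))            # 1
print(apply_patch(root, *patch))            # 1
print(root.find(".//nvpair").get("value"))  # 50
```

Because such a patch sets fixed values instead of computing them from the old ones, applying it a second time after a node failure does no harm.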

Another question is what happens if one agenda step cannot be completed: should the next step be executed anyway, should the agenda be paused, or should the rest of the agenda be cleared?

I could even imagine some exception mechanism that allows defining an agenda step to be executed when any previous step could not complete. As always, there is the problem that the exception handler itself may fail to complete...
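A minimal Python sketch of how such an agenda could be executed, assuming a plain dict stands in for the configuration section of the CIB. No such agenda section or crmd feature exists; `Step`, `Agenda`, `run_agenda`, `set_param` and `OnFailure` are hypothetical names, and a step's `apply` returning `True` stands in for "the change was written and the resulting transition has finished". Both options from above appear: `set_param` steps are idempotent, and the `done` mark kept with each step lets another node resume where the lost one stopped. The three failure policies and the exception step are modelled as `OnFailure` and `handler`.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List, Optional

Config = Dict[str, str]  # stand-in for the CIB configuration section


class OnFailure(Enum):
    CONTINUE = "continue"  # execute the next step anyway
    PAUSE = "pause"        # stop here; the failed and later steps stay pending
    CLEAR = "clear"        # stop here and drop the failed and later steps


@dataclass
class Step:
    name: str
    apply: Callable[[Config], bool]  # True once the requested state is reached
    done: bool = False               # completion mark, stored with the agenda


@dataclass
class Agenda:
    steps: List[Step]
    on_failure: OnFailure = OnFailure.PAUSE
    handler: Optional[Step] = None   # "exception" step run after a failed step


def set_param(key: str, value: str) -> Callable[[Config], bool]:
    """Idempotent step body: setting a fixed value twice changes nothing."""
    def apply(config: Config) -> bool:
        config[key] = value
        return True
    return apply


def run_agenda(agenda: Agenda, config: Config) -> List[str]:
    """Execute pending steps in order and return a log of what happened."""
    log: List[str] = []
    for index, step in enumerate(agenda.steps):
        if step.done:
            log.append(f"skip {step.name}")  # completed before a node was lost
            continue
        if step.apply(config):
            step.done = True
            log.append(f"done {step.name}")
            continue
        log.append(f"fail {step.name}")
        if agenda.handler is not None:
            ok = agenda.handler.apply(config)  # the handler may fail as well
            log.append(f"{'done' if ok else 'fail'} {agenda.handler.name}")
        if agenda.on_failure is OnFailure.PAUSE:
            break
        if agenda.on_failure is OnFailure.CLEAR:
            del agenda.steps[index:]
            break
    return log


# The node that lowered the stickiness died before restoring it;
# whichever node picks up the agenda only performs the missing step.
config: Config = {"resource-stickiness": "50"}
agenda = Agenda([
    Step("lower", set_param("resource-stickiness", "50"), done=True),
    Step("restore", set_param("resource-stickiness", "150")),
])
print(run_agenda(agenda, config))     # ['skip lower', 'done restore']
print(config["resource-stickiness"])  # 150
```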

> So we could actually derive a general discussion from this: how to handle
> these issues so that artefacts are less likely to persist after some
> administrative action.
> At the moment, special tagging of the constraints that are automatically
> created to trigger a move is one approach.
> But when would you issue an automated cleanup? Is there anything
> implemented in the high-level tooling? pcsd would be a candidate, I guess;
> for crmsh I don't know of a persistent instance that could take care of that...
> 
> If we say we won't implement these features in the core of pacemaker,
> I definitely agree. But is there anything we could do to make it easier
> for high-level tools?

With the agenda: yes

With an agenda, even a resource restart would work more reliably: if the node executing the restart is fenced during the stop, the pending start step would still be carried out.

> I'm thinking of some mechanism that makes the constraints somehow
> magically disappear or become disabled when they have achieved what they
> were intended for, when the connection to some administrative shell is
> lost, or ...
> I could imagine a dependency on some token given to a shell, something
> like a suicide timeout, ...
> Maybe the usual habit when configuring a switch/router can trigger
> some ideas: schedule a reboot in x minutes; make a non-persistent config
> change; check that everything is fine afterwards; make it persistent;
> cancel the timed reboot.
>  
>>
[...deleted rest...]
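The switch/router habit maps quite directly onto the stickiness problem from the start of the thread, provided the undo information lives with the configuration rather than in the administrator's shell, so that any surviving node can roll the change back. A minimal Python sketch, assuming a plain dict as the configuration and an explicit `now` instead of a real clock; `apply_tentative`, `confirm` and `expire` are hypothetical names, not crmsh or Pacemaker commands, and at most one pending change per key is assumed.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

Config = Dict[str, str]


@dataclass
class Pending:
    key: str
    old_value: Optional[str]  # None: the key did not exist before the change
    deadline: float           # absolute time after which the change is undone
    confirmed: bool = False


def apply_tentative(config: Config, pending: List[Pending], key: str,
                    value: str, now: float, timeout: float) -> None:
    """Change `key`, recording how to undo it if nobody confirms in time."""
    pending.append(Pending(key, config.get(key), now + timeout))
    config[key] = value


def confirm(pending: List[Pending], key: str) -> None:
    """Make the tentative change of `key` persistent (cancel the rollback)."""
    for change in pending:
        if change.key == key:
            change.confirmed = True


def expire(config: Config, pending: List[Pending], now: float) -> None:
    """Run periodically by any node: undo unconfirmed changes past their deadline."""
    still_pending = []
    for change in pending:
        if change.confirmed:
            continue  # persistent now, the undo record is no longer needed
        if now >= change.deadline:
            if change.old_value is None:
                config.pop(change.key, None)
            else:
                config[change.key] = change.old_value
        else:
            still_pending.append(change)
    pending[:] = still_pending


config: Config = {"resource-stickiness": "150"}
pending: List[Pending] = []
apply_tentative(config, pending, "resource-stickiness", "50", now=0, timeout=30)
expire(config, pending, now=10)
print(config["resource-stickiness"])  # 50: deadline not reached yet
expire(config, pending, now=30)
print(config["resource-stickiness"])  # 150: rolled back, no admin shell needed
```

For a temporary change like the stickiness one, the rollback is the goal, so nobody ever calls `confirm`; for a permanent change, `confirm` plays the role of "make it persistent; cancel the timed reboot".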

Ulrich