[ClusterLabs Developers] [developers] maintenance vs is-managed; different levels of the maintenance property

Ken Gaillot kgaillot at redhat.com
Wed Nov 27 19:19:15 EST 2019


On Wed, 2019-11-27 at 12:13 +0000, Yan Gao wrote:
> Hi all,
> 
> First thanks for bringing this up, Aleksei.
> 
> On 11/26/19 3:38 PM, Aleksei Burlakov wrote:
> > Dear Developers,
> > 
> > I would like to raise a discussion about an issue/feature concerning
> > the maintenance property applied to different levels of a cluster.
> > 
> > In order to explain the problem, let's consider several examples.
> > 
> > *Scenario 1*. There is a primitive p1 in a group g1. When applying
> > the maintenance property to them, the most specific resource will
> > take precedence. Namely, if p1 has the maintenance property set, it
> > will be used; otherwise, the maintenance property of g1 is used.
> > It's ok.
> > 
> > *Scenario 2*. There is a clone c1 of a group g1, and a primitive
> > p1 that belongs to the group g1. When we apply the maintenance
> > attribute, the most specific attribute takes precedence, but the
> > attribute of the clone c1 may have a different value than the group
> > and the primitive. It's strange, but still ok.
> > 
> > *Scenario 3*. There is a cluster with one node node1 and one
> > primitive p1. This time, when setting the maintenance/maintenance-mode
> > attribute on the primitive/node1/default-property, p1 will be in
> > maintenance when any one of them is set to true. It means the most
> > specific attribute doesn't take precedence anymore. One may think
> > it's a feature, but imho it's a *bug*.

One source of confusion is that maintenance mode can be set cluster-
wide via the "maintenance-mode" property, for a particular node via the
"maintenance" node attribute, or for a particular resource via the
"maintenance" resource meta-attribute.

Separately, the resource meta-attribute may be set in rsc_defaults, the
primitive, or any collective resource (clone or group) enclosing it.

How the different options combine is separate from how resource meta-
attributes at different levels combine.

When comparing meta-attributes at different levels, it is true that the
most specific attribute wins. The primitive wins over the collective
resource, which wins over the rsc_defaults.
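
To illustrate the "most specific wins" rule for meta-attributes, here is a
minimal Python sketch. It is not Pacemaker's implementation; the function
name and the list-of-levels representation are assumptions made up for the
example.

```python
from typing import Optional, Sequence


def resolve_meta_attribute(levels: Sequence[Optional[bool]],
                           builtin_default: bool = False) -> bool:
    """Return the value from the most specific level that sets one.

    `levels` is ordered most specific first, for example
    [primitive, group, clone, rsc_defaults]. None means "not set here".
    (Hypothetical helper, for illustration only.)
    """
    for value in levels:
        if value is not None:
            return value
    return builtin_default


# p1 leaves "maintenance" unset, g1 sets it to true, rsc_defaults says false:
# the group's value is the most specific one that is set, so it is used.
p1_maintenance = resolve_meta_attribute([None, True, False])
```

If p1 set the attribute itself, its value would be returned regardless of
what g1 or rsc_defaults say.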

When comparing the different options, it's a different story. There
really isn't such a thing as "more specific" -- a node is not more or
less specific than a resource (consider a clone running on all nodes,
which is more specific?). So in that case, the resource is put in
maintenance if either the resource or the node is in maintenance mode.

The cluster-wide property forces everything into maintenance. Note that
it's different from setting the resource meta-attribute to true in
rsc_defaults, which can be overridden by a more specific setting in the
resource itself.
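
The combination described above can be sketched roughly as follows, in
Python. This is only an illustration of the rule as stated here, not the
scheduler's actual code; the function and parameter names are hypothetical,
and resource_maintenance is assumed to be the meta-attribute value after the
primitive/collective/rsc_defaults levels have already been resolved.

```python
def resource_in_maintenance(cluster_maintenance_mode: bool,
                            node_maintenance: bool,
                            resource_maintenance: bool) -> bool:
    """Combine the three ways of requesting maintenance (illustrative only)."""
    # The cluster-wide "maintenance-mode" property forces everything into
    # maintenance; nothing more specific can override it.
    if cluster_maintenance_mode:
        return True
    # Node and resource settings are not ranked against each other:
    # either one being true is enough.
    return node_maintenance or resource_maintenance


# maintenance=false on the resource does not exempt it from a node that is
# in maintenance.
p1_on_node1 = resource_in_maintenance(False, True, False)
```

By contrast, maintenance=true in rsc_defaults only feeds into the resolved
resource value, so a resource that sets maintenance=false itself is not put
in maintenance by rsc_defaults alone.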

I agree it's confusing but I'm not sure a different approach would be
any more obvious.

> > *Scenario 4*. There is only a primitive p1. We apply both an
> > is-managed attribute and a maintenance attribute. This is how it
> > will work:
> >     is-managed    maintenance    |    p1
> >     false         false          |    unmanaged
> 
> Since it seems that the "maintenance" attribute tends to take
> precedence over the "is-managed" attribute, this combination is
> probably the only problematic case. I think it probably needs an
> "else" in here:
> 
> https://github.com/ClusterLabs/pacemaker/blob/master/lib/pengine/complex.c#L531
> 
> :-)
> 
> >     false         true           |    unmanaged
> >     true          false          |    managed
> >     true          true           |    unmanaged
> > that works quite unexpectedly.
> > 
> > In the more complex scenarios, where the is-managed property is
> > used together with maintenance, the status of the resources becomes
> > unpredictable, which is definitely *not ok*.

I think it's inherently confusing that both is-managed and maintenance
exist. For resources they function identically except monitors will
continue to run if a resource is unmanaged but not if it's in
maintenance mode.

It's especially confusing since is-managed is positive (true is the
default situation) whereas maintenance is negative (false is the
default). So their senses are flipped (is-managed=false is comparable
to maintenance=true).

It might have been a clearer design if there was one option for
maintenance mode (no is-managed), and a second option for whether
monitors continue to run in maintenance mode. Or, resource maintenance
could take three values (for off, on-with-monitor, and on-without-
monitor).
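
As a sketch of what the three-valued alternative might look like, in Python:
the enum and the mapping function below are hypothetical, nothing like this
exists in Pacemaker, and the mapping assumes that maintenance=true stops
monitors even when is-managed=false is also set.

```python
from enum import Enum


class MaintenanceState(Enum):
    """Hypothetical single option replacing is-managed + maintenance."""
    OFF = "off"
    ON_WITH_MONITOR = "on-with-monitor"
    ON_WITHOUT_MONITOR = "on-without-monitor"


def from_current_options(is_managed: bool, maintenance: bool) -> MaintenanceState:
    """Map today's two booleans onto the single three-valued option."""
    if maintenance:
        # Maintenance mode: resource is unmanaged and monitors stop.
        return MaintenanceState.ON_WITHOUT_MONITOR
    if not is_managed:
        # Unmanaged only: monitors keep running.
        return MaintenanceState.ON_WITH_MONITOR
    return MaintenanceState.OFF
```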

The table of results you got makes sense if you consider that only if
both options are in their default state (is-managed=true and
maintenance=false) is the resource in the default (managed) state.
Flipping either option is an intention to unmanage the resource.
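
Put another way, the table is reproduced by a single expression. A minimal
Python sketch (the function name is made up for illustration, and this is
not the code in complex.c):

```python
def effectively_managed(is_managed: bool = True,
                        maintenance: bool = False) -> bool:
    """Managed only when both options are in their default state."""
    return is_managed and not maintenance


# Rebuild the four rows of the table in the same order as above.
rows = [
    (is_managed, maintenance,
     "managed" if effectively_managed(is_managed, maintenance) else "unmanaged")
    for is_managed in (False, True)
    for maintenance in (False, True)
]

for is_managed, maintenance, state in rows:
    print(f"{str(is_managed).lower():<14}{str(maintenance).lower():<15}|    {state}")
```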

> > The *counterargument* is that it was always like this and the
> > customers 
> > are used to this behavior. They may remember that a certain
> > combination 
> > of attributes leads to a certain status and enforcing the most
> > specific 
> > rule would lead to the *change of behaviour and backward
> > incompatibility*.
> 
> Regardless of how the conflicting attributes/settings should be
> handled correctly at the resource level, I think one main question
> here probably is:
> 
> Should maintenance mode of node/cluster scope consider any specific
> conflicting settings of resources? Or should it just put the whole
> scope (node/cluster) into maintenance mode without exceptions, as it
> does now?

It's a good question. I don't think either answer is obviously
intuitive, so it boils down to documenting it. Which we should do :)

There is some room for coming up with better option naming and meaning.
For example maybe the cluster-wide "maintenance-mode" should be
something like "force-maintenance" to make clear it takes precedence
over node and resource maintenance.

> 
> Regards,
>    Yan
> 
> > 
> > Please share your opinion about the issue: whether we should leave
> > it working as is or enforce the most specific rule throughout the
> > whole cluster, and give priority to one of the conflicting
> > attributes (is-managed vs maintenance).
> > 
> > Best regards,
> > Aleksei Burlakov
> > SUSE Senior Developer
-- 
Ken Gaillot <kgaillot at redhat.com>


