[ClusterLabs] OCF_ERR_CONFIGURED (was: Virtual ip resource restarted on node with down network device)

Lars Ellenberg lars.ellenberg at linbit.com
Tue Sep 20 15:44:01 UTC 2016


On Tue, Sep 20, 2016 at 09:43:23AM -0500, Ken Gaillot wrote:
> On 09/20/2016 07:38 AM, Lars Ellenberg wrote:
> > From the point of view of the resource agent,
> > you configured it to use a non-existing network.
> > Which it considers to be a configuration error,
> > which is treated by pacemaker as
> > "don't try to restart anywhere
> > but let someone else configure it properly, first".
> > 
> > I think the OCF_ERR_CONFIGURED is good, though, otherwise 
> > configuration errors might go unnoticed for quite some time.
> > A network interface is not supposed to "vanish".
> > 
> > You may disagree with that choice,
> 
> This is a point we should settle in the upcoming changes to the OCF
> standard.

I meant "that choice of this RA",
namely to return this error code in this situation:
the interface specified in the cluster configuration does not exist.

I find OCF_ERR_CONFIGURED appropriate here.  One could argue that
OCF_ERR_INSTALLED or OCF_ERR_GENERIC would fit better.
All of this is under current Pacemaker semantics, which you reference below.
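
To make the trade-off between those three codes concrete, here is a rough
sketch in Python (illustration only, not Pacemaker code). The numeric values
are the usual OCF ones; the helper name may_try_other_node is made up for
this example, and the grouping simply restates the semantics quoted below.

```python
# OCF return codes discussed in this thread (standard numeric values).
OCF_ERR_GENERIC = 1
OCF_ERR_ARGS = 2
OCF_ERR_PERM = 4
OCF_ERR_INSTALLED = 5
OCF_ERR_CONFIGURED = 6

# Failure codes after which, per the semantics quoted in this thread,
# the cluster is still allowed to try the resource on another node.
RETRY_ELSEWHERE = {
    OCF_ERR_GENERIC,
    OCF_ERR_ARGS,
    OCF_ERR_PERM,
    OCF_ERR_INSTALLED,
}


def may_try_other_node(rc):
    """Hypothetical helper: True if a failure with this code still
    leaves other nodes as candidates; False for OCF_ERR_CONFIGURED,
    which is treated as fatal everywhere."""
    return rc in RETRY_ELSEWHERE


if __name__ == "__main__":
    for name, rc in [("OCF_ERR_CONFIGURED", OCF_ERR_CONFIGURED),
                     ("OCF_ERR_INSTALLED", OCF_ERR_INSTALLED),
                     ("OCF_ERR_GENERIC", OCF_ERR_GENERIC)]:
        print(name, "-> try elsewhere:", may_try_other_node(rc))
```

So the choice of code is really a choice of recovery behaviour: with
OCF_ERR_CONFIGURED the resource stays down until someone fixes the
configuration; with the other two it may be started on another node.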

> The OCF 1.0 standard
> (https://github.com/ClusterLabs/OCF-spec/blob/master/ra/resource-agent-api.md)
> merely says it means "Program is not configured". That is open to
> interpretation.
> 
> Pacemaker
> (http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#s-ocf-return-codes)
> has a more narrow view: "The resource's configuration is invalid. E.g.
> required parameters are missing."
> 
> The reason Pacemaker considers it a fatal error is that it expects it to
> be returned only for an error in the resource agent's configuration *in
> the cluster*. If the cluster config is bad, it doesn't matter which node
> we try it on. For example, if an agent takes a parameter "frobble" with
> valid values from 1 to 10, and the user supplies "frobble=-1", that
> would be a configuration error.
> 
> I think in OCF 2.0 we should distinguish "supplied RA parameters are
> bad" from "service's configuration on this host is bad". Currently,
> Pacemaker expects the latter error to generate OCF_ERR_GENERIC,
> OCF_ERR_ARGS, OCF_ERR_PERM, or OCF_ERR_INSTALLED, which allows it to try
> the resource on another node.
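
The distinction proposed above can be sketched as a validation routine.
This is a minimal Python illustration under stated assumptions, not the
code of any real agent: the parameter names "frobble" and "nic", the
validate() function, and the /sys/class/net check (Linux-specific) are
all hypothetical. Whether a missing interface belongs in the second
bucket is exactly the point under discussion in this thread.

```python
import os

OCF_SUCCESS = 0
OCF_ERR_INSTALLED = 5
OCF_ERR_CONFIGURED = 6


def interface_exists(name):
    # Assumption: Linux exposes each network interface under /sys/class/net.
    return os.path.exists(os.path.join("/sys/class/net", name))


def validate(params, nic_check=interface_exists):
    """Classify problems as 'bad cluster parameters' vs. 'bad on this host'.

    params is a dict of resource parameters as supplied in the cluster
    configuration (hypothetical names, for illustration only).
    """
    # Supplied RA parameters are bad: wrong on every node, so fatal.
    try:
        frobble = int(params["frobble"])
    except (KeyError, TypeError, ValueError):
        return OCF_ERR_CONFIGURED
    if not 1 <= frobble <= 10:
        return OCF_ERR_CONFIGURED

    # The environment on this host is bad: another node may still work.
    nic = params.get("nic")
    if nic is not None and not nic_check(nic):
        return OCF_ERR_INSTALLED

    return OCF_SUCCESS
```

With this split, validate({"frobble": "-1"}) yields OCF_ERR_CONFIGURED on
any node, while a valid frobble combined with an interface missing on
this host yields OCF_ERR_INSTALLED, leaving other nodes as candidates.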

-- 
: Lars Ellenberg
: LINBIT | Keeping the Digital World Running
: DRBD -- Heartbeat -- Corosync -- Pacemaker
: R&D, Integration, Ops, Consulting, Support

DRBD® and LINBIT® are registered trademarks of LINBIT



