[ClusterLabs] Antw: Re: OCF_ERR_CONFIGURED (was: Virtual ip resource restarted on node with down network device)

Ulrich Windl Ulrich.Windl at rz.uni-regensburg.de
Wed Oct 5 09:07:52 EDT 2016

>>> Ken Gaillot <kgaillot at redhat.com> wrote on 20.09.2016 at 16:43 in message
<51303130-9cc5-85d6-2f26-ff21b711ab82 at redhat.com>:
> On 09/20/2016 07:38 AM, Lars Ellenberg wrote:
>> From the point of view of the resource agent,
>> you configured it to use a non-existing network.
>> Which it considers to be a configuration error,
>> which is treated by pacemaker as
>> "don't try to restart anywhere
>> but let someone else configure it properly, first".
>> I think the OCF_ERR_CONFIGURED is good, though, otherwise 
>> configuration errors might go unnoticed for quite some time.
>> A network interface is not supposed to "vanish".
>> You may disagree with that choice,
> This is a point we should settle in the upcoming changes to the OCF
> standard.
> The OCF 1.0 standard
> (https://github.com/ClusterLabs/OCF-spec/blob/master/ra/resource-agent-api.md)
> merely says it means "Program is not configured". That is open to
> interpretation.
> Pacemaker
> (http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#s-ocf-return-codes)
> has a more narrow view: "The resource's configuration is invalid. E.g.
> required parameters are missing."

I agree that OCF_ERR_CONFIGURED signals a configuration error that retrying without operator intervention cannot fix. However, such an error may be node-specific.

> The reason Pacemaker considers it a fatal error is that it expects it to
> be returned only for an error in the resource agent's configuration *in
> the cluster*. If the cluster config is bad, it doesn't matter which node
> we try it on. For example, if an agent takes a parameter "frobble" with
> valid values from 1 to 10, and the user supplies "frobble=-1", that
> would be a configuration error.
> I think in OCF 2.0 we should distinguish "supplied RA parameters are
> bad" from "service's configuration on this host is bad". Currently,
> Pacemaker expects the latter error to generate OCF_ERR_GENERIC,
> OCF_ERR_ARGS, OCF_ERR_PERM, or OCF_ERR_INSTALLED, which allows it to try
> the resource on another node.

IMHO OCF_ERR_INSTALLED is similar to the above, except that the cause is missing or incompatible software.

OCF_ERR_ARGS vs. OCF_ERR_CONFIGURED: with OCF_ERR_CONFIGURED the configuration may be syntactically valid but wrong for the environment, whereas OCF_ERR_ARGS means the arguments are invalid in all cases.
OCF_ERR_GENERIC is "catch the rest", I guess.
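
To make the distinction discussed above concrete, here is a minimal sketch in Python (real agents are usually shell scripts). It is not taken from any existing agent: the function names, the "nic" parameter and the choice of OCF_ERR_INSTALLED for a missing interface are assumptions made only for illustration. The numeric values are the usual OCF return codes, and "frobble" is Ken's example parameter.

```python
# OCF return codes (numeric values as defined by the OCF resource agent API).
OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1
OCF_ERR_ARGS = 2
OCF_ERR_PERM = 4
OCF_ERR_INSTALLED = 5
OCF_ERR_CONFIGURED = 6

# Per Ken's description: errors after which Pacemaker may still try
# the resource on another node.
NODE_LOCAL_ERRORS = {OCF_ERR_GENERIC, OCF_ERR_ARGS, OCF_ERR_PERM, OCF_ERR_INSTALLED}


def validate(params, local_interfaces):
    """Hypothetical validation step for an imaginary agent.

    params           -- dict of resource parameters from the cluster config
    local_interfaces -- set of interface names present on this node
    """
    # Cluster-wide problems: the supplied parameters are bad on every node.
    if "nic" not in params:
        return OCF_ERR_CONFIGURED
    try:
        frobble = int(params["frobble"])
    except (KeyError, TypeError, ValueError):
        return OCF_ERR_CONFIGURED
    if not 1 <= frobble <= 10:
        return OCF_ERR_CONFIGURED

    # Node-local problem: the parameters are fine, but this host cannot
    # satisfy them. ASSUMPTION: OCF_ERR_INSTALLED is just one possible
    # choice among the node-local codes; the thread does not settle this.
    if params["nic"] not in local_interfaces:
        return OCF_ERR_INSTALLED

    return OCF_SUCCESS


def may_try_other_node(rc):
    """True if, per the description above, another node may be tried."""
    return rc in NODE_LOCAL_ERRORS
```

With this sketch, validate({"frobble": "-1", "nic": "eth0"}, {"eth0"}) yields OCF_ERR_CONFIGURED (bad everywhere), while a valid "frobble" combined with an interface that is absent on this node yields a node-local code, so another node could still be tried.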
