[ClusterLabs] [EXT] Re: Feedback wanted: OCF Resource Agent API 1.1 proposed for adoption

Mon Mar 29 13:08:39 EDT 2021

I've made a note of these as ideas for 1.2/2.0 :)

On Sun, 2021-03-28 at 03:03 +0200, Ulrich Windl wrote:
> On 3/26/21 11:17 PM, Ken Gaillot wrote:
> > OCF 1.1 is now formally adopted!
> > 
> > https://github.com/ClusterLabs/OCF-spec/blob/master/ra/1.1/resource-agent-api.md
> > 
> > Thanks to everyone who gave feedback.
> 
> "The minor number can be used by both sides to see whether a certain 
> additional feature is supported by the other party."
> 
> That would mean there's a precise revision history with all features
> changed. I doubt such a thing exists yet, and the mistakes made in the
> past (like changing the XML without changing the version) can't be
> corrected either. ;-)

Very true :) but we're in a better position now than before.

All versions of the standard will be kept in the repo (ra/1.0/,
ra/1.1/, etc.) so the info is all there for comparison.

> "Actions must be idempotent." Well: if a "start" action fails, does it
> have to fail the next time, too? Maybe it's "Successful Actions must be
> idempotent."
> Maybe even "Successful state-changing Actions must be idempotent."
> ("Monitor" most likely isn't idempotent; otherwise you would get the
> same status all the time, right?)

Maybe we should avoid "idempotent" and describe the desired situation
in more natural language.
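
As a rough illustration of the behavior I think we want (not text from
the standard): starting something that is already running, or stopping
something that is already stopped, should succeed without side effects,
while monitor simply reports whatever the current state is. The
DummyAgent class and its state file are hypothetical; only the exit
code values come from the standard.

```python
import os
import tempfile

OCF_SUCCESS = 0       # standard OCF exit code
OCF_ERR_GENERIC = 1   # standard OCF exit code
OCF_NOT_RUNNING = 7   # standard OCF exit code


class DummyAgent:
    """Hypothetical agent whose 'service' is just a state file on disk."""

    def __init__(self, state_file):
        self.state_file = state_file

    def monitor(self):
        # Reports the current state; the result changes when the state does.
        if os.path.exists(self.state_file):
            return OCF_SUCCESS
        return OCF_NOT_RUNNING

    def start(self):
        # Already running: nothing to do, and that still counts as success.
        if self.monitor() == OCF_SUCCESS:
            return OCF_SUCCESS
        try:
            with open(self.state_file, "w") as f:
                f.write("running\n")
        except OSError:
            return OCF_ERR_GENERIC
        return OCF_SUCCESS

    def stop(self):
        # Already stopped: also success.
        try:
            os.remove(self.state_file)
        except FileNotFoundError:
            pass
        except OSError:
            return OCF_ERR_GENERIC
        return OCF_SUCCESS


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "dummy.state")
    agent = DummyAgent(path)
    # Repeating start or stop gives the same result as doing it once.
    print(agent.start(), agent.start(), agent.monitor())
    print(agent.stop(), agent.stop(), agent.monitor())
```

A failed start is free to succeed on a later attempt here; the only
promise is that repeating an action that already took effect is
harmless.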

> "Multiple resource instances of the same type may be running in
> parallel."
>
> What about "Multiple concurrent actions for separate resource instances
> (using the same RA) must be handled correctly." instead?

Or in addition, sure.

> What about listing allowable exit codes with each action?

I think for the purposes of the standard, any action can return any of
the specified error codes that make sense in the agent's specific
context (of course, the error codes should be used as the standard
describes).

> Are there any metadata provisions for reporting the OCF_CHECK_LEVEL?

Yes, "depth" (which is described in the standard, but maybe could be
clearer).

> I don't quite understand exit codes 190 and 191. Maybe add an example.

Sure, that makes sense.

The degraded codes are intended for host-specific conditions where the
service is currently fine but there is some indication that the host
may be less desirable as a location in the near future. Maybe some
required system resource is nearing exhaustion. An agent for a network
interface might report degraded if transmission errors are increasing.
That sort of thing.
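
To make the network interface case concrete, here is a minimal sketch of
how a monitor might choose between the normal and degraded codes. The
function name, the sampling of an error counter, and the threshold are
all assumptions for illustration; only the numeric exit codes (and my
reading that 191 is the promoted-role counterpart of 190) come from the
standard.

```python
OCF_SUCCESS = 0              # running normally
OCF_NOT_RUNNING = 7          # not active
OCF_RUNNING_PROMOTED = 8     # running normally in the promoted role
OCF_DEGRADED = 190           # active, but future failure looks more likely
OCF_DEGRADED_PROMOTED = 191  # same condition, in the promoted role


def monitor_interface(link_up, error_counts, promoted=False, threshold=3):
    """Pick a monitor exit code for a hypothetical interface resource.

    error_counts holds recent samples of a cumulative transmission error
    counter, oldest first. The threshold is an arbitrary illustrative
    value, not something the standard defines.
    """
    if not link_up:
        return OCF_NOT_RUNNING

    # Count how many sample-to-sample steps showed the counter increasing.
    rises = sum(1 for a, b in zip(error_counts, error_counts[1:]) if b > a)

    if rises >= threshold:
        # The service still works, but this host is looking less healthy.
        return OCF_DEGRADED_PROMOTED if promoted else OCF_DEGRADED
    return OCF_RUNNING_PROMOTED if promoted else OCF_SUCCESS


if __name__ == "__main__":
    print(monitor_interface(True, [5, 5, 5, 5]))
    print(monitor_interface(True, [5, 9, 14, 30]))
    print(monitor_interface(True, [5, 9, 14, 30], promoted=True))
```

The point is that the degraded codes are only returned while the
resource is still working; an interface that is actually down would
report an ordinary failure or not-running code instead.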

For Pacemaker, these will be displayed in status output like failures,
but will not otherwise be treated as a resource failure.

I'm not aware of any agents currently using the feature. Support was
added to Pacemaker in 2015 (though broken until recently); I'm not sure
who originally requested it. 

> "must at least support XML output": Is there any format other than XML
> specified? If not the statement doesn't make sense.

The standard allows agents to support any output formats they choose,
but meta-data must support XML. As of the new standard, an agent can
choose to support other formats, selected via OCF_OUTPUT_FORMAT; for
example, for "text" it could display the meta-data in a human-readable
format. Only the XML format is constrained by the standard; agents can
do what they want with anything else.
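
A minimal sketch of what that could look like in an agent, assuming it
falls back to XML whenever OCF_OUTPUT_FORMAT is unset or not something
it recognizes (that fallback is my assumption, as are the agent name,
the abbreviated meta-data, and the layout of the text output):

```python
import os
import xml.etree.ElementTree as ET

# Abbreviated, hypothetical meta-data for an agent called "dummy".
META_XML = """<?xml version="1.0"?>
<resource-agent name="dummy" version="1.0">
  <version>1.1</version>
  <longdesc lang="en">Hypothetical example agent that manages nothing.</longdesc>
  <shortdesc lang="en">Example agent</shortdesc>
  <parameters/>
  <actions>
    <action name="start" timeout="20s"/>
    <action name="stop" timeout="20s"/>
    <action name="monitor" timeout="20s" interval="10s" depth="0"/>
    <action name="meta-data" timeout="5s"/>
  </actions>
</resource-agent>
"""


def meta_data(env=None):
    """Return the meta-data in the format requested by the environment."""
    env = os.environ if env is None else env
    fmt = env.get("OCF_OUTPUT_FORMAT", "xml")

    if fmt == "text":
        # Free-form, human-readable rendering; the standard does not
        # constrain this, so the layout is entirely the agent's choice.
        root = ET.fromstring(META_XML)
        lines = [f"{root.get('name')}: {root.findtext('shortdesc')}"]
        lines += [f"  action {a.get('name')}" for a in root.iter("action")]
        return "\n".join(lines) + "\n"

    # XML is the one format every agent must be able to produce.
    return META_XML


if __name__ == "__main__":
    print(meta_data({"OCF_OUTPUT_FORMAT": "text"}), end="")
```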

> What about line-wrapping and other formatting usable in <longdesc>?
> What about lengths for <shortdesc>?
> 
> The Semantics are under-specified IMHO. Example <desc> vs.
> <shortdesc>?

The schema allows either <desc>, or <longdesc> and <shortdesc>,
depending on the context -- never mixed.

> IMHO it would be best to specify exactly what is allowed; everything 
> that isn't allowed is forbidden.
> (That's better than allowing some things and forbidding others,
> leaving a "gray zone" in between)
> 
> Regards,
> Ulrich

Definitely lots of details are left unspecified in the standard. It was
decided not to try to be exhaustive with 1.1 for the sake of getting it
out more quickly, since it's been 19 years since 1.0 already :)
-- 
Ken Gaillot <kgaillot at redhat.com>