[ClusterLabs Developers] Opinions wanted: OCF agent types

Wed Aug 17 20:00:51 UTC 2022

Hi all,

OCF 1.1 hasn't been out that long but I'm already looking ahead to OCF
1.2 (which would remain backward-compatible).

One big addition I'm contemplating is defining OCF resource agent
types, to address these problems:

* Fence agents have a completely different standard from OCF resource
agents, and lack some of the features available to OCF agents (such as
meaningful error statuses and exit reasons for failures).

* Pacemaker's node health feature uses OCF agents to monitor node
conditions, but this causes some user pain points because health agents
are indistinguishable from regular OCF agents.

* In the past there has been discussion of implementing "storage
agents" to help manage replication of external storage devices,
primarily for disaster recovery purposes.

Syntactically, the agent type would be another field in the agent
specification, for example ocf:fence:heartbeat:iscsi or
ocf:health:pacemaker:cpu.

"Regular" OCF agents would be (for example)
ocf:service:heartbeat:apache in full, but for backward compatibility
"service" would be the default, and ocf:heartbeat:apache would continue
to work.

Alternatively, if we want to keep it to three fields, we could do
something like ocf-fence:heartbeat:iscsi and ocf-health:pacemaker:cpu.
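
To make the two candidate syntaxes concrete, here is a minimal Python
sketch of how a tool might parse them. It is only an illustration under
assumptions: the AgentSpec tuple, the parse_agent_spec function, and the
set of type names are hypothetical, with the type names taken from the
examples in this message. Nothing here is part of any OCF version.

```python
from typing import NamedTuple

# Hypothetical type names, taken from the examples in this message.
KNOWN_TYPES = {"service", "fence", "health"}
DEFAULT_TYPE = "service"  # keeps ocf:heartbeat:apache working


class AgentSpec(NamedTuple):
    standard: str
    agent_type: str
    provider: str
    name: str


def parse_agent_spec(spec: str) -> AgentSpec:
    """Parse either proposed form, or a legacy three-field spec."""
    fields = spec.split(":")
    if len(fields) == 4:
        # Four-field form: ocf:<type>:<provider>:<agent>
        standard, agent_type, provider, name = fields
    elif len(fields) == 3:
        # Three-field form: ocf-<type>:<provider>:<agent>, or the
        # legacy ocf:<provider>:<agent>, which defaults to "service".
        prefix, provider, name = fields
        standard, dash, agent_type = prefix.partition("-")
        if not dash:
            agent_type = DEFAULT_TYPE
    else:
        raise ValueError(f"expected 3 or 4 fields: {spec!r}")
    if standard != "ocf":
        raise ValueError(f"not an OCF agent spec: {spec!r}")
    if agent_type not in KNOWN_TYPES:
        raise ValueError(f"unknown agent type in {spec!r}")
    if not provider or not name:
        raise ValueError(f"empty provider or agent name: {spec!r}")
    return AgentSpec(standard, agent_type, provider, name)
```

In this sketch, the field count alone separates the four-field form from
a legacy spec, so ocf:heartbeat:apache and ocf:service:heartbeat:apache
parse to the same result.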

The OCF standard would have a shared section that all agent types would
be required to support. This could include things like exit status
codes, environment variables, and the meta-data action. Each agent type
would then have its own section with anything specific to that type --
for example, service agents need to support start and stop actions,
while fence agents need to support off and optionally reboot.
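
The shared-plus-type-specific layout could be modeled roughly as
follows. This is a sketch under assumptions, not proposed spec text: the
table names and the missing_actions helper are hypothetical, and the
action lists contain only the actions mentioned above. No requirements
are listed for health agents because this message does not name any.

```python
# Required of every agent type (shared section).
SHARED_REQUIRED = {"meta-data"}

# Required per agent type (type-specific sections).
TYPE_REQUIRED = {
    "service": {"start", "stop"},
    "fence": {"off"},
}

# Optional per agent type; not checked below, listed for completeness.
TYPE_OPTIONAL = {
    "fence": {"reboot"},
}


def missing_actions(agent_type, advertised):
    """Return the required actions that an agent fails to advertise."""
    required = SHARED_REQUIRED | TYPE_REQUIRED.get(agent_type, set())
    return required - set(advertised)
```

For example, missing_actions("service", ["meta-data", "start"]) would
report that "stop" is missing, while a fence agent advertising only
meta-data and off would pass.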

The benefits would include:

* Agent writers would have fewer differences to worry about and fewer
libraries to learn.

* Pacemaker and higher-level tools could easily distinguish agent types
and respond intelligently. For example, higher-level shells could list
all health agents and clone them automatically when used, and Pacemaker
could exempt health agents from health restrictions so that the agent
can detect when the node becomes healthy again.

* We would have a framework for adding new types if the need arises.

Thoughts?
-- 
Ken Gaillot <kgaillot at redhat.com>