[ClusterLabs Developers] Extend enumeration of OCF return values
kgaillot at redhat.com
Tue Oct 15 10:31:06 EDT 2019
On Tue, 2019-10-15 at 13:08 +0200, Tony den Haan wrote:
> I ran into "error 1" from portblock, i.e. OCF_ERR_GENERIC, which
> doesn't tell me whether the return code came from portblock or from
> pacemaker itself.
> Wouldn't it be quite useful to
> 1) give the agents a unique number to add to the OCF RC code, thus
> helping to determine origin of error
> 2) show an actual error string instead of "unknown error(1)". That is
> the last thing you want to see when a cluster is stuck.
I agree it's an issue, but the exit codes have to stay fairly generic.
A process exit status is limited to the range 0-255, and most shells
use the values above 128 to report termination by a signal. Meanwhile,
there are dozens of agents. More importantly, Pacemaker needs standard
meanings to know how to respond.
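To illustrate the signal range, here is a minimal sketch, assuming a shell such as bash or dash that reports a signal death as 128 plus the signal number (some shells, e.g. ksh93, use a different convention). The function name signal_status is made up for this example.

```shell
#!/bin/sh
# signal_status is a hypothetical helper for illustration only.
# It runs a child shell that kills itself with SIGTERM (signal 15)
# and prints the exit status the parent shell observes.
signal_status() {
    status=0
    # $$ is inside single quotes, so the child shell expands it to
    # its own PID and terminates itself.
    sh -c 'kill -TERM $$' || status=$?
    echo "$status"
}
```

Under that assumption, calling signal_status prints 143 (128 + 15), which is why an agent cannot safely reuse the upper half of the range for its own error codes.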
However, there are some possibilities:
- OCF could add a few more codes for common error conditions. (This
requires updating the standard, as well as software such as Pacemaker
to be aware of them.)
- OCF already supports an arbitrary string "exit reason", which
Pacemaker displays in addition to the generic "unknown error". It's up
to the individual agents to support this, and all of them should.
Agents can get as specific as they like with exit reasons.
- Agents can also log to the system log, or print error output which
pacemaker will log in its detail log. Many already provide good
information this way, but there's always room for improvement.
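As a rough sketch of the exit reason approach: agents built on the resource-agents shell library normally call ocf_exit_reason from ocf-shellfuncs, which writes the message to stderr behind an "ocf-exit-reason:" prefix. The version below is a simplified stand-in so the example runs without that library, and require_binary is a hypothetical helper, not part of any real agent.

```shell
#!/bin/sh
# Standard OCF return codes used in this sketch.
OCF_SUCCESS=0
OCF_ERR_INSTALLED=5

# Simplified stand-in for ocf_exit_reason from ocf-shellfuncs.
# A real agent would source that library instead of defining this.
ocf_exit_reason() {
    # The message goes to stderr behind the exit-reason prefix.
    printf '%s%s\n' "ocf-exit-reason:" "$*" >&2
}

# Hypothetical helper: return a specific OCF code plus a readable
# reason when a required program is missing, instead of a bare
# OCF_ERR_GENERIC.
require_binary() {
    if ! command -v "$1" >/dev/null 2>&1; then
        ocf_exit_reason "required binary '$1' not found in PATH"
        return "$OCF_ERR_INSTALLED"
    fi
    return "$OCF_SUCCESS"
}
```

An agent's start action could then run something like `require_binary iptables || exit $?`, so the failure carries both a standard code and a message that says what actually went wrong.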
Ken Gaillot <kgaillot at redhat.com>