[ClusterLabs Developers] Extend enumeration of OCF return values
YGao at suse.com
Wed Oct 16 05:18:10 EDT 2019
On 10/15/19 4:31 PM, Ken Gaillot wrote:
> On Tue, 2019-10-15 at 13:08 +0200, Tony den Haan wrote:
>> I got "error 1" from portblock, i.e. OCF_ERR_GENERIC, which doesn't
>> tell me whether the error came from portblock or from pacemaker
>> itself.
>> Wouldn't it be quite useful to
>> 1) give the agents a unique number to add to the OCF RC code, thus
>> helping to determine origin of error
>> 2) show an actual error string instead of "unknown error(1)". This is
>> the last thing you want to see when a cluster is stuck.
> I agree it's an issue, but the exit codes have to stay fairly generic.
> There are only 255 possible exit codes, and half of those most shells
> use for signals. Meanwhile there are dozens of agents. More
> importantly, Pacemaker needs standard meanings to know how to respond.
> However there are possibilities:
> - OCF could add a few more codes for common error conditions. (This
> requires updating the standard, as well as software such as Pacemaker
> to be aware of them.)
> - OCF already supports an arbitrary string "exit reason" which
> pacemaker will display beyond just "unknown". It's up to the individual
> agents to support this, and all of them should. Agents can get as
> specific as they like with exit reasons.
> - Agents can also log to the system log, or print error output which
> pacemaker will log in its detail log. Many already provide good
> information this way, but there's always room for improvement.
All of these make sense. A lot of the time, I feel it's the wording
"unknown error" that frustrates users, since they are definitely not in
a good mood seeing any errors in their beloved clusters, not to mention
ones that are even "unknown" ;-)
As a matter of fact, it's probably the most frequently returned error.
I'd prefer to call it something different in user interfaces, for
example "generic error" or just "error", since:
- If "exit reason" gives a hint, it's not really "unknown".
- Even if there's no "exit reason" given, it doesn't mean it's
"unknown". Usually clues can be found in the logs.
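
To illustrate the "exit reason" mechanism Ken mentions, here is a minimal sketch of an agent reporting why it failed alongside the generic return code. It assumes the agent is written in Python and that the exit reason is passed as a stderr line starting with "ocf-exit-reason:", which is the default prefix the resource-agents shell helper ocf_exit_reason uses. The start() function, its rule_applied parameter, and the message text are hypothetical and not taken from portblock.

```python
import sys

# OCF return codes as named in this thread and the OCF resource agent API.
OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1

# Assumption: stderr lines starting with this prefix are picked up as the
# exit reason (default prefix of the ocf_exit_reason shell helper).
EXIT_REASON_PREFIX = "ocf-exit-reason:"


def exit_reason(message, stream=None):
    """Write an exit-reason line to the given stream (stderr by default)."""
    stream = sys.stderr if stream is None else stream
    stream.write(f"{EXIT_REASON_PREFIX}{message}\n")


def start(rule_applied, stream=None):
    """Hypothetical start action: return an OCF code, explaining failures."""
    if not rule_applied:
        # The return code stays generic; the string carries the specifics.
        exit_reason("could not apply firewall rule", stream)
        return OCF_ERR_GENERIC
    return OCF_SUCCESS
```

With something like this in place, the return code is still 1, but the user interface has a concrete string to show next to it instead of only "unknown error".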