[ClusterLabs Developers] Extend enumeration of OCF return values

Jan Pokorný jpokorny at redhat.com
Wed Oct 16 07:41:42 EDT 2019

On 16/10/19 09:18 +0000, Yan Gao wrote:
> On 10/15/19 4:31 PM, Ken Gaillot wrote:
>> On Tue, 2019-10-15 at 13:08 +0200, Tony den Haan wrote:
>>> Hi,
>>> I ran into getting "error 1" from portblock, so OCF_ERR_GENERIC,
>>> which doesn't tell me whether the return code came from portblock
>>> or from pacemaker itself.
>>> Wouldn't it be quite useful to
>>> 1) give the agents a unique number to add to the OCF RC code, thus
>>> helping to determine origin of error
>>> 2) show an actual error string instead of "unknown error(1)". This is
>>> the last thing you want to see when a cluster is stuck.
>>> Tony
>> I agree it's an issue, but the exit codes have to stay fairly generic.
>> There are only 255 possible exit codes, and half of those most shells
>> use for signals. Meanwhile there are dozens of agents. More
>> importantly, Pacemaker needs standard meanings to know how to respond.
>> However there are possibilities:
>> - OCF could add a few more codes for common error conditions. (This
>> requires updating the standard, as well as software such as Pacemaker
>> to be aware of them.)
>> - OCF already supports an arbitrary string "exit reason" which
>> pacemaker will display beyond just "unknown". It's up to the individual
>> agents to support this, and all of them should. Agents can get as
>> specific as they like with exit reasons.
>> - Agents can also log to the system log, or print error output which
>> pacemaker will log in its detail log. Many already provide good
>> information this way, but there's always room for improvement.
> All make sense. A lot of the time, I feel it's the wording "unknown
> error" that frustrates users, since they are definitely not in a good
> mood seeing any errors in their beloved clusters, not to mention ones
> that are even "unknown" ;-)
> As a matter of fact, it's probably the most commonly returned error. I'd
> prefer to call it something different in user interfaces, for example
> "generic error" or just "error". Since:

\me votes for "sundry error" :-)

Seriously, a distinctive term is better for getting the right hits from
a random $WEBSEARCHER, since that is the first line of universal defense
for a growing population.  This assumes proper documentation that web
bots can crawl.

> - If "exit reason" gives a hint, it's not really "unknown".
> - Even if there's no "exit reason" given, it doesn't mean it's 
> "unknown". Usually clues could be found from logs.
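
To make the display idea above concrete, here is a minimal sketch, not
Pacemaker's actual code. The numeric values are the standard OCF return
codes; the function name, the label wording, and the sample exit reason
are all hypothetical and only illustrate showing "error" plus any exit
reason instead of "unknown error".

```python
# Hypothetical display labels keyed by the standard OCF return codes.
# The label strings are assumptions for illustration only.
OCF_LABELS = {
    0: "ok",                  # OCF_SUCCESS
    1: "error",               # OCF_ERR_GENERIC (label proposed in this thread)
    2: "invalid parameter",   # OCF_ERR_ARGS
    3: "unimplemented",       # OCF_ERR_UNIMPLEMENTED
    4: "not permitted",       # OCF_ERR_PERM
    5: "not installed",       # OCF_ERR_INSTALLED
    6: "not configured",      # OCF_ERR_CONFIGURED
    7: "not running",         # OCF_NOT_RUNNING
}


def describe_result(rc, exit_reason=None):
    """Build a user-facing string from a return code and optional exit reason."""
    label = OCF_LABELS.get(rc, "unrecognized return code")
    text = f"{label} ({rc})"
    if exit_reason:
        # Whatever the agent reported is appended verbatim.
        text += f": {exit_reason}"
    return text


# Hypothetical exit reason, purely for demonstration.
print(describe_result(1, "iptables binary not found"))
print(describe_result(1))
```

With an exit reason the user sees the agent's own explanation; without
one, the label at least avoids claiming the cause is unknowable.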

Jan (Poki)