[ClusterLabs] PCMK_OCF_DEGRADED (_MASTER): exit codes are mapped to PCMK_OCF_UNKNOWN_ERROR
Ken Gaillot
kgaillot at redhat.com
Mon Mar 6 13:35:18 EST 2017
On 03/06/2017 10:55 AM, Lars Ellenberg wrote:
> On Thu, Mar 02, 2017 at 05:31:33PM -0600, Ken Gaillot wrote:
>> On 03/01/2017 05:28 PM, Andrew Beekhof wrote:
>>> On Tue, Feb 28, 2017 at 12:06 AM, Lars Ellenberg
>>> <lars.ellenberg at linbit.com> wrote:
>>>> When I recently tried to make use of the DEGRADED monitoring results,
>>>> I found out that it still does not work.
>>>>
>>>> Because LRMD chooses to filter them in ocf2uniform_rc(),
>>>> and maps them to PCMK_OCF_UNKNOWN_ERROR.
>>>>
>>>> See patch suggestion below.
>>>>
>>>> It also filters away the other "special" rc values.
>>>> Do we really not want to see them in crmd/pengine?
>>>
>>> I would think we do.
>>>
>>>> Note: I did build it, but did not use this yet,
>>>> so I have no idea if the rest of the implementation of the DEGRADED
>>>> stuff works as intended or if there are other things missing as well.
>>>
>>> failcount might be the other place that needs some massaging.
>>> specifically, not incrementing it when a degraded rc comes through
>>
>> I think that's already taken care of.
>>
>>>> Thoughts?
>>>
>>> looks good to me
>>>
>>>>
>>>> diff --git a/lrmd/lrmd.c b/lrmd/lrmd.c
>>>> index 724edb7..39a7dd1 100644
>>>> --- a/lrmd/lrmd.c
>>>> +++ b/lrmd/lrmd.c
>>>> @@ -800,11 +800,40 @@ hb2uniform_rc(const char *action, int rc, const char *stdout_data)
>>>> static int
>>>> ocf2uniform_rc(int rc)
>>>> {
>>>> - if (rc < 0 || rc > PCMK_OCF_FAILED_MASTER) {
>>>> - return PCMK_OCF_UNKNOWN_ERROR;
>>
>> Let's simply use > PCMK_OCF_OTHER_ERROR here, since that's guaranteed to
>> be the high end.
>>
>> Lars, do you want to test that?
>
> Why would we want to filter at all, then?
>
> I get it that we may want to map non-ocf agent exit codes
> into the "ocf" range,
> but why mask exit codes from "ocf" agents at all (in lrmd)?
>
> Lars

It's probably unnecessarily paranoid, but I guess the idea is to check
that the agent at least returns something in the expected range for OCF
(perhaps it's not complying with the spec, or complying with a newer
version of the spec than we can handle).
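
For illustration, here is a minimal sketch of the two range checks being
discussed. It is not the actual lrmd code: the SKETCH_* names and the
filter_rc_* helpers are hypothetical, the numeric values are assumptions
standing in for the PCMK_OCF_* constants, and the trailing "return rc" is
assumed because the diff quoted above does not show it.

```c
/* Stand-ins for the exit codes named in this thread.
 * The numeric values are assumptions; check them against your headers. */
enum sketch_ocf_exitcode {
    SKETCH_OCF_OK              = 0,
    SKETCH_OCF_UNKNOWN_ERROR   = 1,
    SKETCH_OCF_FAILED_MASTER   = 9,
    SKETCH_OCF_DEGRADED        = 190,
    SKETCH_OCF_DEGRADED_MASTER = 191,
    SKETCH_OCF_OTHER_ERROR     = 199   /* assumed to be the high end */
};

/* Check as quoted in the diff: anything above FAILED_MASTER is masked,
 * so the DEGRADED codes never get past this filter. */
static int
filter_rc_old(int rc)
{
    if (rc < 0 || rc > SKETCH_OCF_FAILED_MASTER) {
        return SKETCH_OCF_UNKNOWN_ERROR;
    }
    return rc;
}

/* Suggested check: only codes outside the whole expected range are
 * masked, so the DEGRADED codes are passed through unchanged. */
static int
filter_rc_suggested(int rc)
{
    if (rc < 0 || rc > SKETCH_OCF_OTHER_ERROR) {
        return SKETCH_OCF_UNKNOWN_ERROR;
    }
    return rc;
}
```

With these assumed values, filter_rc_old(SKETCH_OCF_DEGRADED) yields
SKETCH_OCF_UNKNOWN_ERROR, while filter_rc_suggested() returns the degraded
code as-is and still masks anything negative or above the assumed upper
bound.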