[ClusterLabs] PCMK_OCF_DEGRADED (_MASTER): exit codes are mapped to PCMK_OCF_UNKNOWN_ERROR

Ken Gaillot kgaillot at redhat.com
Thu Mar 2 18:31:33 EST 2017


On 03/01/2017 05:28 PM, Andrew Beekhof wrote:
> On Tue, Feb 28, 2017 at 12:06 AM, Lars Ellenberg
> <lars.ellenberg at linbit.com> wrote:
>> When I recently tried to make use of the DEGRADED monitoring results,
>> I found out that it still does not work.
>>
>> That is because LRMD chooses to filter them in ocf2uniform_rc()
>> and maps them to PCMK_OCF_UNKNOWN_ERROR.
>>
>> See patch suggestion below.
>>
>> It also filters away the other "special" rc values.
>> Do we really not want to see them in crmd/pengine?
> 
> I would think we do.
> 
>> Why does LRMD think it needs to outsmart the pengine?
> 
> Because the person that implemented the feature incorrectly assumed
> the rc would be passed back unmolested.
> 
>>
>> Note: I did build it, but did not use this yet,
>> so I have no idea if the rest of the implementation of the DEGRADED
>> stuff works as intended or if there are other things missing as well.
> 
> failcount might be the other place that needs some massaging:
> specifically, not incrementing it when a degraded rc comes through.

I think that's already taken care of.

>> Thoughts?
> 
> looks good to me
> 
>>
>> diff --git a/lrmd/lrmd.c b/lrmd/lrmd.c
>> index 724edb7..39a7dd1 100644
>> --- a/lrmd/lrmd.c
>> +++ b/lrmd/lrmd.c
>> @@ -800,11 +800,40 @@ hb2uniform_rc(const char *action, int rc, const char *stdout_data)
>>  static int
>>  ocf2uniform_rc(int rc)
>>  {
>> -    if (rc < 0 || rc > PCMK_OCF_FAILED_MASTER) {
>> -        return PCMK_OCF_UNKNOWN_ERROR;

Let's simply use rc > PCMK_OCF_OTHER_ERROR as the upper bound here, since
that's guaranteed to be the highest defined code.

Lars, do you want to test that?
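
For anyone testing, here is a minimal standalone sketch of that range check.
It is not the actual lrmd.c code: the enum is a cut-down stand-in for the
real one, the 189 and 199 values are taken from the quoted patch, and the
0, 1, 9, 190 and 191 values are assumptions made for illustration.
ocf2uniform_rc_selftest() is a hypothetical helper that only exists in this
sketch.

```c
#include <assert.h>

/* Cut-down stand-in for Pacemaker's OCF exit code enum.
 * 189 and 199 come from the patch quoted in this thread;
 * 0, 1, 9, 190 and 191 are assumed values for illustration. */
enum ocf_exitcode {
    PCMK_OCF_OK              = 0,
    PCMK_OCF_UNKNOWN_ERROR   = 1,
    PCMK_OCF_FAILED_MASTER   = 9,
    PCMK_OCF_CONNECTION_DIED = 189,
    PCMK_OCF_DEGRADED        = 190,
    PCMK_OCF_DEGRADED_MASTER = 191,
    PCMK_OCF_OTHER_ERROR     = 199
};

/* Pass through anything in [0, PCMK_OCF_OTHER_ERROR];
 * map everything outside that range to PCMK_OCF_UNKNOWN_ERROR. */
static int
ocf2uniform_rc(int rc)
{
    if (rc < 0 || rc > PCMK_OCF_OTHER_ERROR) {
        return PCMK_OCF_UNKNOWN_ERROR;
    }
    return rc;
}

/* Hypothetical self-check, not part of lrmd. */
void
ocf2uniform_rc_selftest(void)
{
    assert(ocf2uniform_rc(PCMK_OCF_DEGRADED) == PCMK_OCF_DEGRADED);
    assert(ocf2uniform_rc(PCMK_OCF_DEGRADED_MASTER) == PCMK_OCF_DEGRADED_MASTER);
    assert(ocf2uniform_rc(PCMK_OCF_OTHER_ERROR + 1) == PCMK_OCF_UNKNOWN_ERROR);
    assert(ocf2uniform_rc(-1) == PCMK_OCF_UNKNOWN_ERROR);
}
```

Note that, unlike the switch in the patch, a plain range check also passes
through values that have no defined meaning (for example anything between
FAILED_MASTER and CONNECTION_DIED), so the consumers would need to cope
with those.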

>> +    switch (rc) {
>> +    default:
>> +           return PCMK_OCF_UNKNOWN_ERROR;
>> +
>> +    case PCMK_OCF_OK:
>> +    case PCMK_OCF_UNKNOWN_ERROR:
>> +    case PCMK_OCF_INVALID_PARAM:
>> +    case PCMK_OCF_UNIMPLEMENT_FEATURE:
>> +    case PCMK_OCF_INSUFFICIENT_PRIV:
>> +    case PCMK_OCF_NOT_INSTALLED:
>> +    case PCMK_OCF_NOT_CONFIGURED:
>> +    case PCMK_OCF_NOT_RUNNING:
>> +    case PCMK_OCF_RUNNING_MASTER:
>> +    case PCMK_OCF_FAILED_MASTER:
>> +
>> +    case PCMK_OCF_DEGRADED:
>> +    case PCMK_OCF_DEGRADED_MASTER:
>> +           return rc;
>> +
>> +#if 0
>> +           /* What about these?? */
> 
> yes, these should get passed back as-is too
> 
>> +    /* 150-199 reserved for application use */
>> +    PCMK_OCF_CONNECTION_DIED = 189, /* Operation failure implied by disconnection of the LRM API to a local or remote node */
>> +
>> +    PCMK_OCF_EXEC_ERROR    = 192, /* Generic problem invoking the agent */
>> +    PCMK_OCF_UNKNOWN       = 193, /* State of the service is unknown - used for recording in-flight operations */
>> +    PCMK_OCF_SIGNAL        = 194,
>> +    PCMK_OCF_NOT_SUPPORTED = 195,
>> +    PCMK_OCF_PENDING       = 196,
>> +    PCMK_OCF_CANCELLED     = 197,
>> +    PCMK_OCF_TIMEOUT       = 198,
>> +    PCMK_OCF_OTHER_ERROR   = 199, /* Keep the same codes as PCMK_LSB */
>> +#endif
>>      }
>> -
>> -    return rc;
>>  }
>>
>>  static int



