[ClusterLabs] PCMK_OCF_DEGRADED (_MASTER): exit codes are mapped to PCMK_OCF_UNKNOWN_ERROR

Andrew Beekhof abeekhof at redhat.com
Wed Mar 1 18:28:17 EST 2017


On Tue, Feb 28, 2017 at 12:06 AM, Lars Ellenberg
<lars.ellenberg at linbit.com> wrote:
> When I recently tried to make use of the DEGRADED monitoring results,
> I found out that it still does not work.
>
> Because LRMD chooses to filter them in ocf2uniform_rc(),
> and maps them to PCMK_OCF_UNKNOWN_ERROR.
>
> See patch suggestion below.
>
> It also filters away the other "special" rc values.
> Do we really not want to see them in crmd/pengine?

I would think we do.

> Why does LRMD think it needs to outsmart the pengine?

Because the person that implemented the feature incorrectly assumed
the rc would be passed back unmolested.

>
> Note: I did build it, but have not used it yet,
> so I have no idea if the rest of the implementation of the DEGRADED
> stuff works as intended or if there are other things missing as well.

Failcount might be the other place that needs some massaging,
specifically not incrementing it when a degraded rc comes through.

>
> Thoughts?

looks good to me

>
> diff --git a/lrmd/lrmd.c b/lrmd/lrmd.c
> index 724edb7..39a7dd1 100644
> --- a/lrmd/lrmd.c
> +++ b/lrmd/lrmd.c
> @@ -800,11 +800,40 @@ hb2uniform_rc(const char *action, int rc, const char *stdout_data)
>  static int
>  ocf2uniform_rc(int rc)
>  {
> -    if (rc < 0 || rc > PCMK_OCF_FAILED_MASTER) {
> -        return PCMK_OCF_UNKNOWN_ERROR;
> +    switch (rc) {
> +    default:
> +           return PCMK_OCF_UNKNOWN_ERROR;
> +
> +    case PCMK_OCF_OK:
> +    case PCMK_OCF_UNKNOWN_ERROR:
> +    case PCMK_OCF_INVALID_PARAM:
> +    case PCMK_OCF_UNIMPLEMENT_FEATURE:
> +    case PCMK_OCF_INSUFFICIENT_PRIV:
> +    case PCMK_OCF_NOT_INSTALLED:
> +    case PCMK_OCF_NOT_CONFIGURED:
> +    case PCMK_OCF_NOT_RUNNING:
> +    case PCMK_OCF_RUNNING_MASTER:
> +    case PCMK_OCF_FAILED_MASTER:
> +
> +    case PCMK_OCF_DEGRADED:
> +    case PCMK_OCF_DEGRADED_MASTER:
> +           return rc;
> +
> +#if 0
> +           /* What about these?? */

yes, these should get passed back as-is too

> +    /* 150-199 reserved for application use */
> +    PCMK_OCF_CONNECTION_DIED = 189, /* Operation failure implied by disconnection of the LRM API to a local or remote node */
> +
> +    PCMK_OCF_EXEC_ERROR    = 192, /* Generic problem invoking the agent */
> +    PCMK_OCF_UNKNOWN       = 193, /* State of the service is unknown - used for recording in-flight operations */
> +    PCMK_OCF_SIGNAL        = 194,
> +    PCMK_OCF_NOT_SUPPORTED = 195,
> +    PCMK_OCF_PENDING       = 196,
> +    PCMK_OCF_CANCELLED     = 197,
> +    PCMK_OCF_TIMEOUT       = 198,
> +    PCMK_OCF_OTHER_ERROR   = 199, /* Keep the same codes as PCMK_LSB */
> +#endif
>      }
> -
> -    return rc;
>  }
>
>  static int