[ClusterLabs Developers] Why returning OCF_ERR_GENERIC during demote if resource stopped?
Andrew Beekhof
andrew at beekhof.net
Mon May 16 03:15:11 UTC 2016
> On 28 Apr 2016, at 7:26 PM, Jehan-Guillaume de Rorthais <jgdr at dalibo.com> wrote:
>
> Hello all,
>
> According to the developer's guide, when demote is called on a stopped resource,
> the RA should return a soft error:
>
> http://www.linux-ha.org/doc/dev-guides/_literal_demote_literal_action.html
>
> «
> foobar_monitor
> rc=$?
> case "$rc" in
> [...]
> "$OCF_NOT_RUNNING")
> # Currently not running. Getting a demote action
> # in this state is unexpected. Exit with an error
> # and let the cluster manager recover.
> ocf_log err "Resource is currently not running"
> exit $OCF_ERR_GENERIC
> ;;
> [...]
> »
>
> But to recover a master resource that is found not running, PEngine produces a
> transition with the following actions: demote -> stop -> start -> promote.
>
> If we follow the dev guide, the recover action is not possible on a
> stopped master as the first action of the transition will always fail, leading
> to a migration and a -inf score on the old master node.
>
> My first thought was «why do a demote -> stop that breaks everything when it
> knows the resource is already stopped?!»
>
> If I understand correctly, I guess PEngine **must** produce such a transition
> so the notify actions are triggered should other living clones need to process
> them. Is that right?
Yes, also because in theory there could be some cleanup that needs to happen.
> If this is right, then maybe we should relax a bit what is
> written in the ocf dev guide?
I would change that block to use
    exit $OCF_NOT_RUNNING
because we don't know for sure that the stop will happen.
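
As a rough sketch only of what that change could look like in the demote
action: the return code values below are the standard OCF ones (normally
supplied by ocf-shellfuncs), but the foobar_monitor stub, the FOOBAR_STATE
variable and the foobar_do_demote helper are hypothetical placeholders, not
part of any real agent. The function uses "return" so the snippet can be
sourced and exercised; a real agent would "exit" with the same codes.

```shell
#!/bin/sh
# Standard OCF return codes; a real agent gets these from ocf-shellfuncs.
: "${OCF_SUCCESS:=0}"
: "${OCF_ERR_GENERIC:=1}"
: "${OCF_NOT_RUNNING:=7}"
: "${OCF_RUNNING_MASTER:=8}"

# Hypothetical stand-in for the agent's monitor: it simply reports the
# state stored in FOOBAR_STATE (an assumption made for this sketch).
foobar_monitor() {
    return "${FOOBAR_STATE:-$OCF_NOT_RUNNING}"
}

# Hypothetical placeholder for the real master -> slave transition.
foobar_do_demote() {
    return 0
}

foobar_demote() {
    monitor_rc=0
    foobar_monitor || monitor_rc=$?
    case "$monitor_rc" in
        "$OCF_RUNNING_MASTER")
            # Running as master: perform the actual demotion.
            foobar_do_demote || return "$OCF_ERR_GENERIC"
            return "$OCF_SUCCESS"
            ;;
        "$OCF_SUCCESS")
            # Already running as slave: nothing to do.
            return "$OCF_SUCCESS"
            ;;
        "$OCF_NOT_RUNNING")
            # Not running: report that state instead of a generic error,
            # since we cannot be sure a stop will follow.
            return "$OCF_NOT_RUNNING"
            ;;
        *)
            # Any other monitor result is treated as a failure.
            return "$OCF_ERR_GENERIC"
            ;;
    esac
}
```

With FOOBAR_STATE=7 (not running) the sketch returns $OCF_NOT_RUNNING rather
than $OCF_ERR_GENERIC; with FOOBAR_STATE=8 (master) it returns $OCF_SUCCESS.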
>
> To be able to deal with this in our RA, if the resource is stopped during the
> demote action, we silently start it as a slave and return OCF_ERR_GENERIC if we
> couldn't start the resource. We return OCF_SUCCESS if it succeeds (I guess we
> could just return OCF_SUCCESS without starting it if the transition plans to
> stop it according to the notify variables).
>
> Comments? Advice?
>
> Regards,
> --
> Jehan-Guillaume de Rorthais
> Dalibo
>
> _______________________________________________
> Developers mailing list
> Developers at clusterlabs.org
> http://clusterlabs.org/mailman/listinfo/developers