[ClusterLabs Developers] CRM trying to demote a stopped resource

Andrei Borzenkov arvidjaar at gmail.com
Wed Aug 5 13:37:39 UTC 2015


On Wed, Aug 5, 2015 at 4:04 PM, Jehan-Guillaume de Rorthais
<jgdr at dalibo.com> wrote:
> hi guys,
>
> We are still working on our new PostgreSQL resource agent.
>
> We have more or less made up our minds on the promotion issue (see the ml thread
> "problem with master score limited to 1000000") and found an acceptable algorithm.
>
> While testing this RA, I found a strange CRM behavior in a simple failure
> scenario: the master resource is stopped.
>
> When I gracefully stop the master,

You mean - stop postgres outside of pacemaker?

>                                                   the CRM tries to recover the resource
> with:
>
> * demote it
> * stop it
> * start it
> * promote it
>
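As a rough illustration of the recovery sequence above, here is a minimal Python model. It is not Pacemaker code: `RECOVERY_PLAN`, `run_plan` and the handler table are hypothetical names. It only assumes, as a simplification, that the actions run in order and the transition is aborted at the first action that does not return OCF_SUCCESS (0).

```python
OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1

# Order in which the CRM recovers a failed master, as listed above.
RECOVERY_PLAN = ("demote", "stop", "start", "promote")


def run_plan(plan, handlers):
    """Execute each action of the plan in order.

    Returns (completed_actions, failed_action, rc); failed_action is None
    when every action returned OCF_SUCCESS.
    """
    done = []
    for action in plan:
        rc = handlers[action]()
        if rc != OCF_SUCCESS:
            # A failed action aborts the rest of the transition.
            return done, action, rc
        done.append(action)
    return done, None, OCF_SUCCESS


if __name__ == "__main__":
    # Hypothetical handlers: demote fails because the instance is stopped.
    handlers = {
        "demote": lambda: OCF_ERR_GENERIC,
        "stop": lambda: OCF_SUCCESS,
        "start": lambda: OCF_SUCCESS,
        "promote": lambda: OCF_SUCCESS,
    }
    print(run_plan(RECOVERY_PLAN, handlers))  # prints ([], 'demote', 1)
```

With a stopped master, nothing after the failing demote ever runs, which is why the stop/start/promote steps are never reached.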
> Sounds logical, but it fails at the first step because the master is actually
> stopped. According to the "ra-dev-guide", the RA should return OCF_ERR_GENERIC
> from demote if the resource is stopped. See:
>
>   http://www.linux-ha.org/doc/dev-guides/_literal_demote_literal_action.html
>
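The demote decision logic from the ra-dev-guide page above can be sketched in Python (the guide's own example is a shell function). The return-code values are the standard OCF ones; `monitor` and `do_demote` are hypothetical callables standing in for the agent's real state probe and demotion step.

```python
# Standard OCF resource agent return codes.
OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1
OCF_NOT_RUNNING = 7
OCF_RUNNING_MASTER = 8


def demote(monitor, do_demote):
    """Demote action following the ra-dev-guide's decision table.

    monitor:   hypothetical callable returning the current OCF state code.
    do_demote: hypothetical callable performing the demotion and
               returning an OCF code.
    """
    rc = monitor()
    if rc == OCF_SUCCESS:
        # Already running as a slave: nothing to do.
        return OCF_SUCCESS
    if rc == OCF_NOT_RUNNING:
        # Not running at all: the guide treats this as an error.
        return OCF_ERR_GENERIC
    if rc != OCF_RUNNING_MASTER:
        # Any other state is a failure; let the cluster manager recover.
        return rc
    # Running as master: actually demote it.
    return do_demote()
```

This is exactly the branch the stopped master hits: `monitor()` reports OCF_NOT_RUNNING, so demote returns OCF_ERR_GENERIC and the transition fails at its first step.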
> After teaching my RA to follow this, the CRM keeps retrying the same transition
> again and again until the failcount reaches the migration-threshold. Then it
> stops trying to recover it and moves the resource to another node.
>
> Same result if the RA returns OCF_NOT_RUNNING from the demote action instead of
> OCF_ERR_GENERIC.
>
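The retry loop described above can be modelled roughly as follows. This is a simplification, not Pacemaker's scheduler: it counts only the failed demotes (ignoring the initial monitor failure), assumes each failure bumps the node's failcount and triggers the same transition again, and assumes that once the failcount reaches `migration-threshold` the resource is moved elsewhere. `simulate_recovery` and its parameters are hypothetical.

```python
OCF_SUCCESS = 0


def simulate_recovery(demote_rc, migration_threshold):
    """Return (attempts, outcome) for a resource whose demote always
    returns demote_rc on the failed node."""
    failcount = 0
    attempts = 0
    while failcount < migration_threshold:
        attempts += 1
        if demote_rc == OCF_SUCCESS:
            return attempts, "recovered in place"
        # The failed demote aborts the transition; the failcount grows
        # and the same transition is computed again.
        failcount += 1
    return attempts, "moved to another node"
```

Whether demote returns OCF_ERR_GENERIC (1) or OCF_NOT_RUNNING (7), the outcome is the same, matching the observation above: `simulate_recovery(1, 3)` and `simulate_recovery(7, 3)` both give `(3, "moved to another node")`.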
> I could try to obey the CRM: start the resource as a slave from demote and
> return OCF_SUCCESS, but that sounds ridiculous, as it would be stopped at the
> very next step, then started again one step later...
>
> Did I miss something? Is this behavior normal? Any advice on how to fix this?
>
> Regards,
> --
> Jehan-Guillaume de Rorthais
> Dalibo
> http://www.dalibo.com
>
> _______________________________________________
> Developers mailing list
> Developers at clusterlabs.org
> http://clusterlabs.org/mailman/listinfo/developers



